Multifactorial evolutionary algorithm with adaptive transfer strategy based on decision tree

Multifactorial optimization (MFO) is a class of optimization problems that has attracted considerable attention in recent years. The multifactorial evolutionary algorithm uses an implicit genetic transfer mechanism, characterized by knowledge transfer, to perform evolutionary multitasking; the effectiveness of knowledge transfer therefore significantly affects the performance of the algorithm. To achieve positive knowledge transfer, this paper proposes an evolutionary multitasking optimization algorithm with an adaptive transfer strategy based on a decision tree (EMT-ADT). To evaluate the useful knowledge contained in transferred individuals, this paper defines an evaluation indicator to quantify the transfer ability of each individual. Furthermore, a decision tree is constructed to predict the transfer ability of transferred individuals. Based on the prediction results, promising positive-transfer individuals are selected to transfer knowledge, which can effectively improve the performance of the algorithm. Finally, the CEC2017 MFO benchmark problems and the WCCI20-MTSO and WCCI20-MaTSO benchmark problems are used to verify the performance of the proposed EMT-ADT algorithm. Experimental results demonstrate the competitiveness of EMT-ADT compared with some state-of-the-art algorithms.


Introduction
In recent years, many researchers have shown great interest in a new category of optimization problems called multifactorial optimization (MFO). Different from well-known optimization problems such as single-objective and multiobjective optimization, MFO aims to handle multiple distinct optimization tasks in one run. Such problems widely exist in science, engineering and technology. For example, a complex supply chain problem [1] involves two optimization problems simultaneously: shop scheduling (production optimization) and vehicle routing (logistics optimization). Gupta et al. [1] were the first to use an evolutionary algorithm to deal with single-objective MFO problems, an approach called the multifactorial evolutionary algorithm (MFEA). Subsequently, a series of multifactorial evolutionary algorithms have been proposed and applied to different optimization problems, such as job shop scheduling [2], shortest-path trees [3], ensemble classification [4] and multiobjective optimization [5].

(Corresponding author: Wei Li, liwei@xaut.edu.cn. 1 School of Computer Science and Engineering, Xi'an University of Technology, Xi'an 710048, China. 2 Shaanxi Key Laboratory for Network Computing and Security Technology, Xi'an 710048, China.)
MFEA is characterized by integrating cultural effects through assortative mating and vertical cultural transmission. In MFEA, a prescribed parameter called the random mating probability (rmp) controls knowledge transfer during optimization. Because there is no prior knowledge about intertask similarity, knowledge transfer controlled by rmp can enhance optimization performance for related tasks (positive transfer), but may deteriorate performance for unrelated tasks (negative transfer). Since knowledge transfer is central to multifactorial evolutionary algorithms, researchers have developed many transfer strategies to alleviate negative transfer between unrelated tasks. These strategies can be divided into four categories:

1. Domain adaptation techniques. Bali et al. [6] proposed a linearized domain adaptation (LDA) strategy, which transforms the search space to improve the correlation between the representative space and its constitutive task, thereby providing an effective platform for knowledge transfer. Wang et al. [7] developed an explicit autoencoding strategy to incorporate multiple search mechanisms with different biases into the evolutionary multitasking paradigm; the autoencoder learns a mapping between problem domains. Xue et al. [8] proposed an affine transformation-enhanced MFO algorithm (AT-MFEA) to enhance transferability between distinct tasks. In AT-MFEA, a superior intertask mapping is obtained via a rank loss function, and an evolutionary-path-based representation model is established to bridge the gap between two distinct problems from different domains.

2. Adaptive strategies for the parameter rmp. Ong et al. [9] proposed an online transfer parameter estimation strategy (MFEA-II) to minimize the damage caused by negative transfer between tasks. In MFEA-II, rmp takes the form of a symmetric matrix instead of a scalar. The matrix captures nonuniform intertask synergies even when complementarity is not uniform across task pairs, and it is continuously learned and adapted during the search, which helps locate the global optima of the different tasks. Xu et al. [5] proposed a cultural-transmission-based multiobjective evolution strategy (CT-EMT-MOES), in which an adaptive information transfer strategy adjusts rmp using the mutation success rate of the target task itself and the success rate of information transfer, so as to allocate evolutionary resources between tasks reasonably. Li et al. [10] proposed an explicit multipopulation evolutionary framework (MPEF) to improve information transfer. In MPEF, rmp is adjusted according to the evolution status of the population: if the ratio of offspring that are superior to their parents exceeds a given threshold, rmp is left unchanged because the population evolves well with the current value; otherwise, rmp is updated because the current value causes negative transfer.

3. Intertask learning strategies. Da et al. [11] proposed a transfer evolutionary computation paradigm (AMTEA) that reduces the risk of negative transfer via online source-target similarity learning. In AMTEA, a probabilistic model is built from the distribution of elite solutions of a source optimization task and then used to provide a promising search direction for a related target task. Gao et al. [12] utilized a semi-supervised learning strategy to enhance the effectiveness of knowledge transfer (EMT-SSC). In EMT-SSC, promising individuals are identified by the semi-supervised learning strategy and then transfer valuable knowledge between tasks. Zheng et al. [13] developed a self-regulated evolutionary multitask optimization (SREMTO) algorithm that dynamically adjusts the intensity of knowledge transfer between tasks. In SREMTO, task groups are created based on the ability vectors of individuals; the degree of overlap between task groups depends on the degree of task relatedness, and cross-task knowledge transfer is conducted through the overlapping parts.

4. Multi-knowledge transfer mechanisms. Cai et al. [14] proposed a hybrid knowledge transfer strategy (EMTO-HKT) that combines an individual-level learning strategy and a population-level learning strategy, transferring knowledge according to the degree of task relatedness. Liang et al. [15] proposed a two-stage adaptive knowledge transfer mechanism: the first stage adjusts the search step of each individual to alleviate negative transfer, while the second stage adjusts the search range of each individual to improve the exploration ability of the population. Ding et al. [16] proposed a generalized multitasking evolutionary optimization framework for expensive problems (G-MFEA), with two strategies for transferring knowledge between optimization problems whose global optima lie in different locations and whose decision variables differ in number.
Although existing MFEAs endeavor to alleviate negative transfer during optimization, the solution precision they obtain is not satisfactory, especially for multitasking problems with low relatedness. Furthermore, due to the lack of prior information about task relatedness, individuals whose knowledge is useless for other tasks are often transferred, which clearly wastes resources. To address these problems, this paper presents an evolutionary multitasking optimization algorithm with an adaptive transfer strategy based on a decision tree (EMT-ADT). In EMT-ADT, a decision tree based on the Gini coefficient is constructed to predict individual transfer ability. To the best of our knowledge, this is the first attempt in the literature to use a decision tree to enhance positive knowledge transfer in the MFO paradigm. The primary contributions of this paper can be summarized as follows:

1. The transfer ability of individuals is defined to quantify the amount of useful knowledge contained in transferred individuals. Individuals with high transfer ability are used to construct a decision tree.
2. Drawing on supervised machine learning, the proposed algorithm uses the decision tree to predict positive-transfer individuals. By selecting promising positive-transfer individuals, the algorithm increases the probability of positive transfer.
3. The success-history based adaptive differential evolution algorithm (SHADE) is used as the search engine, which demonstrates the generality of the MFO paradigm. Three multifactorial optimization benchmark sets are used to verify the competitiveness of the proposed method.
The rest of this paper is organized as follows. The next section introduces MFO and the decision tree model. The section after that describes the proposed EMT-ADT algorithm. The experimental section then presents results on three multifactorial optimization (MFO) benchmark sets and two combinatorial optimization problems (TSP and TRP). Conclusions and future work are summarized in the last section.

Multifactorial optimization
As mentioned before, multifactorial optimization (MFO) is an evolutionary multitasking paradigm that aims to find a group of optimal solutions simultaneously, each of which corresponds to one optimization problem. To compare individuals in a multitasking environment conveniently, it is necessary to encode and decode different individuals in a unified search space. For an unconstrained multitasking problem with n tasks, T_j denotes the jth task, with search space X_j and objective function f_j. For the ith individual p_i, its properties are defined as follows [1]:

Definition 1 The factorial cost Ψ_{ij} of individual p_i on task T_j is the objective value f_j(p_i) of individual p_i on that task.

Definition 2
The factorial rank r_{ij} of individual p_i on task T_j is the index of p_i when the population is sorted in ascending order of the factorial cost Ψ_j.

Definition 3
The scalar fitness of individual p_i is defined as φ_i = 1 / min_{j∈{1,…,n}} {r_{ij}}.

Definition 4
The skill factor τ_i of individual p_i is defined as τ_i = argmin_{j∈{1,…,n}} {r_{ij}}. In other words, τ_i is the index of the task on which individual p_i performs best.
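As a small worked example of Definitions 1-4 (the population and cost values are invented for this sketch), the factorial ranks, scalar fitness and skill factors can be computed as follows:

```python
# Toy illustration of Definitions 1-4: a population of 4 individuals
# evaluated on 2 tasks (the factorial cost values are made up).
factorial_costs = [
    [3.0, 10.0],  # individual p1: factorial cost on T1, on T2
    [1.0, 40.0],  # p2
    [2.0, 20.0],  # p3
    [4.0, 30.0],  # p4
]
n_ind, n_tasks = len(factorial_costs), len(factorial_costs[0])

# Factorial rank r_ij: 1-based position of p_i when the population is
# sorted in ascending order of factorial cost on task T_j (Definition 2).
ranks = [[0] * n_tasks for _ in range(n_ind)]
for j in range(n_tasks):
    order = sorted(range(n_ind), key=lambda i: factorial_costs[i][j])
    for pos, i in enumerate(order, start=1):
        ranks[i][j] = pos

# Scalar fitness phi_i = 1 / min_j r_ij (Definition 3) and skill factor
# tau_i = argmin_j r_ij (Definition 4; 0-based task index here).
scalar_fitness = [1.0 / min(r) for r in ranks]
skill_factor = [r.index(min(r)) for r in ranks]

print(ranks)           # [[3, 1], [1, 4], [2, 2], [4, 3]]
print(skill_factor)    # [1, 0, 0, 1]
```

Here p1 ranks first on T2 and p2 ranks first on T1, so both receive scalar fitness 1; individuals are then compared across tasks via these scalar values.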
MFEA is a pioneering evolutionary algorithm realizing the MFO paradigm; it transfers genes and memes through assortative mating and vertical cultural transmission. Algorithm 1 presents the basic framework of MFEA.

Decision tree
The decision tree is a supervised machine learning method for multistage decision making [17]. A tree structure presents the decision rules summarized from data with features and labels. Generally, a decision tree consists of a root node, internal nodes and leaf nodes. Leaf nodes have no descendants, while the remaining non-root nodes are internal nodes. Each internal node is associated with an attribute test, each branch represents an outcome of the test, and each leaf node assigns one or more class labels [18, 19]. A path from the root node to a leaf node corresponds to a decision test sequence.
The design of a decision tree mainly includes three tasks [17].
Task 1: Select a node splitting rule. Taking a playing dataset as an example, Table 1 shows a playing training dataset with fourteen samples. Each sample consists of an ID, four attributes and one category label. The attributes are weather (a_1), temperature (a_2), humidity (a_3) and wind (a_4). The category label (L) indicates whether to play today.
Figure 1 shows the decision tree trained on Table 1, where n_i represents the ith node.

Motivations
Positive knowledge transfer has a significant effect on the performance of multifactorial evolutionary algorithms. In MFEA, knowledge transfer among tasks is controlled by the parameter rmp. Since solutions in each task are randomly selected to exchange information with the same probability, the transfer may turn out to be negative, leading to a deterioration of algorithm performance [20, 21]. Therefore, when the correlation between tasks is uncertain, accurately selecting valuable solutions is the key to improving the performance of the algorithm. To this end, this paper proposes an evaluation rule to quantify the transfer ability of individuals involved in knowledge transfer.
Furthermore, research shows that the decision tree model has the following advantages in construction and data processing. First, a decision tree turns the knowledge learned from training directly into a hierarchical structure, which is readable and easy to understand and implement. Second, decision trees suit small datasets, and the time complexity of decision tree algorithms is low. Finally, decision trees are insensitive to missing values, can handle irrelevant feature data, and can predict accurately in a relatively short time even with large data sources. Therefore, in the proposed EMT-ADT algorithm, the decision tree is constructed from the information of transferred individuals. The number of individuals for each knowledge transfer is set to 10, and five consecutive generations of transferred individuals are used as the training data to construct the decision tree model, which fits the characteristics of the decision tree. Based on the constructed decision tree, promising positive-transfer individuals are selected to conduct knowledge transfer between tasks, which can achieve fast convergence and improve solution accuracy.

Definition of transfer ability
For a population P = {x_1, x_2, …, x_N}, an archive A = {t_1, t_2, …, t_N} is used to store individuals involved in knowledge transfer, and the historical transferred population TP = {t_i | 1 ≤ i ≤ n ∧ ω(t_i) > ω(t_{n+1})} stores the n individuals with the highest knowledge transfer ability in the archive A. Each t_i has an associated subset Ω_i.

For each t_i ∈ TP, if t_i participates in the generation of an offspring y_j (recorded as t_i → y_j), the associated subset Ω_i is defined as

Ω_i = { y_j | t_i → y_j }.    (1)

Accordingly, the transfer amount λ_{i,j} of the jth individual in Ω_i is defined as

λ_{i,j} = 1 if the offspring y_j outperforms its parent x_j, and λ_{i,j} = 0 otherwise,    (2)

where x_j is the parent of the offspring y_j.

The transfer ability ω(t_i) of a transferred individual t_i is then defined as

ω(t_i) = Σ_{j=1}^{u} λ_{i,j},    (3)

where u is the size of the subset Ω_i.
Figure 2 shows an example of the calculation of the transfer ability, where all individuals are in a 2-D decision space.
In Fig. 2, the historical transferred population TP = {t_1, t_2, t_3, t_4, t_5} is composed of five transferred individuals. The offspring y_1 and y_4 perform better than their parents, while the offspring y_2 and y_3 perform worse than their parents. t_2 participates in the generation of two offspring, of which only one outperforms its parent; according to Eq. (3), the transfer ability of t_2 is ω(t_2) = λ_{2,1} + λ_{2,2} = 1. t_3 participates in the generation of the offspring y_1 and y_4, so its associated subset is Ω_3 = {y_1, y_4}. Since both y_1 and y_4 outperform their parents, λ_{3,1} = 1 and λ_{3,2} = 1, and the transfer ability of t_3 is ω(t_3) = λ_{3,1} + λ_{3,2} = 2. ω(t_4) and ω(t_5) are calculated in the same way. Table 2 shows the transfer abilities of the historical transferred population TP.
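The calculation above can be sketched in code; the fitness pairs are hypothetical and minimization is assumed (lower objective value is better):

```python
# Sketch of Eqs. (1)-(3): the transfer ability of a transferred individual
# t_i is the number of its offspring that improve on their parents.
def transfer_ability(offspring_pairs):
    """offspring_pairs: for one t_i, a list of (f(y_j), f(x_j)) pairs,
    where y_j is an offspring t_i helped generate and x_j is y_j's parent."""
    # lambda_{i,j} = 1 if the offspring outperforms its parent, else 0
    return sum(1 for f_y, f_x in offspring_pairs if f_y < f_x)

# t_3 from the example in Fig. 2: it takes part in generating y_1 and y_4,
# and both offspring outperform their parents, so omega(t_3) = 2.
# (The fitness values below are invented for illustration.)
omega_t3 = transfer_ability([(0.8, 1.2), (0.5, 0.9)])
print(omega_t3)  # 2
```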

Construction of decision tree
In the proposed EMT-ADT algorithm, the archive A stores individuals involved in knowledge transfer. At each generation, n individuals selected from the auxiliary population are used to update the archive. For a multitask problem with two tasks, when optimizing task T_1, the population associated with task T_2 is regarded as the auxiliary population, and vice versa. Since a random selection strategy may pick negative-transfer individuals, a decision tree classifier is constructed to predict the transfer ability of each individual in the auxiliary population. Transferred individuals are then sorted in descending order of transfer ability, and the top n are placed into the archive A. Furthermore, the decision tree classifier predicts the transfer ability of individuals in the archive A, and the top n individuals with the highest transfer ability are selected as the historical transferred population TP to conduct knowledge transfer in the next generation. At each generation, the historical transferred populations of the last LP generations are used as the training data, that is, TP_DT = ∪_{g=G−LP}^{G−1} TP_g. The construction of the decision tree classifier proceeds as follows.
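The per-generation archive and TP update described above can be sketched as follows; `predict_ability` stands in for the trained decision tree classifier, and the container names and toy data are assumptions of this sketch:

```python
# Sketch of the per-generation update described above. `predict_ability`
# plays the role of the trained decision-tree classifier.
def top_n_by_predicted_ability(individuals, predict_ability, n):
    """Sort by predicted transfer ability (descending) and keep the top n."""
    return sorted(individuals, key=predict_ability, reverse=True)[:n]

def update_transfer_populations(aux_pop, archive, predict_ability, n):
    # 1) place the n most promising auxiliary individuals into the archive
    archive = top_n_by_predicted_ability(aux_pop, predict_ability, n) + archive
    # 2) from the archive, keep the n best-predicted individuals as TP
    tp = top_n_by_predicted_ability(archive, predict_ability, n)
    return archive, tp

# Toy usage: individuals are plain floats, and the stand-in "classifier"
# negates them so that smaller values are predicted to transfer better.
arch, tp = update_transfer_populations([3.0, 1.0, 2.0], [], lambda x: -x, n=2)
print(tp)  # [1.0, 2.0]
```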
1. According to individual transfer ability, the individual with the highest transfer ability in TP_DT is recorded as t_best, i.e., t_best = argmax_{1≤i≤m} ω(t_i), where m is the size of TP_DT.
2. Each t_i ∈ TP_DT has two feature attributes: the Euclidean distance attribute (a_1) and the factorial cost attribute (a_2). The Euclidean distance dis(t_i, t_best) between t_i and t_best, recorded as the value x_{i,1} of a_1, is put into the set S_1. The factorial cost f(t_i) of individual t_i, recorded as the value x_{i,2} of a_2, is put into the set S_2. The transfer ability ω(t_i) of individual t_i, recorded as the label value y_i, is put into the set S_3. Let x_i = (x_{i,1}, x_{i,2}), the candidate attribute set A = {a_1, a_2}, and the dataset D = {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)}.
3. Calculate the Gini index of each feature attribute from the sets S_1, S_2 and S_3, using Gini(D) = 1 − Σ_k p_k², where p_k is the proportion of the current category k.
4. Calculate the Gini index of the jth splitting value b_j corresponding to the feature attribute a_i, i.e., the weighted average of the Gini impurities of the two subsets produced by the split, where v represents the number of label categories in dataset D.
5. The splitting value b* with the lowest Gini index is selected as the optimal splitting value, and the feature attribute a* in the candidate attribute set A corresponding to b* is selected as the current node.
Algorithm 2 presents the construction of decision tree.
Take Table 3 as an example of constructing a decision tree. Euclidean distance (denoted dis) is the first feature attribute, while factorial cost (denoted f) is the second. First, determine the root node. Since the attribute dis is numerical, its values are sorted in ascending order and the samples are split at the midpoints of adjacent distinct values, taken from small to large. For example, if dis = 0.2 and dis = 0.4 are adjacent, the midpoint is 0.3; using this midpoint as the split point, the calculated Gini index is 0.619. The other midpoints and their Gini indexes are calculated in the same way, as shown in Table 4. In Table 4, the split point 0.95 yields the smallest Gini index, 0.32. Table 5 shows the Gini indexes of different split points with the factorial cost as the node attribute; there, the split point 17.5 yields the smallest Gini index, 0.441. Since the Gini index 0.32 in Table 4 is less than 0.441 in Table 5, the Euclidean distance is used as the splitting attribute of the root node, with 0.95 as the split value. Further calculation selects the factorial cost as the splitting attribute of the second-level internal node, with 10.5 as the split value. The constructed decision tree is shown in Fig. 3.
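The midpoint-and-Gini split selection illustrated above can be sketched as follows. Since Table 3 is not reproduced here, the distance values and transfer-ability labels below are invented for illustration, so the resulting Gini values differ from those in Tables 4 and 5:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the label proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_of_split(values, labels, threshold):
    """Weighted Gini index of splitting a numeric attribute at `threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(values, labels):
    """Candidate thresholds are midpoints of adjacent distinct sorted values;
    return the one with the lowest weighted Gini index."""
    vs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(vs, vs[1:])]
    return min(candidates, key=lambda t: gini_of_split(values, labels, t))

# Hypothetical distance data with transfer-ability labels.
dis = [0.2, 0.4, 0.9, 1.0, 1.3]
lab = [2, 2, 2, 4, 4]
t = best_split(dis, lab)
print(t, gini_of_split(dis, lab, t))  # 0.95 separates the labels perfectly
```

In this toy data the midpoint 0.95 happens to split the labels cleanly (Gini 0), whereas with the paper's Table 3 data the same threshold yields 0.32; the selection rule is the same in both cases.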
In Fig. 3, the leaf nodes represent the transfer ability; x_1 denotes the first feature attribute (Euclidean distance) and x_2 the second (factorial cost). Take the prediction of the data point (0.4, 6) as an example. First, at the root node, 0.4 is less than 0.95, so we descend to the left subtree. At the internal node, 6 is less than 10.5, so the transfer ability of this data point is predicted to be 2. To further explain the prediction of individual transfer ability, a two-task benchmark problem comprising the Griewank and Rastrigin problems is selected. Both problems are multimodal and nonseparable; their definitions are given below.
(1) Rastrigin: f_1(x) = Σ_{i=1}^{D} (x_i² − 10 cos(2π x_i) + 10).  (2) Griewank: f_2(x) = 1 + Σ_{i=1}^{D} x_i²/4000 − Π_{i=1}^{D} cos(x_i/√i). Figure 4 shows the decision tree model constructed by the proposed EMT-ADT algorithm at generation g on this two-task benchmark problem (Eqs. (9), (10)). At this point there are six transfer-ability values (2, 3, 4, 5, 6, 7). Take the prediction of the data point (2.794, 18694) as an example. At the root node, 2.794 is less than 2.96472, so we descend to the left subtree. Likewise, 2.794 is less than 2.95378, so we continue down the left subtree of that node. Since 2.794 is greater than 1.08146, we descend to the right subtree of that node; similarly, 2.794 is greater than 2.75214, so we continue down the right subtree. Finally, 18694 is less than 18869.9, so the predicted transfer ability of this data point is 4.
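The two benchmark functions named above can be written down directly from their textbook definitions (Eqs. (9) and (10) themselves are not reproduced in this excerpt):

```python
import math

# Textbook forms of the two multimodal, nonseparable benchmark functions.
def rastrigin(x):
    # f(x) = sum(x_i^2 - 10*cos(2*pi*x_i) + 10); global optimum f(0) = 0
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)

def griewank(x):
    # f(x) = 1 + sum(x_i^2)/4000 - prod(cos(x_i/sqrt(i))); global optimum f(0) = 0
    s = sum(xi * xi for xi in x) / 4000.0
    p = math.prod(math.cos(xi / math.sqrt(i + 1)) for i, xi in enumerate(x))
    return 1.0 + s - p

print(rastrigin([0.0, 0.0]), griewank([0.0, 0.0]))  # 0.0 0.0
```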

Fig. 3 Decision tree model construction
The transfer ability of an individual can be restated as follows: if an auxiliary parent participates in the generation of offspring m times, and n of those offspring are superior to their corresponding parents, then the transfer ability of the auxiliary parent is n. In Fig. 4, take the leaf node marked with a red circle as an example: an individual that reaches this leaf through decision tree classification has a transfer ability of 7.

Search strategy
Many classical optimization algorithms, such as GA (genetic algorithm), DE (differential evolution), PSO (particle swarm optimization), TLBO (teaching-learning-based optimization) and BSO (brain storm optimization), can be used as the search engine in the MFO paradigm [22-25]. Different algorithms have different search performance, and well-designed search strategies can clearly improve search efficiency. The success-history based adaptive differential evolution algorithm (SHADE) proposed by Tanabe et al. [26] has proved to be an effective optimizer. In SHADE, a historical memory stores the control parameters that performed well during evolution; new control parameters are generated by sampling this memory, which may further improve the performance of the algorithm. Since SHADE outperformed many state-of-the-art DE algorithms on the CEC2013 and CEC2005 benchmark sets, this paper selects SHADE as the search engine. Its current-to-pbest/1 mutation operator is defined as [26]

v_i = x_i + F · (x_pbest − x_i) + F · (x_{r1} − x_{r2}),    (11)

where x_i is the ith individual of the current population, x_pbest is randomly selected from the top p% individuals of the current population, x_{r1} is randomly selected from the current population, and x_{r2} is randomly selected from the union of the current population and the archive. The details of SHADE can be found in [26].
To improve knowledge transfer between different tasks on MFO problems, the original mutation operator of SHADE is modified as

v_i = x_i + F · (x'_pbest − x_i) + F · (x'_{r1} − x'_{r2}),    (12)

where, for a multitask problem with two tasks, P is the subpopulation corresponding to the target task and P' is the subpopulation corresponding to the auxiliary task. TP is the historical transferred population, which provides the transferred individuals for the target task. x_i is the ith individual of the population P, x'_pbest is randomly selected from the top p% individuals of the population P', x'_{r1} and x'_{r2} are randomly selected from TP, and F is the scale factor. After the mutation operator, the same crossover operator as in SHADE is used to generate the final offspring.
In the modified SHADE mutation operator, x'_pbest provides a promising direction for the population, which promotes convergence. The randomly constructed difference vector (x'_{r1} − x'_{r2}) not only enhances population diversity but also promotes positive knowledge transfer, because the transferred individuals with the highest transfer ability are selected.
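The modified mutation described above can be sketched as follows; the population containers, their ordering, and the parameter values are assumptions of this sketch, not the paper's implementation:

```python
import random

# Sketch of the modified DE/current-to-pbest/1 mutation: the pbest guide
# comes from the auxiliary subpopulation P' and the difference vector is
# built from the historical transferred population TP.
def transfer_mutation(x_i, aux_pop, tp, F=0.5, p=0.1):
    # x'_pbest: random pick from the top p% of the auxiliary subpopulation
    # (aux_pop is assumed to be sorted best-first by fitness)
    k = max(1, int(len(aux_pop) * p))
    x_pbest = random.choice(aux_pop[:k])
    # difference vector from two distinct members of TP
    x_r1, x_r2 = random.sample(tp, 2)
    return [xi + F * (pb - xi) + F * (r1 - r2)
            for xi, pb, r1, r2 in zip(x_i, x_pbest, x_r1, x_r2)]

random.seed(0)
# Stand-in data: 3-D vectors; the "fitness" used for sorting is the
# sphere function (lower is better), purely for illustration.
aux = sorted(([random.uniform(-1, 1) for _ in range(3)] for _ in range(10)),
             key=lambda v: sum(c * c for c in v))
tp = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
v = transfer_mutation([0.2, -0.3, 0.5], aux, tp)
print(v)  # a 3-dimensional mutant vector
```

The subsequent crossover and selection steps follow standard SHADE and are omitted here.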

The proposed EMT-ADT
The pseudocode of the EMT-ADT algorithm is shown in Algorithm 3. P' is the subpopulation corresponding to the auxiliary task. The archive A stores individuals involved in knowledge transfer, and x_best is the best individual of P'. NP_i is the population size of the ith task. FES and MAXFES denote the current and maximal numbers of function evaluations, respectively.
The main steps of EMT-ADT are as follows. First, for each task T_i, offspring are generated by Eq. (11) or (12) according to the random mating probability rmp. Meanwhile, the associated subset Ω_i, the transfer amount λ_{i,j} and the transfer ability ω(t_i) are calculated according to Eqs. (1)-(3). When the success rate sr_i (the rate at which offspring are better than their parents) is greater than the given threshold, the random mating probability rmp_i for each task is updated following [10], where tsr_i denotes the rate at which offspring generated with knowledge transfer are better than their parents. If all offspring are generated without knowledge transfer, TNP is set to 0; otherwise, TNP is the number of offspring generated with knowledge transfer.

Complexity analysis
The computational cost of EMT-ADT mainly comes from assortative mating, the adaptive knowledge transfer based on the decision tree, and the update of the historical transferred population. In EMT-ADT, a loop over NP (the population size) is conducted, containing a loop over D (the dimension) and over the m optimization tasks. Assortative mating is performed according to the random mating probability (rmp). Then, the runtime complexity

Comparative studies of experiments
To evaluate the competitiveness of the proposed EMT-ADT algorithm, 9 benchmark test problems from the CEC2017 Evolutionary Multi-Task Optimization Competition [27] are employed. Each test problem is a two-task problem. According to the similarity between the landscapes of the two tasks, the benchmark problems can be categorized into three groups: high similarity (HS), medium similarity (MS) and low similarity (LS). They can also be divided into three groups according to the intersection degree of the global optima: complete intersection (CI), partial intersection (PI) and no intersection (NI). Details of these benchmark problems can be found in [27]. In addition, two complex single-objective MFO test suites, WCCI20-MTSO and WCCI20-MaTSO, are selected to further verify the competitiveness of the proposed EMT-ADT. Each suite comprises 10 test problems, put forward for the WCCI 2020 Competition on Evolutionary Multitasking Optimization [28]. Each test problem in WCCI20-MTSO involves 2 tasks, while each test problem in WCCI20-MaTSO involves 10 tasks.

Parameter settings
The aforementioned CEC2017 multitask problems are given in Table 6.

Experiments on CEC2017 multitask problems
In this section, the proposed EMT-ADT algorithm is compared with the above-listed algorithms to verify its performance. Tables 8, 9 and 10 summarize the mean fitness (Mean) and standard deviation (Std) achieved by MFEA, MFEARR, MFDE, AT-MFEA, SREMTO, MFMP, TLTLA, MTEA-AD and EMT-ADT over 30 runs. All runs use MAXFES as the termination criterion, set to 200,000. Three statistical measures are adopted: the Wilcoxon signed-rank test [32], the multiple-problem Wilcoxon test [33] and the Friedman test [33]. Figure 5 shows the convergence curves of the mean fitness values obtained by each algorithm on the complete intersection problems. As can be seen in Fig. 5, the EMT-ADT algorithm converges faster than the competing algorithms.

Comparisons on partial intersection problems
As can be observed in Table 9, the proposed EMT-ADT algorithm outperforms MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO and MFMP on 4 out of 6 tasks. TLTLA performs best on task T_1 of the PI + HS problem. On task T_2 of the PI + LS problem, the performance of EMT-ADT is similar to that of MFMP. Overall, EMT-ADT outperforms MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO and MFMP on 6, 6, 6, 6, 5, 5, 6 and 5 optimization tasks, respectively. Figure 6 shows the convergence curves of the mean fitness values obtained by each algorithm on the partial intersection problems. As can be seen in Fig. 6, EMT-ADT converges faster than the eight competing algorithms on all tasks except task T_1 of the PI + HS problem.

Adaptive knowledge transfer analysis
To evaluate the effectiveness of the proposed adaptive knowledge transfer, EMT-ADT is compared with a variant without adaptive knowledge transfer (EMT) on 3 representative CEC2017 multitask problems: CI + MS, PI + MS and NI + MS. As explained before, if an offspring generated with adaptive knowledge transfer performs better than its parent, the transfer is referred to as positive.
Figure 8 shows the convergence curves of the mean fitness values obtained by EMT-ADT and EMT on the test problems CI + MS, PI + MS and NI + MS. Figure 9 shows the mean number of transferred individuals during evolution in EMT-ADT on the same problems. In Fig. 9, red circles and blue stars denote the number of transferred individuals during evolution on task T_1 and task T_2, respectively. As seen in Figs. 8 and 9, EMT-ADT outperforms EMT in terms of solution quality, which demonstrates that the adaptive knowledge transfer strategy improves the performance of the algorithm.

Comparisons on WCCI20-MSTO
To investigate performance on more complex multitasking problems, the proposed EMT-ADT is further compared with eight state-of-the-art multitasking optimization algorithms on the WCCI20-MTSO benchmark suite. Tables 11 and 12 show the comparative results of the nine algorithms in terms of mean fitness.
As can be seen from Tables 11 and 12, EMT-ADT obtains the best values on 17 out of 20 WCCI20-MTSO test instances. MFEA, MFEARR and MTEA-AD perform well on problem P_8, and SREMTO achieves the best value on task T_1 of problem P_9. The last rows of Tables 11 and 12 imply that EMT-ADT performs significantly better than the eight competitors on almost all test instances.
Figure 10 plots the trajectory of the mean fitness versus the number of function evaluations for each algorithm on the WCCI20-MTSO benchmark suite. As seen in Fig. 10, EMT-ADT demonstrates a clear advantage over the eight competitors.

Comparisons on WCCI20-MaTSO
The mean fitness values obtained by nine algorithms on WCCI20-MaTSO are shown in Table 13

Comparisons with single-task algorithms
Multitasking optimization algorithms make full use of positive knowledge by sharing solutions across tasks, which can improve convergence and solution quality. To verify the competitiveness of the proposed algorithm, a series of experiments are conducted against two state-of-the-art single-task methods, i.e., SHADE and LSHADE. SHADE is an adaptive DE characterized by success-history based parameter adaptation, while LSHADE is an improvement of SHADE. The performance of EMT-ADT, SHADE and LSHADE on the CEC2017 benchmark problems and the WCCI20-MTSO benchmark problems is summarized in Tables 14 and 15.
As can be observed in Table 14, EMT-ADT achieves superior solution quality on almost all the CEC2017 benchmarks. More specifically, the proposed EMT-ADT wins 17 out of the 18 competitions, which demonstrates that the implicit knowledge transfer across tasks in multitasking is beneficial for improving the performance of EMT-ADT. From Table 15, it can be seen that EMT-ADT also works well on the more complex multitasking benchmark suites.

Statistical analysis
In this section, the multiple-problem Wilcoxon's test is used to measure the significance of the difference (p value) between each competitor algorithm and EMT-ADT, while the Friedman test is used to rank the compared algorithms statistically. Table 16 shows that EMT-ADT provides higher R+ values than MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO and MFMP. The p values of MFEA, MFEARR, AT-MFEA, MTEA-AD, SREMTO and MFMP are less than 0.05, which indicates that the proposed EMT-ADT considerably outperforms these competitors. For the PI + HS, PI + MS and PI + LS problems, the p values of MFDE and TLTLA are greater than 0.05, which indicates that the performance of MFDE and TLTLA is similar to that of EMT-ADT on these problems. For the NI + HS, NI + MS and NI + LS problems, the p value of TLTLA is greater than 0.05, which indicates that the performance of TLTLA is similar to that of EMT-ADT on these problems. Furthermore, Table 17 shows the average rank and the overall rank of the compared algorithms on the CEC2017 multitask problems. In sum, EMT-ADT ranks first on average over all test problems.
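As an illustration of the multiple-problem Wilcoxon's test, the R+ and R− rank sums can be computed as sketched below. The fitness values in this sketch are hypothetical and are not the values behind Table 16.

```python
# Sketch of the R+ / R- rank sums of the multiple-problem Wilcoxon
# signed-rank test; the fitness values below are hypothetical.

def wilcoxon_rank_sums(a, b):
    """Return (R+, R-) for paired samples a, b (lower fitness is better).

    R+ sums the ranks of problems where a beats b (a - b < 0),
    R- sums the ranks where b beats a; ties on |d| share average ranks.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    # Rank the absolute differences in ascending order (1-based).
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average rank for tied |d|
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    r_plus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    r_minus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    return r_plus, r_minus

# Hypothetical mean fitness of EMT-ADT and one competitor on 6 problems.
emt_adt    = [0.12, 0.05, 0.33, 0.08, 0.21, 0.15]
competitor = [0.18, 0.09, 0.35, 0.20, 0.25, 0.22]
print(wilcoxon_rank_sums(emt_adt, competitor))  # EMT-ADT wins all six: R+ = 21
```

A p value below 0.05 for the resulting statistic is what the tables above report as a significant win.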
Figure 11 visually shows the rankings of EMT-ADT and the competitor algorithms on the CEC2017 multitask problems. The left side demonstrates the ranking of the nine algorithms on the CEC2017 multitask problems, while the right side illustrates

Population size sensitivity analysis
Population size has a considerable bearing on the rate of convergence. Generally, a large population size may slow the convergence rate, while a small population may promote faster convergence [34]. In the proposed algorithm, the parameter γ represents the dispersion degree of population convergence, which is used to control the population size. To analyze the effects of the parameter γ on the performance of EMT-ADT, the degree of dispersion (DOD) of the population is first defined as follows.
where x_i is the ith individual in the population P, and N is the population size.
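The exact DOD formula is not reproduced in this excerpt; a common choice, sketched below as an assumption, is the mean Euclidean distance of the individuals to the population centroid.

```python
# Sketch of a degree-of-dispersion (DOD) measure. The concrete form
# (mean distance to the population centroid) is an assumption, since
# the paper's exact formula is elided in this excerpt.
import math

def degree_of_dispersion(population):
    """Mean Euclidean distance of each individual to the centroid."""
    n = len(population)
    dim = len(population[0])
    centroid = [sum(ind[d] for ind in population) / n for d in range(dim)]
    return sum(
        math.sqrt(sum((ind[d] - centroid[d]) ** 2 for d in range(dim)))
        for ind in population
    ) / n

# A tightly clustered population has a small DOD; the algorithm
# compares such a value against gamma to decide when to adjust
# the population size.
P = [[0.50, 0.50], [0.51, 0.49], [0.49, 0.51]]
print(degree_of_dispersion(P))
```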
Figure 12 shows the degree of dispersion obtained by EMT-ADT on the CEC2017 multitask problems. As seen from Fig. 12, the dispersion of the population during the evolution process differs across problems and tasks. To determine the parameter γ for better performance of EMT-ADT, the CI + MS, PI + MS and NI + MS problems are selected to test the performance of EMT-ADT with different values of the parameter γ, i.e. γ = 0.0005, 0.001, 0.005, 0.01 and 0.015, on all tasks.
Figure 13 and Table 18 respectively show the convergence curves and the ranking results on the CI + MS, PI + MS and NI + MS problems with different values of the parameter γ. As can be seen in Fig. 13 and Table 18, the algorithm performs better with γ = 0.001. Therefore, γ is set to 0.001 in the proposed algorithm.
As mentioned before, the algorithm performs best when the population size is adjusted with γ = 0.001. Since the dispersion of the population during the evolution process differs across problems and tasks, the mean dispersion degree of the population over the nine CEC2017 multitask problems is calculated, as seen in Fig. 14. The location of the red circle is recorded as (FES, γ), which represents the time when the population is adjusted. Therefore, when FES is 50,000, that is, FES/MAXFES = 1/4, the population size should be adjusted to achieve better performance.

Results on real-world problems
In this section, two real-world problems, the traveling salesman problem (TSP) [35] and the traveling repairman problem (TRP) [36], are used to verify the practicability of the proposed EMT-ADT algorithm. More specifically, TSP and TRP are combined into a multitask optimization problem, each seeking a tour with minimum cost. Given a list of n cities and a list of maintenance times for the n cities, TSP aims to minimize the total time to visit these cities, while TRP aims to minimize the sum of the elapsed times of all customers who have to wait before being served [37]. Each city may be visited only once, and the tour finally returns to the origin city. Given a distance matrix C = {c(x_i, x_j) | i, j = 1, 2, ..., n}, where c(x_i, x_j) is the distance between cities x_i and x_j, let x = (x_1, x_2, ..., x_n) be a tour, i.e. a solution of the objective function, and let p(x_1, x_n) denote the path from x_1 to x_n. For the TSP problem, the total time ls(p(x_1, x_n)) from starting city x_1 to x_n is computed as

ls(p(x_1, x_n)) = Σ_{i=1}^{n−1} c(x_i, x_{i+1}) + c(x_1, x_n) + Σ_{i=1}^{n} r_i   (16)

For the TRP problem, the total time lr(p(x_1, x_n)) from starting city x_1 to x_n is computed as

lr(p(x_1, x_n)) = Σ_{i=1}^{n−1} [c(x_i, x_{i+1}) + Σ_{j=1}^{i} r_j] + c(x_1, x_n) + r_n   (17)

where r_j is the maintenance time of the jth visited city. Generally, a unified representation scheme is used in multitask optimization algorithms, in which every solution is encoded by random keys between 0 and 1. Since the paths in the TSP and TRP problems should be a set of integers without duplication, the real-number encoding is converted to a permutation encoding. Specifically, all dimension values of the solution x are sorted in ascending order; the permutation solution x' is then obtained according to the ranking of the dimension values. Figure 15 shows a simple example. For a solution x = (0.21, 0.34, 0.84, 0.67, 0.11, 0.08, 0.55), its permutation solution x' = (3, 4, 7, 6, 2, 1, 5) denotes that the traveler starts from the third city, visits the fourth, seventh, sixth, second, first and fifth cities, and then returns to the third city.
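The random-key decoding described above, together with the two tour-time objectives, can be sketched as follows. The distance matrix and maintenance times below are hypothetical, and the TSP formula (travel time plus all maintenance times) is a reconstruction, so treat its exact form as an assumption.

```python
# Sketch of random-key decoding and the TSP/TRP tour times described
# above; the distance matrix c and maintenance times r are hypothetical.

def decode(random_key):
    """Rank the dimension values in ascending order to obtain a
    1-based city permutation (as in the Fig. 15 example)."""
    order = sorted(range(len(random_key)), key=lambda i: random_key[i])
    perm = [0] * len(random_key)
    for rank, i in enumerate(order, start=1):
        perm[i] = rank
    return perm

def tsp_time(tour, c, r):
    """Total travel time along the tour plus the return edge plus all
    maintenance times (assumed form of the TSP objective); the tour is
    0-based here for direct matrix indexing."""
    n = len(tour)
    travel = sum(c[tour[i]][tour[i + 1]] for i in range(n - 1))
    return travel + c[tour[0]][tour[-1]] + sum(r)

def trp_time(tour, c, r):
    """Cumulative waiting-time objective: each leg's travel cost plus
    the maintenance times of the cities served so far, plus the return
    edge and the last city's maintenance time."""
    n = len(tour)
    total = 0.0
    for i in range(n - 1):
        total += c[tour[i]][tour[i + 1]] + sum(r[tour[j]] for j in range(i + 1))
    return total + c[tour[0]][tour[-1]] + r[tour[-1]]

x = (0.21, 0.34, 0.84, 0.67, 0.11, 0.08, 0.55)
print(decode(x))  # [3, 4, 7, 6, 2, 1, 5], matching the Fig. 15 example
```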
In this experiment, EMT-ADT is compared with MFDE, MFPSO and MFMP. Ten groups of problems are randomly selected from TSPLIB [38]. Suppose travelers move between cities at a speed of 60 km/h. Each group is composed of a TSP test case and a TRP test case. For each test case, the average time spent by the traveler, obtained by each algorithm over 30 independent runs, is presented in Table 19.
Fig. 15 An example of mapping a solution to a permutation of city numbers
The number following the instance name indicates the number of city nodes. Table 19 shows that EMT-ADT achieves a tour with minimum cost against all peer competitors on almost all the instances. EMT-ADT wins 18 out of the 20 competitions, which demonstrates its superior performance. Figures 16 and 17 show the optimal TSP tour and the optimal TRP tour obtained by the different algorithms on the eil51 instance, respectively. Experimental results show that EMT-ADT obtains better solutions than the three state-of-the-art multitasking optimization algorithms.

Conclusions
To enhance positive knowledge transfer between tasks, this paper proposed an evolutionary multitasking optimization algorithm with an adaptive transfer strategy based on the decision tree (EMT-ADT). A method of quantifying the transfer ability of individuals is proposed to select individuals with high transfer ability. A decision tree is constructed to predict the transfer ability of the individuals in the archive. Then, individuals with high transfer ability are selected to conduct knowledge transfer, which improves the performance of the algorithm. The effectiveness of EMT-ADT is verified through a comparison with eight state-of-the-art algorithms, i.e. MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO and MFMP. As mentioned in [39], several specific actions are advocated to advance the field of evolutionary multitask optimization. For example, the computational complexity of evolutionary multitasking methods should be regarded as a metric for the evaluation of algorithms. Future work will focus on improving the effectiveness of knowledge transfer between tasks with low correlation, and on reducing the computational complexity for multifactorial problems with more than two tasks. Moreover, EMT-ADT will be applied to multi-objective multitask optimization problems.

Fig. 4
Fig. 4 Decision tree model construction for a two-task benchmark problem. When knowledge transfer is required during the generation of offspring, the training data D is first constructed from the historical transferred populations TP of LP consecutive generations. Next, the decision tree model DT is built from the training data, and DT is used to predict the transfer ability of all individuals in P. All individuals are sorted by transfer ability, and the top n−1 individuals together with the historical best individual in P are stored in the archive A. Then, DT is used to predict the transfer ability of all individuals in the archive A; after ranking these individuals according to their transfer ability, the top n−1 individuals and the historical best individual are stored in the historical transferred population TP_{g+1} at generation g+1. When there is no knowledge transfer during the generation of offspring, the top n−1 individuals with better factorial cost and the historical best individual in P are stored in the archive A; the individuals in A are sorted by transfer ability, and the top n−1 individuals together with the historical best individual in A are selected as TP_{g+1}.
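The select-by-predicted-transfer-ability step can be sketched as below. For brevity the decision tree here is a depth-1 regression tree built from scratch, and all training data, scores and population values are synthetic stand-ins for the paper's TP, transfer-ability indicator and P.

```python
# Sketch of adaptive transfer selection: train a (depth-1, for brevity)
# regression tree on synthetic transfer-ability data, rank candidates
# by predicted ability, and keep the top n-1 plus the historical best.
import random

def fit_stump(X, y):
    """Depth-1 regression tree: choose the (feature, threshold) split
    minimizing squared error; each leaf predicts the mean score."""
    best = None
    for f in range(len(X[0])):
        for t in sorted(x[f] for x in X):
            left = [yi for x, yi in zip(X, y) if x[f] <= t]
            right = [yi for x, yi in zip(X, y) if x[f] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - ml) ** 2 for v in left)
                   + sum((v - mr) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, f, t, ml, mr)
    _, f, t, ml, mr = best
    return lambda x: ml if x[f] <= t else mr

random.seed(1)
D, n = 5, 4  # problem dimensionality; number of individuals to keep

# Synthetic training set: transfer ability here favors small gene values.
X_train = [[random.random() for _ in range(D)] for _ in range(30)]
y_train = [1.0 - sum(x) / D for x in X_train]
predict = fit_stump(X_train, y_train)

P = [[random.random() for _ in range(D)] for _ in range(20)]
best_ind = min(P, key=sum)  # stand-in for the historical best individual
ranked = sorted(P, key=predict, reverse=True)
archive = ranked[: n - 1] + [best_ind]
print(len(archive))  # the n individuals entering the archive
```

In the actual algorithm the tree is grown to full depth on the quantified transfer-ability indicator; the ranking-and-archive logic is the part this sketch illustrates.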
is O(m · NP · D) at each iteration. For the adaptive knowledge transfer based on the decision tree, the Euclidean distance between the transferred individual r_i and the optimal transferred individual r_best is calculated, which may increase the time complexity of the algorithm. Over five consecutive iterations, the runtime complexity is O(5 · n · D), where n is the number of transferred individuals. Similarly, the runtime complexity of the historical transferred population update is O(n · D). Therefore, the total time complexity of EMT-ADT is O(m · NP · D).

Fig. 8
Fig. 8 Convergence curves of the mean fitness values obtained by EMT-ADT and EMT on CI + MS, PI + MS and NI + MS

Fig. 9
Fig. 9 Curves of the mean number of transferred individuals during evolution in EMT-ADT on three test problems. a CI + MS, b PI + MS, c NI + MS

Fig. 16
Fig. 16 The optimal TSP tour on the eil51 instance obtained by different algorithms.a EMT-ADT, b MFDE, c MFPSO, d MFMP

Fig. 17
Fig. 17 The optimal TRP tour on the eil51 instance obtained by different algorithms.a EMT-ADT, b MFDE, c MFPSO, d MFMP

Table 1
An instance of the playing training dataset
Fig. 1 Decision tree trained by Table 1

Table 2
Transfer ability of the population TP

Table 3
Dataset D

Table 7
Default parameters settings of compared algorithms

Table 8
Mean fitness values and Std obtained by MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO, MFMP and EMT-ADT on complete intersection problems. With regard to the multiple-problem Wilcoxon's test, R+ and R− are used to indicate that EMT-ADT is significantly better or worse than the compared algorithm, respectively. The best solution is highlighted in bold.

Table 8
shows that the proposed EMT-ADT algorithm outperforms MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA and SREMTO on the complete intersection problems, which indicates that the decision tree prediction strategy employed in the proposed algorithm works effectively and efficiently. More specifically, EMT-ADT and MFMP have similar performance on the CI + HS and CI + MS problems, while EMT-ADT performs better than MFMP on the CI + LS problems. Furthermore, the experimental results show that the proposed EMT-ADT algorithm outperforms MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO and MFMP on 6, 6, 6, 6, 6, 6, 6 and 3 optimization tasks, respectively. The convergence curves of the mean fitness values obtained by each algorithm on the complete intersection problems are plotted in Fig.

Table 10
It is clear that EMT-ADT shows an obvious advantage over the competitors. More specifically, EMT-ADT significantly outperforms the eight competitors on 94 out of 100 instances. The

Table 16
Wilcoxon results for EMT-ADT on CEC2017 multitask problems