Introduction

In recent years, researchers have shown great interest in a new category of optimization problems called multifactorial optimization (MFO). Unlike well-known optimization problems, such as single-objective and multiobjective optimization, MFO aims to handle multiple distinct optimization tasks in a single run. Such problems exist widely in science, engineering and technology. For example, a complex supply chain problem [1] involves two optimization problems simultaneously: shop scheduling (production optimization) and vehicle routing (logistics optimization). Gupta et al. [1] were the first to use an evolutionary algorithm to deal with single-objective MFO problems, proposing the multifactorial evolutionary algorithm (MFEA). Subsequently, a series of multifactorial evolutionary algorithms have been proposed and applied to different optimization problems, such as job shop scheduling [2], shortest-path trees [3], ensemble classification [4] and multiobjective optimization [5].

MFEA is characterized by integrating cultural effects through assortative mating and vertical cultural transmission. In MFEA, a prescribed parameter called the random mating probability (rmp) controls knowledge transfer during the optimization process. Because prior knowledge about intertask similarity is usually lacking, transfer governed by rmp can enhance optimization performance for related tasks (positive transfer), but may deteriorate performance for unrelated tasks (negative transfer). Since knowledge transfer is central to multifactorial evolutionary algorithms, researchers have developed many transfer strategies to alleviate negative knowledge transfer between unrelated tasks. These strategies can be divided into four categories:

  1.

    Domain adaptation techniques: Bali et al. [6] proposed a linearized domain adaptation (LDA) strategy, which transforms the search space to improve the correlation between representative space and its constitutive task. Accordingly, an effective platform is provided for knowledge transfer. Wang et al. [7] developed an explicit autoencoding strategy to incorporate multiple search mechanisms with different biases in the evolutionary multitasking paradigm. The proposed autoencoder aims to learn a mapping between problem domains. Xue et al. [8] proposed an affine transformation-enhanced MFO algorithm (AT-MFEA) to enhance the transferability between distinct tasks. In AT-MFEA, a superior intertask mapping is obtained by the rank loss function. Moreover, the evolutionary-path-based representation model is established to bridge the gap between two distinct problems from different domains.

  2.

    Adaptive strategies for the parameter rmp: Ong et al. [9] proposed an online transfer parameter estimation strategy (MFEA-II) to minimize the damage caused by negative transfer between tasks. In MFEA-II, the parameter rmp takes the form of a symmetric matrix instead of a scalar value. The constructed matrix can capture non-uniform inter-task synergies even when the complementarity between tasks is not uniform across task pairs. Moreover, the RMP matrix is continuously learned and adapted during the search, which helps locate the global optima of the different tasks. Xu et al. [5] proposed a cultural transmission based multi-objective evolution strategy (CT-EMT-MOES), in which an adaptive information transfer strategy adjusts the parameter rmp. Specifically, the strategy uses the mutation success rate of the target task itself and the success rate of information transfer to reasonably allocate evolution resources between tasks. Li et al. [10] proposed an explicit multipopulation evolutionary framework (MPEF) to improve the effect of information transfer. In MPEF, the parameter rmp is adjusted according to the evolution status of the population: if the proportion of offspring superior to their parents exceeds a given threshold, rmp is left unchanged because the population evolves well under the current value; otherwise, rmp is updated because the current value causes negative transfer.

  3.

    Intertask learning strategy: Da et al. [11] proposed a transfer evolutionary computation paradigm (AMTEA), which can reduce the risk of negative transfer via online source–target similarity learning. In AMTEA, a probabilistic model is constructed from the distribution of elite solutions of a source optimization task; the model is then used to provide a promising search direction on a related target task. Gao et al. [12] utilized a semi-supervised learning strategy to enhance the effectiveness of knowledge transfer (EMT-SSC). In EMT-SSC, promising individuals are identified by semi-supervised learning, and these individuals transfer valuable knowledge between tasks. Zheng et al. [13] developed a self-regulated evolutionary multitask optimization (SREMTO) algorithm to dynamically adjust the intensity of knowledge transfer between tasks. In SREMTO, task groups are created based on the ability vectors of individuals; the degree of overlap between task groups depends on the degree of task relatedness, and cross-task knowledge transfer is conducted through the overlapping parts between task groups.

  4.

    Multi-knowledge transfer mechanism: Cai et al. [14] proposed a hybrid knowledge transfer strategy for information transfer between tasks (EMTO-HKT). In EMTO-HKT, a multi-knowledge transfer mechanism comprising an individual-level learning strategy and a population-level learning strategy is used to transfer knowledge according to the degree of task relatedness. Liang et al. [15] proposed a two-stage adaptive knowledge transfer mechanism: in the first stage, the search step of each individual is adjusted to alleviate negative transfer, while in the second stage, the search range of each individual is adjusted to improve the exploration ability of the population. Ding et al. [16] proposed a generalized multitasking evolutionary optimization for expensive problems (G-MFEA), in which two strategies conduct knowledge transfer between optimization problems whose optima differ in location and whose numbers of decision variables differ.

Although existing MFEAs endeavor to alleviate negative transfer during optimization, the solution precision they obtain is not satisfactory, especially on multitasking problems with low relatedness. Furthermore, owing to the lack of prior information about inter-task relatedness, individuals whose knowledge is useless for other tasks are often transferred, which wastes computational resources. To address these problems, this paper presents an evolutionary multitasking optimization algorithm with an adaptive transfer strategy based on the decision tree (EMT-ADT). In EMT-ADT, a decision tree based on the Gini index is constructed to predict the transfer ability of individuals. To the best of our knowledge, this is the first attempt in the literature to use a decision tree to enhance positive knowledge transfer in the MFO paradigm. The primary contributions of this paper can be summarized as follows:

  1.

    The transfer ability of individuals is defined to quantify the amount of useful knowledge contained in the transferred individuals. Individuals with high transfer ability are used to construct a decision tree.

  2.

    Drawing on supervised machine learning, the proposed algorithm uses the decision tree to predict positive-transferred individuals. By selecting promising positive-transferred individuals, the proposed algorithm improves the probability of positive transfer.

  3.

    The success-history based adaptive differential evolution algorithm (SHADE) is used as the search engine, demonstrating the generality of the MFO paradigm. Three multifactorial optimization benchmark sets are used to verify the competitiveness of the proposed method.

The rest of this paper is organized as follows. The next section introduces the details of MFO and the decision tree model. The subsequent section describes the proposed EMT-ADT algorithm. The following section presents experimental results on three multifactorial optimization (MFO) benchmark sets and two combinatorial optimization problems (TSP and TRP). The last section summarizes conclusions and future work.

MFO and decision tree

Multifactorial optimization

As mentioned before, multifactorial optimization (MFO) is an evolutionary multitasking paradigm that aims to find a group of optimal solutions simultaneously, each of which corresponds to an optimization problem. To compare individuals in a multitasking environment conveniently, it is necessary to encode and decode different individuals in a unified search space. For an unconstrained multitasking problem with n tasks, Tj denotes the jth task with a search space Xj and an objective function fj. For the ith individual pi, its properties are defined as follows [1]:

Definition 1

The factorial cost \({\Psi }_{j}^{i}\) of individual pi on task Tj is the objective value \({f}_{j}^{i}\) of individual pi.

Definition 2

The factorial rank \({r}_{j}^{i}\) of individual pi on task Tj is the index of pi, provided that the population is sorted in ascending order according to \({\Psi }_{j}\).

Definition 3

The scalar fitness of individual pi is defined as \({\varphi }_{i}=1/\underset{j\in \{1,\dots ,n\}}{\mathrm{min}}\{{r}_{j}^{i}\}\).

Definition 4

The skill factor \({\tau }_{i}\) of individual pi is defined as \({\tau }_{i}={\mathrm{argmin}}_{j\in \{1,\dots ,n\}}\{{r}_{j}^{i}\}\). In other words, \({\tau }_{i}\) is the index of the task on which individual pi performs best.
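As a concrete illustration, Definitions 1–4 can be computed for a small population; a minimal sketch in Python, where the factorial cost values are hypothetical:

```python
import numpy as np

def mfo_properties(costs):
    """Compute factorial ranks, scalar fitness and skill factors.

    costs: (N, n) matrix, costs[i][j] = factorial cost of individual i on task j.
    Returns (ranks, fitness, skill), with 1-based factorial ranks.
    """
    costs = np.asarray(costs, dtype=float)
    # Definition 2: rank of each individual on each task (1 = best; ties broken arbitrarily).
    ranks = np.argsort(np.argsort(costs, axis=0), axis=0) + 1
    # Definition 3: scalar fitness = 1 / smallest rank over all tasks.
    fitness = 1.0 / ranks.min(axis=1)
    # Definition 4: skill factor = index of the task on which the individual ranks best.
    skill = ranks.argmin(axis=1)
    return ranks, fitness, skill

# Three individuals, two tasks (hypothetical factorial costs).
ranks, fitness, skill = mfo_properties([[1.0, 9.0],
                                        [5.0, 2.0],
                                        [8.0, 4.0]])
```

Here the third individual ranks third on the first task and second on the second task, so its scalar fitness is 1/2 and its skill factor is the second task.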

MFEA is a pioneering evolutionary algorithm realizing the MFO paradigm; it transfers genes and memes through assortative mating and vertical cultural transmission. Algorithm 1 presents the basic framework of MFEA.


Decision tree

The decision tree is a supervised machine learning method for multistage decision making [17]. A tree structure presents the decision rules summarized from data with features and labels. Generally, a decision tree consists of a root node, internal nodes and leaf nodes. Leaf nodes have no descendants, while the remaining non-root nodes are called internal nodes. Each internal node is associated with a test attribute, each branch represents an outcome of the test, and each leaf node assigns one or more class labels [18, 19]. A path from the root node to a leaf node corresponds to a decision test sequence.

The design of a decision tree mainly includes three tasks [17].

Task 1: Select a node splitting rule.

Task 2: Decide which nodes are terminal.

Task 3: Assign each terminal node to a class label.

Taking the playing dataset as an example, Table 1 shows an instance of the playing training dataset, which has fourteen samples. Each sample consists of an ID, four attributes and one category label. The attributes are weather (a1), temperature (a2), humidity (a3) and wind (a4). The category label (L) indicates whether to play today.

Table 1 An instance of playing training dataset

Figure 1 shows the decision tree trained on Table 1, where ni denotes the ith node.

Fig. 1
figure 1

Decision tree trained by Table 1

Proposed method

Motivations

Positive knowledge transfer has a significant effect on the performance of multifactorial evolutionary algorithms. In MFEA, knowledge transfer among tasks is controlled by the parameter rmp. Because solutions are randomly selected to exchange information with the same probability in each task, the transfer may turn out to be negative, leading to deterioration of algorithm performance [20, 21]. Therefore, when the correlation between tasks is uncertain, accurately selecting valuable solutions is the key to improving algorithm performance. To this end, this paper proposes an evaluation rule to quantify the transfer ability of individuals involved in knowledge transfer.

Furthermore, the decision tree model has several advantages in construction and data processing. First, a decision tree turns the knowledge learned from training directly into a hierarchical structure, which is readable and easy to understand and implement. Second, decision trees are suitable for small datasets, and the time complexity of decision tree induction is low. Third, decision trees are insensitive to missing values, can handle irrelevant features, and can produce accurate predictions from large data sources in a relatively short time. Therefore, in the proposed EMT-ADT algorithm, a decision tree is constructed from the information of transferred individuals. The number of individuals for each knowledge transfer is set to 10, and five consecutive generations of transferred individuals are used as the training data to construct the decision tree model, which matches these characteristics of decision trees. Based on the constructed decision tree, promising positive-transferred individuals are selected to conduct knowledge transfer between tasks, which accelerates convergence and improves solution accuracy.

Definition of transfer ability

For a population \(P=\{{\mathbf{x}}_{1},{\mathbf{x}}_{2},\dots ,{\mathbf{x}}_{N}\}\), an archive \(\mathrm{A}=\{{\mathbf{t}}_{1},{\mathbf{t}}_{2},\dots ,{\mathbf{t}}_{N}\}\) is used to store individuals involved in knowledge transfer; the historical transferred population \(\mathrm{TP}=\{{\mathbf{t}}_{i}|1\le i\le n\wedge\upomega \left({\mathbf{t}}_{i}\right)>\upomega \left({\mathbf{t}}_{n+1}\right)\}\) is used to store n individuals with the highest knowledge transfer ability in the archive A. Each \({\mathbf{t}}_{i}\) has an associated subset \({\Phi }_{i}\).

For each \({\mathbf{t}}_{i}\in TP\), if \({\mathbf{t}}_{i}\) participates in the generation of offspring \({\mathbf{y}}_{j}\) (recorded as \({\mathbf{t}}_{i}\to {\mathbf{y}}_{j}\)), the associated subset \({\Phi }_{i}\) is defined as follows.

$${\Phi }_{i}=\{{\mathbf{y}}_{j}|{\mathbf{t}}_{i}\to {\mathbf{y}}_{j},j=\mathrm{1,2},\dots ,N\}$$
(1)

Accordingly, the transfer amount \({\lambda }_{i,j}\) of the jth individual in \({\Phi }_{i}\) is defined as follows:

$${\lambda }_{i,j}=\left\{\begin{array}{l}1\quad if\quad f\left({\mathbf{y}}_{j}\right)<f({\mathbf{x}}_{j})\\ 0\quad otherwise\end{array}\right.$$
(2)

where xj is the parent of the offspring yj.

The transfer ability \(\upomega \left({t}_{i}\right)\) of transferred individual \({t}_{i}\) is defined as follows, where u represents the size of subset \({\Phi }_{i}\).

$$\upomega \left({t}_{i}\right)=\sum\limits_{j=1}^{u}{\lambda }_{i,j}$$
(3)

Figure 2 shows an example of the calculation of the transfer ability, where all individuals are in a 2-D decision space.

Fig. 2
figure 2

An instance of the calculation of the transfer ability

In Fig. 2, the historical transferred population TP = {t1, t2, t3, t4, t5} is composed of five transferred individuals. The offspring y1 and y4 perform better than their parents, while the offspring y2 and y3 perform worse. t1 and t3 participate in the generation of offspring y1; t2 and t4 in y2; t4 and t5 in y3; t2 and t3 in y4. Since t1 only participates in the generation of y1, and y1 is superior to its parent, according to Eqs. (1) and (2) the associated subset of t1 is \({\Phi }_{1}=\{{\mathbf{y}}_{1}\}\) and the transfer amount of t1 is \({\lambda }_{\mathrm{1,1}}=1\); by Eq. (3), the transfer ability of t1 is \(\upomega \left({\mathbf{t}}_{1}\right)={\lambda }_{\mathrm{1,1}}=1\). Similarly, t2 participates in the generation of y2 and y4; y2 is inferior to its parent while y4 is superior, so \({\Phi }_{2}=\{{\mathbf{y}}_{2},{\mathbf{y}}_{4}\}\), \({\lambda }_{\mathrm{2,1}}=0\), \({\lambda }_{\mathrm{2,2}}=1\), and by Eq. (3) the transfer ability of t2 is \(\upomega \left({\mathbf{t}}_{2}\right)={\lambda }_{\mathrm{2,1}}+{\lambda }_{\mathrm{2,2}}=1\). t3 participates in the generation of y1 and y4, so \({\Phi }_{3}=\{{\mathbf{y}}_{1},{\mathbf{y}}_{4}\}\); since both y1 and y4 outperform their parents, \({\lambda }_{\mathrm{3,1}}=1\) and \({\lambda }_{\mathrm{3,2}}=1\), and the transfer ability of t3 is \(\upomega \left({\mathbf{t}}_{3}\right)={\lambda }_{\mathrm{3,1}}+{\lambda }_{\mathrm{3,2}}=2\). \(\upomega \left({\mathbf{t}}_{4}\right)\) and \(\upomega \left({\mathbf{t}}_{5}\right)\) are calculated in the same way. Table 2 shows the transfer abilities of the historical transferred population TP.

Table 2 Transfer ability of the population TP
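The bookkeeping of the Fig. 2 example can be reproduced mechanically from Eqs. (1)–(3); the participation and improvement relations below are exactly those described in the text:

```python
# Eq. (1): which transferred individuals participated in which offspring (from Fig. 2).
participates = {
    't1': ['y1'],
    't2': ['y2', 'y4'],
    't3': ['y1', 'y4'],
    't4': ['y2', 'y3'],
    't5': ['y3'],
}
# Eq. (2): transfer amount is 1 iff the offspring improved on its parent.
improved = {'y1': True, 'y2': False, 'y3': False, 'y4': True}

# Eq. (3): transfer ability = number of improving offspring t_i helped create.
omega = {t: sum(improved[y] for y in ys) for t, ys in participates.items()}
```

Running this yields the transfer abilities listed in Table 2: t3 has the highest transfer ability (2), while t4 and t5 contribute only to deteriorated offspring.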

Construction of decision tree

In the proposed EMT-ADT algorithm, the archive A stores individuals involved in knowledge transfer. At each generation, n individuals selected from the auxiliary population are used to update the archive. For a multitask problem with two tasks, when optimizing task T1, the population associated with task T2 is regarded as the auxiliary population, and vice versa. Since a random selection strategy may pick individuals that cause negative transfer, a decision tree classifier is constructed to predict the transfer ability of each individual in the auxiliary population. The transferred individuals are then sorted in descending order of transfer ability, and the top n are placed into the archive A. Furthermore, the decision tree classifier predicts the transfer ability of the individuals in the archive A, and the top n individuals are selected as the historical transferred population TP for knowledge transfer in the next generation.

At each generation, the historical transferred populations of the most recent LP generations are used as the training data, that is, \(\mathrm{TP}\_\mathrm{DT}={\bigcup }_{g=G-LP}^{G-1}{TP}_{g}\). The construction of the decision tree classifier proceeds as follows.

1. According to individual transfer ability, the individual with the highest transfer ability in TP_DT is recorded as tbest.

$${\mathbf{t}}_{best}= \underset{i\in \{\mathrm{1,2},\dots ,m\}}{\mathrm{argmax}}\upomega \left({\mathbf{t}}_{i}\right)$$
(4)

where m is the size of TP_DT.

2. For \(\forall {\mathbf{t}}_{i}\in TP\_DT\), it has two feature attributes, Euclidean distance attribute (a1) and factorial cost attribute (a2). The Euclidean distance dis(ti, tbest) between ti and tbest, which is recorded as the data xi,1 of a1, is put into the set Ω1. The factorial cost f(ti) of individual ti, which is recorded as the data xi,2 of a2, is put into the set Ω2. The transfer ability \(\upomega \left({\mathbf{t}}_{i}\right)\) of individual ti, which is recorded as the data yi of label attribute, is put into the set Ω3. Let \({\mathrm{x}}_{i}={\mathrm{x}}_{i,1}\cup {\mathrm{x}}_{i,2}\), the candidate attribute set A = {a1,a2}, the dataset \(\mathrm{D}=\{\left({\mathrm{x}}_{1},{\mathrm{y}}_{1}\right),\left({\mathrm{x}}_{2},{\mathrm{y}}_{2}\right),\dots ,\left({\mathrm{x}}_{m},{\mathrm{y}}_{m}\right)\}\).

$$\mathrm{dis}\left({\mathbf{t}}_{i},{\mathbf{t}}_{best}\right)={\Vert {\mathbf{t}}_{i}-{\mathbf{t}}_{best}\Vert }_{2}$$
(5)

3. Calculate the Gini impurity of a sample set D with the sets Ω1, Ω2 and Ω3, where pk represents the proportion of the kth category among its labels and K is the number of categories.

$$\mathrm{Gini}\left(D\right)=1-\sum\limits_{k=1}^{K}{p}_{k}^{2}$$
(6)

4. For each candidate splitting value bj of the feature attribute ai, calculate the Gini index of the induced partition, where \({D}^{v}\) denotes the subset of D falling into branch v and V is the number of branches (V = 2 for a binary split).

$$\mathrm{Gini}\_\mathrm{index}({a}_{i},{b}_{j})=\sum\limits_{v=1}^{V}\frac{\left|{D}^{v}\right|}{\left|D\right|}\mathrm{Gini}({D}^{v})$$
(7)

5. The splitting value b* with the lowest Gini index is selected as the optimal splitting attribute. In the candidate attribute set A, the feature attribute a* corresponding to b* is selected as the current node.

$${b}^{*}=\underset{{a}_{i},{b}_{j}}{\mathrm{argmin}}\,\mathrm{Gini}\_\mathrm{index}({a}_{i},{b}_{j})$$
(8)
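For a single numeric attribute, steps 1–5 reduce to scanning the midpoints of adjacent distinct values and keeping the split with the lowest weighted Gini impurity. A minimal sketch (the attribute values and labels below are hypothetical, not those of Table 3):

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum p_k^2 (Eq. (6)) of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Scan candidate split points (midpoints of adjacent distinct values)
    and return the split with the lowest weighted Gini index (Eq. (7))."""
    xs = sorted(set(values))
    best_b, best_g = None, float('inf')
    for lo, hi in zip(xs, xs[1:]):
        b = (lo + hi) / 2.0
        left = [l for v, l in zip(values, labels) if v <= b]
        right = [l for v, l in zip(values, labels) if v > b]
        g = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if g < best_g:
            best_b, best_g = b, g
    return best_b, best_g

# Hypothetical Euclidean-distance values and transfer-ability labels.
b_star, g_star = best_split([0.2, 0.4, 0.9, 1.0, 1.3],
                            [2, 2, 2, 4, 4])
```

In this toy data the best split point is 0.95, which separates the two label groups perfectly (weighted Gini 0), mirroring how the root split of Table 4 is chosen.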

Algorithm 2 presents the construction of decision tree.


Take Table 3 as an example to construct a decision tree. Euclidean distance (denoted dis) is the first feature attribute, and factorial cost (denoted f) is the second. First, determine the root node. Since dis is a numerical attribute, its values are sorted in ascending order, and the candidate split points are the midpoints of adjacent distinct values. For example, for dis = 0.2 and dis = 0.4, the midpoint is 0.3; using 0.3 as the split point, the calculated Gini index is 0.619. The other midpoints and Gini indexes are calculated in the same way, as shown in Table 4. In Table 4, the split point 0.95 gives the smallest Gini index, 0.32. Table 5 shows the Gini indexes of the candidate split points for the factorial cost attribute; there, the split point 17.5 gives the smallest Gini index, 0.441. Since the Gini index 0.32 in Table 4 is less than 0.441 in Table 5, the Euclidean distance is chosen as the splitting attribute of the root node, with 0.95 as the split value. By the same calculation, the factorial cost is chosen as the splitting attribute of the internal node on the second level, with 10.5 as the split value. The constructed decision tree is shown in Fig. 3.

Table 3 Dataset D
Table 4 Gini index for feature attribute Euclidean distance
Table 5 Gini index for feature attribute factorial cost
Fig. 3
figure 3

Decision tree model construction

In Fig. 3, each leaf node represents a transfer ability; x1 denotes the first feature attribute (Euclidean distance) and x2 the second (factorial cost). Take the prediction for the data (0.4, 6) as an example: at the root node, 0.4 is less than 0.95, so we move to the left subtree; at the intermediate node, 6 is less than 10.5, so the transfer ability of this data is predicted to be 2.
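The traversal described above can be sketched as a nested tuple tree. Only the split values 0.95 and 10.5 and the leaf label 2 come from the text; the other two leaf labels are hypothetical placeholders:

```python
# Internal node: (feature_index, split_value, left_subtree, right_subtree);
# a bare integer is a leaf holding the predicted transfer ability.
tree = (0, 0.95,              # root: Euclidean distance (x1) < 0.95 ?
        (1, 10.5, 2, 1),      # left child: factorial cost (x2) < 10.5 ?
        3)                    # leaf labels 1 and 3 are hypothetical placeholders

def predict(tree, x):
    """Walk from the root to a leaf, branching left when x[feat] < split."""
    while isinstance(tree, tuple):
        feat, split, left, right = tree
        tree = left if x[feat] < split else right
    return tree
```

With this tree, `predict(tree, (0.4, 6))` follows the path described in the text and returns the transfer ability 2.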

To further explain the prediction of individual transfer ability, a two-task benchmark problem comprising the Griewank and Rastrigin problems is selected. Both problems are multimodal and nonseparable. Their definitions are as follows.

(1) Rastrigin:

$$\mathrm{F}\left(\mathbf{x}\right)=\sum\limits_{i=1}^{D}({x}_{i}^{2}-10\mathrm{cos}\left(2\pi {x}_{i}\right)+10),\quad \mathbf{x}\in {\left[-50,50\right]}^{D},\ D=50$$
(9)

(2) Griewank:

$$\mathrm{F}\left(\mathbf{x}\right)=1+\frac{1}{4000}\sum\limits_{i=1}^{D}{x}_{i}^{2}-\prod\limits_{i=1}^{D}\mathrm{cos}\left(\frac{{x}_{i}}{\sqrt{i}}\right),\quad \mathbf{x}\in {\left[-100,100\right]}^{D},\ D=50$$
(10)
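For reference, both benchmark functions are straightforward to implement; the sketch below follows Eqs. (9) and (10) directly (both attain their global minimum 0 at the origin):

```python
import math

def rastrigin(x):
    """Eq. (9); search range [-50, 50]^D in the benchmark."""
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)

def griewank(x):
    """Eq. (10); search range [-100, 100]^D in the benchmark."""
    s = sum(xi * xi for xi in x) / 4000.0
    p = math.prod(math.cos(xi / math.sqrt(i + 1)) for i, xi in enumerate(x))
    return 1.0 + s - p
```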

Figure 4 shows the decision tree model constructed by the proposed algorithm EMT-ADT at generation g on the two-task benchmark problem (Eqs. (9), (10)). At this stage, six values of transfer ability occur (2, 3, 4, 5, 6, 7). Take the prediction for the data (2.794, 18694) as an example. At the root node, 2.794 is less than 2.96472, so we move to the left subtree. Since 2.794 is also less than 2.95378, we continue down the left subtree; since 2.794 is greater than 1.08146, we move to the right subtree, and since 2.794 is greater than 2.75214, we move right again. Finally, 18694 is less than 18869.9, so the predicted transfer ability of this data is 4.

Fig. 4
figure 4

Decision tree model construction for a two-task benchmark problem

The transfer ability of an individual can be restated as follows: if an auxiliary parent participates in the generation of offspring m times, and n of those offspring are superior to their corresponding parents, then the transfer ability of the auxiliary parent is n. In Fig. 4, take the leaf node marked with a red circle as an example: an individual that reaches this leaf through decision tree classification has a predicted transfer ability of 7.

Search strategy

Many classical optimization algorithms, such as GA (genetic algorithm), DE (differential evolution), PSO (particle swarm optimization), TLBO (teaching–learning-based optimization) and BSO (brain storm optimization), can be used as the search engine in the MFO paradigm [22,23,24,25]. Different algorithms have different search performance, and well-designed search strategies can improve search efficiency. The success-history based adaptive differential evolution algorithm (SHADE) proposed by Tanabe et al. [26] has proved to be an effective optimizer. In SHADE, a historical memory stores control parameter values that performed well during evolution, and new control parameters are generated by sampling from this memory, which can further improve performance. Since SHADE outperformed many state-of-the-art DE algorithms on the CEC2005 and CEC2013 benchmark sets, this paper selects SHADE as the search engine. Its mutation operator is defined as follows [26].

$${\mathbf{v}}_{i}={\mathbf{x}}_{i}+{F}_{i}\left({\mathbf{x}}_{pbest}-{\mathbf{x}}_{i}\right)+{F}_{i}({\mathbf{x}}_{r1}-{\widetilde{\mathbf{x}}}_{r2})$$
(11)

where xi is the ith individual of the current population, xpbest is randomly selected from the top p% individuals of the current population, and xr1 is randomly selected from the current population. \({\widetilde{\mathbf{x}}}_{r2}\) is randomly selected from the union of the current population and the external archive. The details of SHADE can be found in [26].

To improve the knowledge transfer between different tasks on MFO problems, the original mutation operator of SHADE is modified and is defined as follows.

$${\mathbf{v}}_{i}={\mathbf{x}}_{i}+{F}_{i}\left({\mathbf{x}}_{pbest}-{\mathbf{x}}_{i}\right)+{F}_{i}({\mathbf{x}}_{r1}^{{\prime}}-{\mathbf{x}}_{r2}^{{\prime}})$$
(12)

For a multitask problem with two tasks, P is the subpopulation corresponding to the target task, and \({P}^{{\prime}}\) is the subpopulation corresponding to the auxiliary task. TP is the historical transferred population, which provides the transferred individuals for the target task. xi is the ith individual of the population P, and xpbest is randomly selected from the top p% individuals of the population \({P}^{{\prime}}\). \({\mathbf{x}}_{r1}^{{\prime}}\) and \({\mathbf{x}}_{r2}^{{\prime}}\) are randomly selected from TP, and F is the scale factor. After the mutation operator, the same crossover operator as in SHADE is used to generate the final offspring.

In the modified SHADE mutation operator, xpbest provides a promising direction for the population, which promotes convergence. The randomly constructed difference vector \(({\mathbf{x}}_{r1}^{{\prime}}-{\mathbf{x}}_{r2}^{{\prime}})\) not only enhances population diversity but also promotes positive knowledge transfer, since the transferred individuals with the highest transfer ability are selected.
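A sketch of the modified mutation operator in Eq. (12), assuming the auxiliary subpopulation is already sorted by factorial cost (best first); the population sizes, F and p below are hypothetical, and SHADE's crossover step is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def transfer_mutation(P, P_aux, TP, i, F, p=0.1):
    """Eq. (12): current-to-pbest mutation with a cross-task difference vector.

    P: target-task subpopulation, P_aux: auxiliary-task subpopulation
    (assumed sorted best-first), TP: historical transferred population,
    i: index of the base individual, F: scale factor.
    """
    # x_pbest: random pick from the top p% of the auxiliary subpopulation.
    n_top = max(1, int(np.ceil(p * len(P_aux))))
    x_pbest = P_aux[rng.integers(n_top)]
    # Difference vector from two distinct transferred individuals.
    r1, r2 = rng.choice(len(TP), size=2, replace=False)
    return P[i] + F * (x_pbest - P[i]) + F * (TP[r1] - TP[r2])

# Hypothetical 2-D populations.
P = np.array([[0.0, 0.0], [1.0, 1.0]])
P_aux = np.array([[0.5, 0.5], [2.0, 2.0]])
TP = np.array([[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]])
v = transfer_mutation(P, P_aux, TP, i=0, F=0.5)
```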

The proposed EMT-ADT

The pseudocode of the EMT-ADT algorithm is shown in Algorithm 3. \({P}^{{\prime}}\) is the subpopulation corresponding to the auxiliary task. The archive A is used to store individuals involved in knowledge transfer. xbest is the best individual of \({P}^{{\prime}}\). NPi is the population size of ith task. FES and MAXFES represent the current number of function evaluations and the maximal number of function evaluations, respectively.

The main steps of EMT-ADT are as follows. First, for each task Ti, offspring are generated by Eq. (11) or (12) according to the random mating probability rmp. Meanwhile, the associated subset \({\Phi }_{i}\), the transfer amount \({\lambda }_{i,j}\) and the transfer ability \(\upomega \left({t}_{i}\right)\) are calculated according to Eqs. (1)–(3). When knowledge transfer occurs during offspring generation, the training data D is first constructed from the TP of LP consecutive generations. Next, the decision tree model DT is built from the training data. DT is then used to predict the transfer ability of all individuals in \({P}^{{\prime}}\); the individuals are sorted by transfer ability, and the top n−1 individuals together with the historical best individual of \({P}^{{\prime}}\) are stored in the archive A. Then, DT predicts the transfer ability of all individuals in the archive A; after ranking them, the top n−1 individuals together with the historical best individual of \({P}^{{\prime}}\) form the historical transferred population TPg+1 at generation g + 1. When no knowledge transfer occurs during offspring generation, the top n−1 individuals with the best factorial cost and the historical best individual of \({P}^{{\prime}}\) are stored in the archive A; the individuals in A are sorted by transfer ability, and the top n−1 individuals together with the historical best individual of A are selected as TPg+1.

When the success rate sri (the rate at which offspring are better than their parents) is greater than the given threshold, the random mating probability rmpi of each task is updated as follows [10].

$$rmp_{i}=\begin{cases}\min \{rmp_{i}+0.3\left(1-sr_{i}\right),1\} & \mathrm{if}\ TNP=0\\ \left\lfloor \dfrac{tsr_{i}}{tsr_{i}+sr_{i}}+0.5\right\rfloor \times \min \{rmp_{i}+0.3\times tsr_{i},1\}+\left\lfloor \dfrac{sr_{i}}{tsr_{i}+sr_{i}}+0.5\right\rfloor \times \max \{rmp_{i}-0.3\times (1-tsr_{i}),0\} & \mathrm{otherwise}\end{cases},$$
(13)

where tsri denotes the rate at which offspring generated with knowledge transfer are better than their parents. If all offspring are generated without knowledge transfer, TNP is set to 0; otherwise, TNP is the number of offspring generated with knowledge transfer.
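Equation (13) can be written out directly; the floor terms evaluate to 0 or 1 and act as indicators selecting the increase or decrease branch (when tsri = sri both terms contribute). A sketch, with variable names following the text:

```python
import math

def update_rmp(rmp, sr, tsr, tnp):
    """Eq. (13): adapt the random mating probability of one task.

    sr: success rate over all offspring; tsr: success rate of offspring
    generated with knowledge transfer; tnp: number of transferred offspring.
    """
    if tnp == 0:
        return min(rmp + 0.3 * (1.0 - sr), 1.0)
    # Floor terms: indicator weights for the increase / decrease branches.
    w_up = math.floor(tsr / (tsr + sr) + 0.5)
    w_down = math.floor(sr / (tsr + sr) + 0.5)
    return (w_up * min(rmp + 0.3 * tsr, 1.0)
            + w_down * max(rmp - 0.3 * (1.0 - tsr), 0.0))
```

For example, when transfer succeeds more often than plain reproduction (tsr > sr), rmp is increased toward 1; in the opposite case it is decreased toward 0.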


Complexity analysis

The computational cost of EMT-ADT mainly comes from assortative mating, the adaptive knowledge transfer based on the decision tree, and the update of the historical transferred population. In EMT-ADT, a loop over NP (population size) is conducted, containing a loop over D (dimension), for each of the m optimization tasks. Assortative mating is performed according to the random mating probability (rmp), so the runtime complexity is \(O(m\cdot NP\cdot D)\) per iteration. For the adaptive knowledge transfer based on the decision tree, the Euclidean distance between each transferred individual ri and the best transferred individual rbest is calculated, which adds to the time complexity; over five consecutive iterations this costs \(O(5\cdot n\cdot D)\), where n is the number of transferred individuals. Similarly, the runtime complexity of the historical transferred population update is \(O(n\cdot D)\). Therefore, the total time complexity of EMT-ADT is \(O(m\cdot NP\cdot D)\).

Comparative studies of experiments

To evaluate the competitiveness of the proposed EMT-ADT algorithm, 9 benchmark test problems from the CEC2017 Evolutionary Multi-Task Optimization Competition [27] are employed. Each test problem is a two-task problem. According to the similarity between the landscapes of the two tasks, the benchmark problems can be categorized into three groups: high similarity (HS), medium similarity (MS) and low similarity (LS). Furthermore, according to the degree of intersection of the global optima, they can be divided into complete intersection (CI), partial intersection (PI) and no intersection (NI). The details of these benchmark problems can be found in [27]. In addition, two complex single-objective MFO test suites, WCCI20-MTSO and WCCI20-MaTSO, are selected to further verify the competitiveness of the proposed EMT-ADT. Both suites include 10 test problems and were proposed for the WCCI 2020 Competition on Evolutionary Multitasking Optimization [28]. Each test problem in WCCI20-MTSO contains 2 tasks, while each test problem in WCCI20-MaTSO contains 10 tasks.

Parameter settings

The aforementioned CEC2017 multitask problems are given in Table 6.

Table 6 Properties of problem pairs for CEC2017 multitask problems

The proposed EMT-ADT is compared with eight popular multitask optimization algorithms, namely MFEA [1], MFEARR (MFEA with resource reallocation) [29], MFDE (multifactorial differential evolution) [23], AT-MFEA (affine transformation-enhanced MFEA) [8], SREMTO [13], MFMP (MFO via explicit multipopulation evolutionary framework) [10], TLTLA (two-level transfer learning algorithm) [30] and MTEA-AD (MTEA based on anomaly detection) [31]. The default parameter settings for these algorithms are given in Table 7. N denotes the population size.

Table 7 Default parameters settings of compared algorithms

Experiments on CEC2017 multitask problems

In this section, the proposed EMT-ADT algorithm is compared with the above-listed algorithms to verify its performance. Tables 8, 9, 10 summarize the mean fitness (Mean) and standard deviation (Std) achieved by MFEA, MFEARR, MFDE, AT-MFEA, SREMTO, MFMP, TLTLA, MTEA-AD and EMT-ADT over 30 runs. All algorithms are run with MAXFES (the maximum number of function evaluations) as the termination criterion, which is set to 200,000. Three statistical measures, the Wilcoxon signed-rank test [32], the multiple-problem Wilcoxon's test [33] and Friedman's test [33], are used to compare EMT-ADT with the other eight algorithms. With regard to the single-problem Wilcoxon's test, "†", "≈" and "−" indicate that EMT-ADT is significantly better than, comparable to, and worse than the compared algorithm, respectively. With regard to the multiple-problem Wilcoxon's test, R+ and R− denote the rank sums over the problems on which EMT-ADT is better than and worse than the compared algorithm, respectively. The best solution is highlighted in bold.

Table 8 Mean fitness values and Std obtained by MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO, MFMP and EMT-ADT on complete intersection problems
Table 9 Mean fitness values and Std obtained by MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO, MFMP and EMT-ADT on partial intersection problems
Table 10 Mean fitness values and Std obtained by MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO, MFMP and EMT-ADT on no intersection problems

Comparisons on complete intersection problems

Table 8 shows that the proposed EMT-ADT algorithm outperforms MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA and SREMTO on the complete intersection problems, which indicates that the decision tree prediction strategy employed in the proposed algorithm works effectively and efficiently. More specifically, EMT-ADT and MFMP have similar performance on the CI + HS and CI + MS problems, while EMT-ADT performs better than MFMP on the CI + LS problems. Furthermore, the experimental results show that the proposed EMT-ADT wins against MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO and MFMP on 6, 6, 6, 6, 6, 6, 6 and 3 optimization tasks, respectively. The convergence curves of the mean fitness values obtained by each algorithm on the complete intersection problems are plotted in Fig. 5. As can be seen in Fig. 5, the EMT-ADT algorithm converges faster than the competing algorithms.

Fig. 5
figure 5

Convergence curves of the average fitness values obtained by MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO, MFMP and EMT-ADT on complete intersection problems

Comparisons on partial intersection problems

As can be observed in Table 9, the proposed EMT-ADT algorithm outperforms MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO and MFMP on 4 out of 6 tasks. TLTLA performs best on task T1 of the PI + HS problem, and for task T2 of the PI + LS problem the performance of EMT-ADT is similar to that of MFMP. Overall, the results show that the proposed EMT-ADT outperforms MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO and MFMP on 6, 6, 6, 6, 5, 5, 6 and 5 optimization tasks, respectively. Figure 6 shows the convergence curves of the mean fitness values obtained by each algorithm on the partial intersection problems. As can be seen in Fig. 6, compared with the eight competing algorithms, the EMT-ADT algorithm converges faster on all tasks except task T1 of the PI + HS problem.

Fig. 6
figure 6

Convergence curves of the average fitness values obtained by MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO, MFMP and EMT-ADT on partial intersection problems

Comparisons on no intersection problems

Table 10 shows that the proposed EMT-ADT algorithm outperforms MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO and MFMP on 4 out of 6 tasks. For task T2 of the NI + MS problem and task T1 of the NI + LS problem, TLTLA ranks first while EMT-ADT ranks second. The results show that the proposed EMT-ADT outperforms MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO and MFMP on 6, 6, 6, 6, 6, 4, 6 and 6 optimization tasks, respectively. Figure 7 shows the convergence curves of the mean fitness values obtained by each algorithm on the no intersection problems. As can be seen in Fig. 7, compared with the eight competing algorithms, the EMT-ADT algorithm converges faster on all tasks except task T2 of the NI + MS problem and task T1 of the NI + LS problem.

Fig. 7
figure 7

Convergence curves of the average fitness values obtained by MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO, MFMP and EMT-ADT on no intersection problems

Adaptive knowledge transfer analysis

To evaluate the effectiveness of the proposed adaptive knowledge transfer, EMT-ADT is compared with the algorithm without adaptive knowledge transfer (EMT) on 3 representative CEC2017 multitask problems including CI + MS, PI + MS and NI + MS. As explained before, if the offspring generated with adaptive knowledge transfer has better performance than its parent, it is referred to as positive transfer. Figure 8 shows the convergence curves of the mean fitness values obtained by EMT-ADT and EMT on test problems CI + MS, PI + MS and NI + MS. Figure 9 shows the mean number of the transferred individuals during evolution in EMT-ADT on test problems CI + MS, PI + MS and NI + MS. In Fig. 9, red circles and blue stars denote the number of transferred individuals in the process of evolution on task T1 and task T2, respectively. As seen in Figs. 8 and 9, EMT-ADT outperforms EMT in terms of the solution quality, which demonstrates that adaptive knowledge transfer strategy can improve the performance of the algorithm.

Fig. 8
figure 8

Convergence curves of the mean fitness values obtained by EMT-ADT and EMT on CI + MS, PI + MS and NI + MS

Fig. 9
figure 9

Curves of the mean number of transferred individuals during evolution in EMT-ADT on three test problems. a CI + MS, b PI + MS, c NI + MS

Comparisons on WCCI20-MTSO

To investigate the performance on more complex multitasking problems, the proposed EMT-ADT is further compared with eight state-of-the-art multitasking optimization algorithms on the WCCI20-MTSO benchmark suite. Tables 11 and 12 show the comparative results of the nine algorithms in terms of mean fitness.

Table 11 Mean fitness values obtained by MFEA, MFEARR, AT-MFEA, MTEA-AD and EMT-ADT on WCCI20-MTSO
Table 12 Mean fitness values obtained by MFDE, TLTLA, SREMTO, MFMP and EMT-ADT on WCCI20-MTSO

As can be seen from Tables 11 and 12, EMT-ADT obtains the best value on 17 out of 20 WCCI20-MTSO test instances. MFEA, MFEARR and MTEA-AD perform well on problem P8, and SREMTO achieves the best value on task T1 of problem P9. The last rows of Tables 11 and 12 show that EMT-ADT performs significantly better than the eight competitors on almost all the test instances.

Figure 10 plots the trajectory of the mean fitness versus the number of function evaluations for each algorithm on the WCCI20-MTSO benchmark suite. As seen in Fig. 10, EMT-ADT demonstrates a clear advantage over the eight competitors.

Fig. 10
figure 10

Convergence curves of the average fitness values obtained by MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO, MFMP and EMT-ADT on WCCI20-MTSO

Comparisons on WCCI20-MaTSO

The mean fitness values obtained by the nine algorithms on WCCI20-MaTSO are shown in Table 13. It is clear that EMT-ADT shows an obvious advantage over the competitors. More specifically, EMT-ADT significantly outperforms the eight competitors on 94 out of 100 instances. The last row of Table 13 shows that the proposed EMT-ADT outperforms MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO and MFMP on 94, 94, 95, 94, 100, 94, 94 and 100 optimization tasks, respectively.

Table 13 Mean fitness values obtained by MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO, MFMP and EMT-ADT on WCCI20-MaTSO

Comparisons with single-task algorithms

Multitasking optimization algorithms make full use of positive knowledge by sharing solutions across tasks, which can improve convergence and solution quality. To verify the competitiveness of the proposed algorithm, a series of experiments is conducted against two state-of-the-art single-task methods, SHADE and LSHADE. SHADE is an adaptive DE characterized by success-history-based parameter adaptation, while LSHADE is an improved variant of SHADE. The performance of EMT-ADT, SHADE and LSHADE on the CEC2017 and WCCI20_MTSO benchmark problems is summarized in Tables 14 and 15.

Table 14 Mean fitness values obtained by SHADE, LSHADE and EMT-ADT on CEC2017 problems
Table 15 Mean fitness values obtained by SHADE, LSHADE and EMT-ADT on WCCI20_MTSO problems

As can be observed in Table 14, EMT-ADT achieves superior solution quality on almost all the CEC2017 benchmarks. More specifically, the proposed EMT-ADT wins 17 out of the 18 competitions, which demonstrates that the implicit knowledge transfer across tasks in multitasking is beneficial to the performance of EMT-ADT. From Table 15, it can be seen that EMT-ADT also works well on the more complex WCCI20_MTSO benchmark suite, outperforming SHADE and LSHADE, which obtain the best result on 0 and 3 instances, respectively.

Statistical analysis

In this section, the multiple-problem Wilcoxon's test is used to assess the significance of the difference (p value) between each competitor algorithm and EMT-ADT, while the Friedman test is used to rank the compared algorithms statistically. Table 16 shows that EMT-ADT provides higher R+ values than MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO and MFMP. The p values of MFEA, MFEARR, AT-MFEA, MTEA-AD, SREMTO and MFMP are less than 0.05, which indicates that the proposed EMT-ADT significantly outperforms these competitors. For the PI + HS, PI + MS and PI + LS problems, the p values of MFDE and TLTLA are greater than 0.05, which indicates that their performance is similar to that of EMT-ADT on these problems; likewise, for the NI + HS, NI + MS and NI + LS problems, the p value of TLTLA is greater than 0.05, indicating performance similar to that of EMT-ADT. Furthermore, Table 17 shows the average rank and the overall rank of the compared algorithms on the CEC2017 multitask problems. In sum, EMT-ADT ranks first on all test problems.

Table 16 Wilcoxon test results for EMT-ADT on CEC2017 multitask problems
Table 17 Average ranking of the algorithms (Friedman)

Figure 11 visually shows the ranking of EMT-ADT and the competitor algorithms on the CEC2017 multitask problems. The left side shows a radar graph of the nine algorithms' rankings, while the right side shows a bar chart of their average rankings. The area enclosed by EMT-ADT in the radar graph is smaller than that of the other compared algorithms, indicating that EMT-ADT is competitive. Moreover, EMT-ADT has the shortest bar in the bar chart, which demonstrates that EMT-ADT is superior to the other competitor algorithms.

Fig. 11
figure 11

The radar graph and bar charts of different algorithms on CEC2017 multitask problems

Population size sensitivity analysis

Population size has a considerable bearing on the rate of convergence. Generally, a large population may slow convergence, while a small population may promote faster convergence [34]. In the proposed algorithm, the parameter γ represents the dispersion degree of population convergence and is used to control the population size. To analyze the effect of γ on the performance of EMT-ADT, the degree of dispersion (DOD) of the population is first defined as follows.

$$\mathbf{G}=\frac{{\sum }_{i=1}^{N}{\mathbf{x}}_{i}}{N}$$
(14)
$$\mathrm{DOD}=\frac{{\sum }_{i=1}^{N}{\Vert \mathbf{G}-{\mathbf{x}}_{i}\Vert }_{2}}{N}$$
(15)

where xi is the ith individual in the population P, and N is the population size.
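Under these definitions, the DOD is simply the mean Euclidean distance of the individuals from the population centroid G. A minimal pure-Python sketch:

```python
def degree_of_dispersion(pop):
    """Degree of dispersion (Eqs. (14)-(15)): mean Euclidean distance
    of the N individuals from the population centroid G."""
    n, d = len(pop), len(pop[0])
    # Eq. (14): centroid G, the componentwise mean of the population.
    centroid = [sum(ind[k] for ind in pop) / n for k in range(d)]
    # Eq. (15): average Euclidean distance from G.
    dists = [sum((centroid[k] - ind[k]) ** 2 for k in range(d)) ** 0.5
             for ind in pop]
    return sum(dists) / n
```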

Figure 12 shows the degree of dispersion obtained by EMT-ADT on CEC2017 multitask problems. As seen from Fig. 12, the dispersion of the population is different during the evolution process for different problems with different tasks. To determine the parameter γ for better performance of EMT-ADT, CI + MS, PI + MS and NI + MS problems are selected to test the performance of EMT-ADT with different values of the parameter γ, i.e. γ = 0.0005, γ = 0.001, γ = 0.005, γ = 0.01, γ = 0.015, on all tasks.

Fig. 12
figure 12

The degree of dispersion obtained by EMT-ADT on CEC2017 multitask problems

Figure 13 and Table 18 respectively show the convergence curves and the ranking results on CI + MS, PI + MS and NI + MS problems with different values of the parameter γ. As can be seen in Fig. 13 and Table 18, the algorithm performs better with γ = 0.001. Therefore, γ is set to 0.001 in the proposed algorithm.

Fig. 13
figure 13

Convergence curves with different γ

Table 18 Ranking of different γ on CI + MS, PI + MS and NI + MS problems

As mentioned before, the algorithm performs best when the population size is adjusted at γ = 0.001. Since the dispersion of the population differs during evolution for different problems and tasks, the mean dispersion degree of the population over the nine CEC2017 multitask problems is calculated, as shown in Fig. 14. The location of the red circle, recorded as (FES, γ), marks the time at which the population size is adjusted. Therefore, when FES is 50,000, i.e. FES/MAXFES = cos(γ)/4, the population size should be adjusted to achieve better performance.

Fig. 14
figure 14

Mean dispersion degree obtained by EMT-ADT on CEC2017 multitask problems

Results on real-world problems

In this section, two real-world problems, the traveling salesman problem (TSP) [35] and the traveling repairman problem (TRP) [36], are used to verify the practicability of the proposed EMT-ADT algorithm. More specifically, TSP and TRP are combined into one multitask optimization problem, each asking for a tour with minimum cost.

Given a list of n cities and a list of maintenance times for the n cities, TSP aims to minimize the total time to visit all cities, while TRP aims to minimize the sum of the elapsed times that all customers have to wait before being served [37]. Each city may be visited only once, and the tour finally returns to the origin city. A distance matrix \(\mathrm{C}=\{\mathrm{c}({x}_{i},{x}_{j})|i,j=\mathrm{1,2},\dots ,n\}\) is given, where c(xi, xj) is the distance between cities xi and xj. Let \(\mathbf{x}=({x}_{1},{x}_{2},\dots ,{x}_{n})\) be a tour, i.e. a solution of the objective function, and let P(x1, xn) denote the path from x1 to xn. For the TSP, the total time \(ls(p\left({x}_{1},{x}_{n}\right))\) from starting city x1 to xn is computed as

$$ls\left(p\left({x}_{1},{x}_{n}\right)\right)=\sum\limits_{i=1}^{n-1}c\left({x}_{i},{x}_{i+1}\right)+c({x}_{1},{x}_{n})$$
(16)

For the TRP problem, the total time \(lr(p\left({x}_{1},{x}_{n}\right))\) from starting city x1 to xn is computed as

$$lr\left(p\left({x}_{1},{x}_{n}\right)\right)=\sum\limits_{i=1}^{n-1}\left(c\left({x}_{i},{x}_{i+1}\right)+{\sum }_{j=1}^{i}{r}_{j}\right)+c\left({x}_{1},{x}_{n}\right)+{r}_{n}$$
(17)
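Eqs. (16) and (17) can be evaluated directly on a permutation tour. In this sketch, `dist` plays the role of the distance matrix c, and `repair` stands in for the maintenance times r; indexing r by the visited city (r of tour position j's city) is an assumption, since the paper leaves the index implicit.

```python
def tsp_cost(tour, dist):
    """Eq. (16): total travel time of the closed tour."""
    n = len(tour)
    cost = sum(dist[tour[i]][tour[i + 1]] for i in range(n - 1))
    return cost + dist[tour[0]][tour[n - 1]]   # return to the origin city

def trp_cost(tour, dist, repair):
    """Eq. (17): travel time plus the accumulated waiting (repair) times."""
    n = len(tour)
    cost = 0.0
    for i in range(n - 1):
        cost += dist[tour[i]][tour[i + 1]]
        # Every customer up to position i has already been served,
        # so their repair times accumulate into the elapsed time.
        cost += sum(repair[tour[j]] for j in range(i + 1))
    return cost + dist[tour[0]][tour[n - 1]] + repair[tour[n - 1]]
```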

Generally, a unified representation scheme is used in multitask optimization algorithms, in which every solution is encoded by random keys between 0 and 1. Since a path in the TSP and TRP problems must be a set of integers without duplication, the real-number encoding is converted to a permutation encoding. Specifically, all dimension values of the solution x are sorted in ascending order, and the permutation solution \({\mathbf{x}}^{{\prime}}\) of solution x is obtained from the ranking of the dimension values. Figure 15 shows a simple example. For a solution x = (0.21, 0.34, 0.84, 0.67, 0.11, 0.08, 0.55), its permutation solution \({\mathbf{x}}^{{\prime}}\) = (3, 4, 7, 6, 2, 1, 5) denotes that the traveler starts from the third city, visits the fourth, seventh, sixth, second, first and fifth cities, and then returns to the third city.
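The random-key decoding can be sketched as follows; it reproduces the worked example in the text.

```python
def random_key_to_permutation(x):
    """Map a random-key vector to a permutation by ranking its entries."""
    # Indices of the dimensions, sorted by their key values (ascending).
    order = sorted(range(len(x)), key=lambda i: x[i])
    perm = [0] * len(x)
    for rank, idx in enumerate(order, start=1):
        perm[idx] = rank   # dimension idx receives the rank of its value
    return perm
```

For x = (0.21, 0.34, 0.84, 0.67, 0.11, 0.08, 0.55), this yields the permutation (3, 4, 7, 6, 2, 1, 5) from the example above.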

Fig. 15
figure 15

An example of mapping a solution to a permutation of city number

In this experiment, we compare EMT-ADT with MFDE, MFPSO and MFMP. Ten groups of problems are randomly selected from TSPLIB [38], and travelers are assumed to move between cities at a speed of 60 km/h. Each group is composed of a TSP test case and a TRP test case. For each test case, the average time spent by the traveler over 30 independent runs of each algorithm is presented in Table 19; the number following the instance name indicates the number of city nodes. Table 19 shows that EMT-ADT achieves the minimum-cost tour against all peer competitors on almost all instances, winning 18 out of the 20 competitions, which demonstrates the superior performance of the proposed EMT-ADT. Figures 16 and 17 show the optimal TSP tour and the optimal TRP tour obtained by the different algorithms on the eil51 instance, respectively. The experimental results show that EMT-ADT obtains better solutions than the three state-of-the-art multitasking optimization algorithms.

Table 19 Average time (hours) spent by the traveler obtained by each algorithm
Fig. 16
figure 16

The optimal TSP tour on the eil51 instance obtained by different algorithms. a EMT-ADT, b MFDE, c MFPSO, d MFMP

Fig. 17
figure 17

The optimal TRP tour on the eil51 instance obtained by different algorithms. a EMT-ADT, b MFDE, c MFPSO, d MFMP

Conclusions

To enhance positive knowledge transfer between tasks, this paper proposed an evolutionary multitasking optimization algorithm with an adaptive transfer strategy based on a decision tree (EMT-ADT). A method for quantifying the transfer ability of individuals is proposed to select individuals with high transfer ability, and a decision tree is constructed to predict the transfer ability of individuals in the archive. Individuals with high predicted transfer ability are then selected to conduct knowledge transfer, which improves the performance of the algorithm. The effectiveness of EMT-ADT is verified through comparisons with eight state-of-the-art algorithms, i.e. MFEA, MFEARR, AT-MFEA, MTEA-AD, MFDE, TLTLA, SREMTO and MFMP. The experimental results indicate that EMT-ADT performs better on most CEC2017, WCCI20-MTSO and WCCI20-MaTSO multitask problems.

As mentioned in [39], several specific actions are advocated to shed light on the field of evolutionary multitask optimization. For example, the computational complexity of evolutionary multitasking methods should be regarded as a metric for the evaluation of algorithms. Future work will focus on improving the effectiveness of knowledge transfer between tasks with low correlation, and on reducing the computational complexity for multifactorial problems with more than two tasks. Moreover, EMT-ADT will be applied to solve multi-objective multitask optimization problems.