A multi-objective hyper-heuristic algorithm based on adaptive epsilon-greedy selection

A variety of meta-heuristics have shown promising performance in solving multi-objective optimization problems (MOPs). However, an existing meta-heuristic may perform best on particular MOPs yet poorly on others. To improve cross-domain ability, this paper presents a multi-objective hyper-heuristic algorithm based on adaptive epsilon-greedy selection (HH_EG) for solving MOPs. To select and combine low-level heuristics (LLHs) during the evolutionary procedure, this paper also proposes an adaptive epsilon-greedy selection strategy. The proposed hyper-heuristic can solve problems from varied domains by simply changing LLHs, without redesigning the high-level strategy. Meanwhile, HH_EG needs no parameter tuning and is easy to integrate with various performance indicators. We test HH_EG on the classical DTLZ test suite, the IMOP test suite, the many-objective MaF test suite, and a test suite of a real-world multi-objective problem. Experimental results show the effectiveness of HH_EG in combining the advantages of each LLH and in solving cross-domain problems.


Introduction
Multi-objective optimization problems (MOPs) with m objectives can be described as Min F(x) = (f_1(x), …, f_m(x)), x ∈ Ω [29]. The solution x of an MOP is an n-dimensional decision vector x = (x_1, …, x_n), and F(x) is the phenotype of the solution x in the objective space. A solution u = (u_1, …, u_n) Pareto dominates another solution v = (v_1, …, v_n) if and only if the objective value of u is less than or equal to that of v on all objectives, and strictly less than that of v on at least one objective. Solutions that are not dominated by any solution are called non-dominated solutions. The set of all non-dominated solutions is called the Pareto set, and its image in the objective space is called the Pareto front (PF).

For NP-hard problems, researchers often design customized heuristics to obtain reasonable solutions within polynomial time. When solving MOPs, the most commonly used heuristics are multi-objective evolutionary algorithms (MOEAs) [5]. One major advantage of MOEAs is that they are population-based algorithms which can obtain a set of solutions approximating the PF in a single run. However, MOEAs sometimes achieve high performance on several problems but may be ineffective on other problems, even ones from similar domains [35]. For instance, some dominance-based MOEAs are only effective for MOPs with fewer than 4 objectives [33], and decomposition-based MOEAs using fixed reference vectors are not good at solving MOPs with irregular PFs [13,14].
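As a minimal illustration of the Pareto dominance relation for a minimization MOP (the function names below are our own, not from the paper):

```python
def dominates(u, v):
    """True if objective vector u Pareto-dominates v (minimization):
    u is no worse on every objective and strictly better on at least one."""
    return all(ui <= vi for ui, vi in zip(u, v)) and any(ui < vi for ui, vi in zip(u, v))

def non_dominated(points):
    """Keep only the objective vectors not dominated by any other vector."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]
```

For example, `dominates((1, 2), (2, 3))` holds, while `(1, 3)` and `(3, 1)` are mutually non-dominated.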
A hyper-heuristic (HH) is a methodology with cross-domain search abilities. It is a high-level strategy that controls a set of low-level heuristics (LLHs). Instead of solving the problem directly, at each decision point an HH selects one LLH and applies it to the problem at hand. If an HH is applied to another problem, the high-level strategy does not need to be redesigned [1].
This study proposes an adaptive epsilon-greedy selection based multi-objective hyper-heuristic algorithm (HH_EG) that applies multiple MOEAs as LLHs. In each iteration, HH_EG uses an online-learning strategy to score each LLH and an adaptive epsilon-greedy selection strategy to select one LLH based on the scores. The selected LLH is then used to solve the problem in this iteration. Specifically, the contributions of this paper are the following:
1. An adaptive epsilon-greedy selection strategy is proposed for the hyper-heuristic algorithm to select LLHs when solving MOPs. At early stages, the strategy prefers to select LLHs randomly, while at later stages it tends to select the currently best LLH greedily. The adaptive method updates the probability of using the random or the greedy algorithm to balance exploration and exploitation.
2. HH_EG has no parameters that need to be tuned. Thus, HH_EG can be combined with various indicators to match the preferences of users.
3. HH_EG is tested on multiple test suites. We evaluate HH_EG with NSGA-II [9], MOEA/D [37], and IBEA [40] as LLHs on the classical DTLZ [10] test suite and the real-world vehicle crashworthiness (VC) [20] test suite.
We also conduct an experiment on the challenging IMOP [32] test suite, changing the LLHs to NSGA-II, MOEA/D, and MSEA [34]. Additionally, the MaF [3] test suite with 5 and 10 objectives is tested with NSGA-III [8], MOEA/D, and MSEA as LLHs. Experimental results show the advantages of HH_EG and its cross-domain search abilities.

Multi-objective hyper-heuristics
A hyper-heuristic algorithm is an automated methodology for selecting or generating heuristic algorithms to solve computational search problems [1]. There are two main aspects to consider in the search process of hyper-heuristics: (1) the nature of the heuristic search space; (2) the feedback used during learning. These factors are orthogonal, because heuristic search spaces can be combined with different kinds of feedback [1]. In terms of the nature of the heuristic search space, hyper-heuristics can be divided into selection-based hyper-heuristics (methodologies for choosing or selecting existing heuristics) and generation-based hyper-heuristics (methodologies for generating new heuristics from components of existing heuristics). In terms of the feedback, hyper-heuristics can be divided into online learning hyper-heuristics, offline learning hyper-heuristics, and non-learning hyper-heuristics. In online learning hyper-heuristics, the learning process occurs while the algorithm solves the problem instance. The most common example is reinforcement learning [19,36]. In offline learning hyper-heuristics, the learning process collects knowledge from a set of training examples in the form of rules or procedures, which can be generalized to solve new examples. Typical offline learning strategies are classifier systems [26], case-based reasoning [27], and genetic programming [15]. Non-learning hyper-heuristics, such as random hyper-heuristics (choosing LLHs randomly), do not use feedback. In recent years, many studies on hyper-heuristics for solving MOPs have been proposed, many of which focus on online-learning selection hyper-heuristics. In [2], a multi-objective selection hyper-heuristic (TSRoulette Wheel) based on tabu search was proposed. In TSRoulette Wheel, tabu search is used to select LLHs. Experimental results show that TSRoulette Wheel performs well on space allocation and timetabling problems.
McClymont and Keedwell [24] proposed an online learning selection hyper-heuristic (MCHH) based on Markov chains. The Markov chain guides the selection of LLHs, and online reinforcement learning updates the transition weights between heuristics. When solving the DTLZ test suite, MCHH performs better than the random hyper-heuristic (HH_RAND) and TSRoulette Wheel. Subsequently, MCHH was applied to a water distribution network design problem [25]. In [23], an online learning multi-objective hyper-heuristic based on a choice function (HH_CF) was proposed. HH_CF uses the choice function as the selection method, while employing four indicators and a specific ranking strategy as the learning and feedback strategies. HH_CF uses three well-known MOEAs (NSGA-II, SPEA2 [41], and MOGA [11]) as LLHs. Experiments confirm the ability of HH_CF when solving continuous MOPs. Later, HH_CF was combined with different move acceptance strategies, including the great deluge algorithm and late acceptance, in [22].
Li et al. [18] used a multi-objective selection hyper-heuristic to solve the multi-objective wind farm layout optimization problem. Later, Li et al. [19] proposed two novel online learning multi-objective hyper-heuristics (HH_LA and HH_RILA) based on learning automata. Learning automata are used as the learning strategy, and ε-RouletteGreedy is proposed to select LLHs from three MOEAs (NSGA-II, SPEA2, and IBEA). Experiments on the DTLZ, WFG, and VC test suites showed that HH_LA and HH_RILA perform better than HH_CF and HH_RAND. Gonçalves et al. [12] used an innovative Restless Multi-Armed Bandit (RMAB) to extend the MOEA/D framework and proposed a multi-objective selection hyper-heuristic, MOEA/D-RMAB. The RMAB strategy selects the DE operator to be used by each individual during the run of the MOEA/D framework, modeling the dynamic behavior of the various operators. Yao et al. [36] proposed the multi-objective hyper-heuristics RL-MOHH and RL-PMOHH based on reinforcement learning. By learning and selecting among a set of individually designed LLHs, the algorithms were tested on a safety-index map constructed from historical data of New York City. The results show that the proposed hyper-heuristics converge faster on multi-objective route planning in a smart city. Zhang et al. [38] presented a perturbation adaptive pursuit strategy based hyper-heuristic (PAPHH) to manage a set of diversity mechanisms or a set of MOEAs. This method uses a perturbation adaptive pursuit strategy to learn the performance of each LLH, and then uses the roulette wheel method to select an LLH based on the selection probability vector. Experimental results show that PAPHH is highly competitive in terms of the Spacing and Hypervolume metrics.

Epsilon-greedy
The selection method is important for multi-objective selection hyper-heuristics when taking an action (i.e., selecting an LLH). Several classical selection methods, such as the random algorithm and the greedy algorithm, have been used in the literature. These methods differ in how they assign selection probabilities. A random algorithm selects each action with equal probability. This method is simple and has no extra parameter to consider. The greedy algorithm only selects the best action, i.e., the one with the highest score. The greedy method has a strong exploitation ability, but it neglects currently 'poor' actions that may perform well in the future, and thus easily falls into local optima.
The epsilon-greedy method combines the random algorithm and the greedy algorithm to deal with the exploration-exploitation dilemma [16,28]. The main idea of epsilon-greedy is to control the utilization rate of the greedy algorithm and the random algorithm through a small probability ε (smaller than 1), with the aim of making the behavior of epsilon-greedy greedy most of the time, but random once in a while. Specifically, when selecting an action at a decision point, the epsilon-greedy method first generates a random value rnd in the interval [0, 1]; if rnd is larger than ε, epsilon-greedy uses the greedy algorithm to select an action; otherwise, the random algorithm is used.
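The decision rule above can be sketched as follows (function and parameter names are ours; scores are assumed to be "larger is better"):

```python
import random

def epsilon_greedy(scores, eps, rng=random):
    """Pick an action index: greedy (highest score) if rnd > eps, random otherwise."""
    rnd = rng.random()                      # random value in [0, 1]
    if rnd > eps:                           # exploit: best-scored action
        return max(range(len(scores)), key=lambda i: scores[i])
    return rng.randrange(len(scores))       # explore: uniformly random action
```

With eps = 0 the rule is (almost always) purely greedy; with eps close to 1 it is mostly random.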

Framework of HH_EG
HH_EG is a multi-objective selection hyper-heuristic that runs within iter_total iterations. During each iteration (also called a decision point), an LLH is selected to solve the problem at hand. As shown in Algorithm 1, given a set of LLHs, denoted as H = {h_1, h_2, …, h_l}, HH_EG first obtains an initial population Pop of size N and initializes the score vector S = (s_1, s_2, …, s_l), where s_i is the score of h_i, i ∈ {1, …, l} (step 1). At each iteration (steps 2-9), HH_EG selects an LLH h_cur according to the score vector (step 3) and then uses h_cur to evolve the population Pop into a new population Pop′ (step 4). The score vector S is updated according to the degree of improvement after applying h_cur (step 5). If the new population Pop′ is better than Pop, then Pop is updated (steps 6-8). If the termination criterion is not met, the algorithm returns to step 2.
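The high-level loop of Algorithm 1 can be sketched as below. This is a simplified illustration under our own assumptions (a minimization indicator such as IGD, a linearly decaying ε, and a score update by the indicator improvement rate), not the authors' implementation; here an "LLH" is any callable mapping a population to a new population:

```python
import random

def hh_eg(llhs, evaluate, iter_total, init_pop, rng=random.Random(0)):
    """Sketch of HH_EG's high-level loop (indicator assumed minimized, e.g. IGD)."""
    pop = init_pop
    scores = [evaluate(h(pop)) for h in llhs]        # crude initial scores (lower = better)
    for it in range(iter_total):
        eps = (iter_total - it) / iter_total         # exploration probability decays to 0
        if rng.random() < eps:                       # explore: random LLH
            cur = rng.randrange(len(llhs))
        else:                                        # exploit: best-scored LLH
            cur = min(range(len(scores)), key=lambda i: scores[i])
        new_pop = llhs[cur](pop)                     # apply the selected LLH
        m_in, m_out = evaluate(pop), evaluate(new_pop)
        scores[cur] += (m_out - m_in) / max(abs(m_out), 1e-12)  # reward/punish the score
        if m_out < m_in:                             # only-improving move acceptance
            pop = new_pop
    return pop
```

On a toy problem where the "population" is a single number and the indicator is its absolute value, an LLH that halves the value is quickly rewarded and comes to dominate the selection.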

Initialization strategy
The initialization strategy is used to evaluate the initial performance of each LLH. Beginning with a randomly generated population, each LLH h_i evolves the random population separately for the same number of generations and obtains a resultant population Pop_i. The indicator value m_pop_i, evaluated by indicator M on the resultant population Pop_i obtained by h_i, is then translated into an initial score s_i by the normalization function shown in formula (1):

s_i = (m_pop_i − m_min) / (m_max − m_min),   (1)
where m_min and m_max are, respectively, the minimum and maximum indicator values found so far by all LLHs in the initialization strategy. HH_EG can employ various indicators in this framework. If the indicator M is to be maximized (or minimized), a larger (or smaller) score value corresponds to better performance. Finally, the resultant population of the best-scoring LLH is used as the initial population Pop of HH_EG.
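The min-max normalization used to turn an initial indicator value into a score can be sketched as (function name ours; the guard for identical values is our own assumption):

```python
def initial_score(m_pop, m_min, m_max):
    """Min-max normalize an indicator value into [0, 1] (formula (1) style).
    m_min / m_max: minimum / maximum indicator values over all LLHs so far."""
    if m_max == m_min:          # all LLHs performed identically
        return 0.0
    return (m_pop - m_min) / (m_max - m_min)
```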

Adaptive Epsilon-greedy selection strategy
An adaptive epsilon-greedy selection method is designed as the selection strategy to improve the decision-making ability of HH_EG. The main idea is that the adaptive epsilon-greedy selection strategy first focuses on exploration, using the random algorithm to select an LLH, and then gradually becomes greedier, using the greedy algorithm. That is, we randomly generate a value in [0, 1]. If the random value is smaller than ε, a random LLH from the LLH set is selected to be applied in the next iteration; otherwise, the best-scored LLH, selected by the greedy algorithm, is applied.
In the traditional epsilon-greedy algorithm, ε is a fixed parameter that needs to be tuned manually. This paper instead updates the ε-value according to the number of iterations, as shown in formula (2):

ε = (iter_total − iter) / iter_total,   (2)

where iter is the index of the current iteration. The ε-value thus decreases linearly. Its value is large early in the search process, so HH_EG has a high probability of randomly selecting an LLH. At later stages, the ε-value becomes smaller (down to 0), which means that HH_EG tends to select the currently best-scored LLH. This selection strategy makes the algorithm focus on exploring different LLHs at early stages and then turn to greedy exploitation gradually as the search progresses.

Score update strategy
As discussed in [17,38], LLHs that happen to be chosen at the early stages easily obtain large improvement values, whereas an LLH applied at later stages usually brings only small improvements, even if it is more effective than the rest of the LLHs. Hence, the raw indicator value cannot reflect the actual performance of each LLH throughout the entire search process [38].
In the proposed algorithm, we use the indicator improvement rate, with the indicator value of the output population as the baseline, to update the score vector S based on the feedback of the selected LLH h_cur during the search. If h_cur obtains positive feedback after being applied in an iteration, its score is rewarded; otherwise, its score is punished.
Specifically, after applying h_cur in an iteration, the input population Pop and the output population Pop′ are evaluated using the indicator M, and then the score value s_cur of h_cur is updated according to Eq. (3):

s_cur = s_cur + (m_out − m_in) / m_out,   (3)

where m_in and m_out denote the indicator values of the input population and the output population, respectively. For a maximization indicator M, if the increment m_out − m_in is greater than 0, an improvement (positive feedback) has been obtained after applying h_cur, and the score s_cur is rewarded. Otherwise, a deteriorating operation (negative feedback) has been executed by applying h_cur, and the score s_cur is punished. For a minimization indicator M, the situation is reversed.

Move acceptance strategy
In this research, the only-improving acceptance strategy [22] is used to determine whether to accept the output population when solving MOPs: the output population replaces the current population only if it achieves a better indicator value.

Test suites
DTLZ is a classical test suite that is often used as a baseline to test the performance of multi-objective algorithms. DTLZ1 has linear and multimodal characteristics. The PFs of DTLZ2-6 are concave. In addition, DTLZ3 is a multimodal problem; DTLZ4 is biased; DTLZ5 is degenerate; and DTLZ6 is a biased and degenerate problem. DTLZ7 has mixed, discontinuous, and biased characteristics. These characteristics enable DTLZ to comprehensively test the performance of various algorithms. Experiments are conducted on the three-objective DTLZ1-7 test problems. IMOP is a recently proposed test suite with extremely irregular PFs. IMOP1-3 are two-objective MOPs. IMOP1 and IMOP2 have convex and concave PFs with sharp tails. The PF of IMOP3 is a multi-segment discontinuous curve. IMOP4-8 are three-objective MOPs with variously shaped PFs. The PF of IMOP4 is a wavy line. The PF of IMOP5 is composed of 8 independent circular faces. The PF of IMOP6 is divided into several grids. The PF of IMOP7 is part of a one-eighth sphere, while the PF of IMOP8 looks like a continuous plane but actually consists of 100 discontinuous small pieces.
MaF is a recently proposed test suite for many-objective optimization. We choose MaF1-7 with 5 and 10 objectives in the experiments. The seven MaF problems have diverse properties, such as being multimodal, disconnected, degenerate, and biased. The aim of using MaF is to verify the effectiveness of HH_EG on many-objective optimization problems reflecting various real-world scenarios.
VC is a real-world multi-objective optimization problem from the automotive industry, concerning the ability of vehicles and their components to protect passengers during a collision. The VC test suite involves the optimization of weight (Mass), acceleration characteristics (Ain), and toe-board intrusion (Intrusion) [20].

Performance indicators
In the experiments, IGD [4] and HV [42] are employed in the proposed HH_EG framework as performance indicators to evaluate each LLH during the iterations of HH_EG. The main reasons for choosing IGD and HV are that they are commonly used and can evaluate multiple aspects of the population, such as diversity and convergence. The IGD indicator evaluates the average minimum distance between each reference point on the PF and the population obtained by an algorithm. The HV indicator evaluates the objective space covered by a given population. Of the two indicators, IGD requires prior knowledge of the PF, while HV does not; however, HV is slower to compute, especially for high-dimensional MOPs. Since the DTLZ, IMOP, and MaF test suites have known PFs, HH_EG uses IGD as the performance indicator serving as feedback to the high-level online learning strategy when solving these test suites. Since the problems of the VC test suite have unknown PFs, HH_EG uses HV instead of IGD as the performance indicator for the VC test problems.
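The IGD computation described above can be sketched as follows (a straightforward implementation under our assumptions, not the exact code used in the paper):

```python
import math

def igd(pf_reference, population):
    """Inverted Generational Distance: average, over the PF reference points, of the
    minimum Euclidean distance to the obtained population (smaller is better)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return sum(min(dist(r, p) for p in population) for r in pf_reference) / len(pf_reference)
```

A population lying exactly on the sampled PF points yields an IGD of 0.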

Low-level heuristics
Five MOEAs are used to form the low-level heuristic set, denoted as h_1 (NSGA-II [9]), h_2 (MOEA/D [37]), h_3 (IBEA [40]), h_4 (MSEA [34]), and h_5 (NSGA-III [8]). In general, any population-based MOEA can be incorporated into HH_EG as an LLH. The main reasons for choosing NSGA-II, MOEA/D, and IBEA are that they are classical and often serve as baselines for comparison [31,39]. MSEA is a recently proposed MOEA for obtaining a better-distributed population, and NSGA-III is designed for many-objective optimization problems (having four or more objectives). Moreover, the five MOEAs come from different categories: NSGA-II and IBEA are dominance-based and indicator-based MOEAs, respectively, while MOEA/D and NSGA-III are decomposition-based MOEAs [21], and MSEA is a multi-stage MOEA [34]. Each of the five MOEAs aims to push the population toward the PF, but they differ in the algorithm components they use during the search.
h_1 (NSGA-II): Each solution in the population is sorted according to its Pareto-optimal level, and only solutions at low levels are selected into the next generation. In addition, the crowding distance assignment strategy is used to maintain the diversity of the population during the selection process.
h_2 (MOEA/D): MOEA/D uses a set of uniformly distributed weight vectors to decompose an MOP into multiple single-objective sub-problems. Each sub-problem is optimized by utilizing the information of neighboring sub-problems, and the given weight vectors are used to control the diversity.
h_3 (IBEA): IBEA uses binary indicators to redefine the optimization goal. The indicator values are used as fitness in the environmental selection, and no additional diversity maintenance strategy is needed.
h_4 (MSEA): MSEA is a multi-stage MOEA for obtaining a better-distributed population. The optimization process is divided into multiple stages according to the population in each generation, and the population is updated by different steady-state selection schemes in different stages. MSEA can solve MOPs with highly irregular PFs.
h_5 (NSGA-III): NSGA-III is a reference-point-based evolutionary algorithm built on the NSGA-II framework, aimed at solving many-objective optimization problems. Solutions in the population are ranked by non-dominated sorting, and reference points are adopted to control the diversity.

Parameter settings
All LLHs use the simulated binary crossover operator for recombination and the polynomial mutation operator [7] for mutation. The crossover probability is set to 1 and the mutation probability is set to 1/n (n is the number of decision variables of a given problem). The crossover and mutation distribution indices are both set to 20. MOEA/D uses PBI [37] as the decomposition method. The neighborhood size of MOEA/D is set to 0.1 × N (N denotes the population size), and the maximum number of solutions replaced by each offspring is set to 0.01 × N. The uniformly distributed weight vectors are generated using Das and Dennis' sampling method [6]. The archive size of IBEA equals the population size N, and the fitness scaling factor is set to 0.05. Following the recommendations in [8,19,23], the parameter settings of each algorithm on the DTLZ, IMOP, MaF, and VC test suites are given in Table 1, where m is the number of objectives, N is the population size, G_max stands for the maximum number of generations, and SEs is the number of solution evaluations; iter_total and generation are used in the hyper-heuristics, standing for the total number of iterations and the number of generations the selected LLH runs in one iteration, respectively; and K, g, and α are the threshold used in PAPHH, the maximum number of generations per iteration used in HH_RILA, and the intensification parameter used in HH_CF, respectively. Other parameters are the same as in the original papers. The parameter k of DTLZ1-7 is set to {5, 10, 10, 10, 10, 10, 20} [10]. The parameters of IMOP1-IMOP8 are set to I = 5, L = 5, a_1 = 0.05, a_2 = 0.05, a_3 = 10 [32], where I and L control the length of the decision vector, while a_1-a_3 control the difficulty of diversity preservation.
The indicator IGD is used for the DTLZ, IMOP, and MaF test suites. We sample 10,000 uniformly distributed points from the PF of each DTLZ, IMOP, and MaF test problem. The value of IGD ranges in [0, ∞), with smaller values indicating better performance. HV is used for the VC test suite. Before calculation, the population is normalized. In the measurement of HV, the reference point is set as r_i = z_i^nadir + 0.5 (z_i^nadir − z_i^ideal) for the i-th objective [19], where i ∈ {1, …, m} is the index of the objective, m is the number of objectives, and z_i^nadir and z_i^ideal are, respectively, the maximum and minimum values of the i-th objective in the population to be evaluated. After normalization with the reference point, the value of HV ranges in [0, 1], with larger values indicating better performance.
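The per-objective reference point used for the HV measurement can be sketched as (function name ours):

```python
def hv_reference_point(population):
    """r_i = z_i^nadir + 0.5 * (z_i^nadir - z_i^ideal), where the nadir/ideal points
    are the per-objective maxima/minima of the population being evaluated [19]."""
    m = len(population[0])                                   # number of objectives
    nadir = [max(p[i] for p in population) for i in range(m)]
    ideal = [min(p[i] for p in population) for i in range(m)]
    return [nadir[i] + 0.5 * (nadir[i] - ideal[i]) for i in range(m)]
```

For a population with objective vectors (0, 2) and (2, 0), the reference point is (3, 3).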
All algorithms are run for 30 independent trials and compared using the one-tailed Wilcoxon rank-sum test. '+', '−', and '=' indicate that the comparing algorithm is superior to, inferior to, or similar to HH_EG at the 0.05 significance level. All algorithms are implemented in the popular open-source platform PlatEMO [30] and run on an Intel Core i7 3.1 GHz computer with 8 GB RAM and a 1 TB disk.

Experimental results
Experiments in this section are divided into four parts. In "Experimental results on DTLZ", we evaluate the effectiveness of HH_EG by controlling three LLHs. The mean, standard deviation, and ranking performance of HH_EG and the comparing algorithms with respect to the IGD or HV indicator, obtained from 30 independent trials on the DTLZ, IMOP, MaF, and VC test suites, are shown in Tables 2, 3, 4, and 5, respectively. The best mean IGD or HV value on each problem is highlighted in bold. In each table, the ranking values across all test problems are summarized for every algorithm (denoted as "Total"). After the summarization, the final ranking order, sorted by the "Total" value, is displayed (denoted as "Final Rank"). The corresponding box plots are illustrated in Figs. 1, 2, 3, 4, and 5.

Experimental results on DTLZ
From the statistical results of the IGD values on the DTLZ test suite over 30 trials in Table 2, we can see that HH_EG works best on DTLZ2 and DTLZ4-7. HH_EG is significantly better than NSGA-II, IBEA, HH_RAND, HH_RILA, and PAPHH on all DTLZ problems, and significantly better than MOEA/D and HH_CF on five DTLZ problems (DTLZ2 and DTLZ4-7). On the remaining two problems (DTLZ1 and DTLZ3), HH_EG and HH_CF perform similarly. MOEA/D works best on DTLZ1 and DTLZ3, followed by HH_EG. The superior performance of MOEA/D on DTLZ1 and DTLZ3 is attributed to its uniformly distributed weight vectors, which make MOEA/D good at dealing with problems with regular PFs. However, HH_EG works better than MOEA/D both on the irregular-shaped problems (DTLZ5-7) and on the remaining regular-shaped problems (DTLZ2 and DTLZ4). Furthermore, HH_EG is relatively stable compared to the comparing hyper-heuristics (HH_RAND, HH_CF, HH_RILA, and PAPHH), except on DTLZ5 against HH_CF. The corresponding box plots are shown in Fig. 1. According to the Total and Final Rank, HH_EG performs the best across all DTLZ test problems overall.

Experimental results on IMOP

Table 3 shows that HH_EG is superior to all comparing algorithms on all IMOP test problems with respect to the IGD values, except that HH_EG is inferior to MSEA on IMOP3. From the statistical results of the Wilcoxon rank-sum test, HH_EG is significantly better than NSGA-II, MOEA/D, HH_RAND, and PAPHH on all IMOP test problems. HH_EG is significantly better than MSEA on five instances (IMOP1 and IMOP5-8) and performs similarly to MSEA on IMOP2 and IMOP4. Apart from similar performance on IMOP4-5 against HH_CF and on IMOP5 against HH_RILA, HH_EG performs significantly better than HH_CF and HH_RILA on the rest of the IMOP test problems. Among the hyper-heuristics, HH_EG is more stable than HH_RAND and HH_RILA on all test problems. The stability of HH_EG is superior to that of HH_CF and PAPHH except on two test problems each (IMOP2 and IMOP4 for HH_CF; IMOP6 and IMOP7 for PAPHH).
Figure 2 shows the corresponding box plots. The final rank in Table 3 assesses the generality of each algorithm across IMOP1-8. The ranking results show that HH_EG delivers the best performance.

Experimental results on MaF
Experimental results in Table 4 show that HH_EG is the top algorithm on 10 out of 14 MaF test problems; the exceptions are the 5-objective MaF4 and MaF6 test problems and the 10-objective MaF4 and MaF7 test problems, on which MSEA performs the best. From the statistical results of the Wilcoxon rank-sum test, HH_EG is significantly better than MOEA/D, HH_RAND, and HH_CF on all MaF test problems. HH_EG is significantly better than NSGA-III, HH_RILA, and PAPHH on 13, 12, and 11 MaF test problems, respectively, and performs similarly to them on the rest (10-objective MaF7 for NSGA-III; 10-objective MaF5 and 5-objective MaF7 for HH_RILA; 10-objective MaF1, MaF5, and MaF7 for PAPHH). HH_EG performs significantly better than MSEA on 10 MaF test problems, while MSEA performs significantly better than HH_EG on 3 MaF test problems. From Table 4, HH_EG is the most stable among the hyper-heuristics, except on the 5-objective MaF4 against PAPHH and the 5-objective MaF5 against HH_RILA. HH_EG ranks first among all comparing algorithms according to the final rank, which confirms that HH_EG is effective when solving many-objective MaF test problems. The corresponding box plots of the MaF test suite are shown in Figs. 3 and 4.

Experimental results on VC

Table 5 shows the statistical performance comparison of HH_EG and the comparing algorithms over 30 independent runs with respect to the HV values on the VC test problems. For the HV indicator, a higher value indicates better performance, and the box plots of each algorithm are visualized in Fig. 5. From Table 5, we can see that, among the three LLHs, IBEA performs the best on VC1 and NSGA-II performs the best on VC2-4, while MOEA/D performs the worst. Besides, HH_EG is significantly better than NSGA-II, MOEA/D, and …

Analysis of hyper-heuristics
The average utilization rate of LLHs

Figures 6, 7, 8, 9, and 10 show the average utilization rate of each LLH obtained by HH_EG on the test suites.

The number of invocations of each LLH versus the average indicator values
To further understand the search progress induced by each LLH, Fig. 11 provides plots based on the number of invocations of each LLH at each iteration, obtained from HH_EG over 30 trials on DTLZ2 and IMOP5 (left). Each point in the left subplot indicates the total number of trials in which the corresponding LLH was selected by HH_EG, with values ranging in [0, 30]. Figure 11 also plots the average IGD values versus iterations for HH_EG and the three MOEAs running alone over 30 trials (right). On DTLZ2, in the initial stages, the left subplot shows that the numbers of invocations of NSGA-II, MOEA/D, and IBEA are similar, which is caused by the randomness of the random algorithm in the adaptive epsilon-greedy selection strategy during the initial exploration phase. Later, HH_EG slowly reduces the usage of NSGA-II and IBEA, while increasing the usage of MOEA/D. MOEA/D is even predominantly used during the final stages. According to the right subplot, MOEA/D is the best-performing algorithm (with the lowest IGD values) compared to NSGA-II and IBEA. It is not difficult to understand that, as the search progresses, the adaptive epsilon-greedy selection strategy becomes more exploitative and prefers to invoke the best-performing LLH. On IMOP5, the numbers of invocations of NSGA-II, MOEA/D, and MSEA are comparable, especially in the first ten iterations. After that, the number of invocations of MSEA gradually increases, significantly exceeding those of NSGA-II and MOEA/D. Surprisingly, the right subplot shows that MSEA is only the second-best LLH when running alone, behind the best-performing LLH (NSGA-II). Meanwhile, HH_EG obtains the best IGD value on IMOP5 (as shown in Table 3 and the right subplot of Fig. 11), which also shows that, by combining the LLHs, HH_EG produces better results than each LLH running alone.

The scores of each LLH versus the IGD values
This section investigates the performance of each LLH in HH_EG in the best of the 30 trials, as shown in Figs. 12 and 13. The left subplots show the score values of each LLH during the run of HH_EG; a lower score value indicates better performance. Correspondingly, the right subplots show the IGD values of HH_EG, where each tick indicates the IGD value obtained after HH_EG applies the selected LLH. Each problem has its own pattern. For example, on DTLZ2 (see Fig. 12), MOEA/D is invoked at the 2nd, 3rd, and 8th iterations, where the IGD values decrease and the corresponding score values of MOEA/D decrease too; that is, the scores of MOEA/D are rewarded. At the 5th, 7th, and 9th iterations, NSGA-II is invoked, while at the 1st, 4th, and 6th iterations, IBEA is invoked. The IGD values at these six iterations are unchanged, but the corresponding score values increase, indicating that unimproved populations were generated; the scores of NSGA-II and IBEA are punished. As we can see, MOEA/D has the lowest scores (performing the best). As the iterations progress, the adaptive epsilon-greedy selection strategy becomes greedier and MOEA/D has more chance of being selected, which is consistent with the high utilization rate of MOEA/D on DTLZ2 reported in "The average utilization rate of LLHs".
For IMOP5 (see Fig. 13), NSGA-II is selected at the 1st and 2nd iterations, while MSEA is selected at the 3rd, 7th, 8th, and 24th iterations. The IGD values decrease at these six iterations, and the corresponding score values decrease. For the remaining iterations, the IGD values do not change, but the corresponding score values increase, indicating that the newly generated populations were not improved by applying the LLHs at these iterations. From Fig. 13, the score values of NSGA-II drop below those of MSEA in the beginning and rise above them later. According to the score values, HH_EG considers NSGA-II superior at the early stages, while MSEA works better at later stages. Since the adaptive epsilon-greedy selection strategy is biased toward selecting the best-performing LLH as the search progresses, the probability of selecting MSEA generally increases. This explains why MSEA, the second-best LLH, is selected the most on IMOP5 (shown in Figs. 7 and 11).

Conclusion
This paper proposes a novel multi-objective hyper-heuristic, HH_EG. This algorithm combines the advantages of existing MOEAs to produce better results on multiple types of problems. Four strategies make up the main high-level framework of HH_EG: initialization, adaptive epsilon-greedy selection, score update, and move acceptance. Performance indicators are utilized to provide feedback to the high-level online learning strategy. In the proposed algorithm, no parameters need to be tuned, so HH_EG requires little expert intervention. This simplicity makes it feasible to incorporate a variety of indicators into HH_EG. Experimental results show that HH_EG is more competitive than each LLH running alone and is also superior to four classical hyper-heuristics when dealing with MOPs. Results on DTLZ, IMOP, MaF, and VC demonstrate the competitiveness and cross-domain search abilities of the proposed hyper-heuristic.
In future work, we will apply HH_EG to different types of MOPs, such as many-objective, multi-modal, or real-world problems, to assess its effectiveness as well as its cross-domain ability. Another research direction is the impact of hyper-heuristics with different algorithm components as LLHs. Besides the IGD and HV indicators used in this work, other indicators could be combined with hyper-heuristics. The impact of indicator values as feedback in learning strategies is also left for future work.