Adjusting normalization bounds to improve hypervolume-based search for expensive multi-objective optimization

When solving expensive multi-objective optimization problems, surrogate models are often used to reduce the number of true evaluations. Based on predictions from the surrogate models, promising candidate solutions, also referred to as infill solutions, can be identified for evaluation to expedite the search towards the optimum. This infill process in turn involves optimization of certain criteria derived from the surrogate models. In this study, predicted hypervolume maximization is considered as the infill criterion for expensive multi/many-objective optimization. In particular, we examine the effect of normalization bounds on the performance of the algorithm building on our previous study on bi-objective optimization. We propose a more scalable approach based on “surrogate corner” search that shows improved performance where some of the conventional techniques face challenges. Numerical experiments on a range of benchmark problems with up to 5 objectives demonstrate the efficacy and reliability of the proposed approach.

A key challenge in solving expensive optimization problems stems from the fact that only a very limited number of candidate designs can be evaluated during the course of the search. To achieve competitive results within such a small number of evaluations, researchers have often resorted to the use of surrogate models [4,13], also referred to as metamodels [11]. Surrogate-assisted optimization (SAO) attempts to build approximations of the response functions based on a small number of truly evaluated solutions. The predictions from the surrogate model are then primarily used to guide the search, with only occasional calls to the true evaluations. The newly evaluated solutions in turn help to train more accurate surrogate models for subsequent search.
The promising solutions selected using the surrogate models for true evaluation are referred to as infill solutions. Typically, an infill solution is obtained by optimizing a certain criterion derived from the surrogates [20]. In the context of single-objective optimization, a simple and effective scheme is to select optimal solutions based on the predicted mean values [4,27], the so-called 'believer' model [12]. Another widely used infill criterion is the expected improvement (EI), implemented within the efficient global optimization (EGO) algorithm [20]. The EI criterion takes both the mean prediction and the associated uncertainty of the surrogate model into consideration. Likewise, in the context of multi-objective optimization problems (MOPs), some of the reported infill criteria include scalarized objective functions [21], hypervolume (HV) improvement, expected HV improvement, Euclidean distance improvement, maximin distance improvement, etc. [36]. In this paper, we focus on a steady-state algorithmic framework that uses predicted HV improvement as the infill criterion.
Hypervolume (HV), also referred to as Lebesgue measure, was originally proposed for comparing the performance of multi-objective evolutionary algorithms (MOEAs) [38]. It computes the volume dominated by a given set of nondominated solutions, bounded by a reference point R. It attempts to capture both convergence and diversity of the obtained non-dominated set and is one of the few known unary measures that are Pareto-compliant. This property implies that a solution (set) that dominates another solution (set) will have a better (higher) HV. Thus, the theoretical Pareto front (PF) has the maximum HV in the objective space. Given these interesting and desirable properties, HV has attracted significant attention from both theoretical and benchmarking perspectives [1,29,30]. Furthermore, it has become a prominent approach for environmental selection in indicator-based evolutionary algorithms [2,3,35] for solving MOPs.
There are some important considerations when utilizing HV for online/offline use. One of the concerns relates to the user-specified reference point R with respect to which the dominated volume is calculated [17,30]. The common practice in this regard is to position R slightly beyond (dominated by) the nadir point Z_N. This allows the HV contributions of the extremities of the non-dominated set to be included, while not making them significantly higher than those of the rest of the non-dominated set. Another critical step in the application of HV is the normalization of the objective values. In MOPs, the ranges of different objectives can differ by orders of magnitude. Consequently, the HV calculated in the raw objective space can be biased towards certain objective(s). To manage these two aspects simultaneously, objective vectors are first normalized between 0 and 1 using certain normalization bounds, typically defined by the ideal and nadir points of the current non-dominated set. Then, for an M-objective problem, the reference point is set to R = 1.1^M = (1.1, 1.1, ..., 1.1), as in, e.g., [35], to compute and identify the next solution(s) with high HV contribution. Some recent works have also discussed the impact of the reference point specification [17,30] and provided further insights and recommendations regarding its implications in benchmarking.
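The computation described above can be sketched for the bi-objective case (a minimal illustration, assuming minimization and objective vectors already normalized to [0, 1] by the ideal and nadir points; the function name is ours):

```python
import numpy as np

def hypervolume_2d(points, ref=(1.1, 1.1)):
    """HV of a mutually non-dominated 2-objective set w.r.t. reference
    point ref, assuming minimization in normalized objective space."""
    # keep only points that strictly dominate the reference point
    P = np.array(sorted(p for p in points if np.all(np.asarray(p) < ref)))
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in P:  # ascending f1 implies descending f2 on an ND set
        hv += (ref[0] - f1) * (prev_f2 - f2)  # add one rectangular slab
        prev_f2 = f2
    return hv
```

With R = (1.1, 1.1), the ideal point (0, 0) alone yields HV = 1.1 × 1.1 = 1.21, which is the maximum attainable value in the normalized space.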
In MOEAs, normalization plays an important role [15], especially for estimating and enforcing diversity measures among the solutions. Normalization has been studied in various frameworks such as NSGA-III [5], MOEA/D [14], and DBEA [32]. Normalization evidently helps in balancing the scales of different objectives. However, it also comes with a downside: the global perspective on where the solutions lie in the raw objective space may be lost during environmental selection. This can affect the performance of the algorithm in terms of convergence and/or diversity of the population, as will be discussed in the next section. Although the effects of normalization have been discussed in the above studies, the focus was not on expensive MOPs. In this paper, we take a step towards addressing this research gap, in the context of expensive optimization where sampling is driven by an HV-based infill criterion. This research is motivated by our observation that the use of surrogate models in expensive MOPs provides an opportunity to adjust the normalization bounds to achieve better search performance. Thus, we are interested in developing ways to obtain better normalization bounds to aid the search and, in particular, to overcome the conceptual shortcomings of some of the standard normalization methods currently used.
This study builds upon the ideas we introduced in our previous conference paper [34]. In [34], the discussion and development of the methodology were limited to bi-objective scenarios. The main proposal was to obtain the two extremities of the PF based on the search on surrogate models, and use them to supplement the current non-dominated front to establish the normalization bounds. The underlying principle remains the same in this work, but we make the following extended contributions:
• We utilize a more generalized method for finding the extremities ("surrogate corners") to adjust the normalization bounds for M ≥ 2 objectives. The implicit assumptions specific to solving bi-objective problems [34] are thus removed.
• Since the number of corners grows with the number of objectives, we introduce additional selection mechanisms to reduce the number of true evaluations diverted from infill search towards corner search.
• Lastly, we conduct numerical experiments on problems with up to 5 objectives under a limited budget to demonstrate that the proposed improvements lead to more consistency in achieving competitive results compared to other standard methods.
Following this introduction, the basic idea and algorithmic framework are outlined in Sect. 2, followed by numerical experiments and discussion in Sect. 3. Concluding remarks and future work are presented in Sect. 4.

General idea
As mentioned previously, a common approach to identifying normalization bounds is based on the limits of the current non-dominated (ND) set of solutions. The technique is simple, intuitive and often effective. However, a potential drawback of this procedure is that if the current ND front is biased towards a certain part of the PF, the resulting infill search also tends to be limited to exploring that part of the PF. This could potentially lead to a delay in exploring other regions or, in the worst case, yield only a small subset of the PF as the final output. This situation is visualized in Fig. 1a. In this snapshot of the search, the ND front is concentrated towards one end of the PF. If normalization is conducted using the bounds of this ND front, along with the standard reference point specification (1.1^M), the infill search is bounded by the region enclosed by the black dotted lines. The HV contribution of any point outside these bounds with respect to R is zero, so the resulting infill point (in the predicted space) will come only from this region. Consequently, the improvement in the quality of the current solution set is likely to be small.
An alternative explanation of the shortcoming demonstrated in Fig. 1a is that the current estimates of the ideal and nadir points (Z_I and Z_N, respectively) are not accurate relative to the true ideal and nadir points (Z_I*, Z_N*). The main idea behind the proposed method is therefore to improve the estimation of these points to come up with adjusted normalization bounds. We propose to do so by searching for the extremities of the PF based on the surrogate models. The search on the surrogate models itself does not add to the cost of true evaluations. Only after the surrogate models have identified the extreme points do such solutions need to undergo true evaluation before being added to the current ND set to identify the new normalization bounds. This process consumes some (small number of) true evaluations and hence does not come at zero cost. Hence, for the strategy to be useful, the cost directed towards evaluating the extreme points should (a) be reduced to the extent possible, and (b) bring commensurate benefits to the search.
Since the number of such additional corner evaluations is not disproportionately high in the above paradigm, we hypothesize that surrogate-assisted optimization can particularly benefit from such normalization adjustment. This is in stark contrast to attempting the same process in a non-surrogate optimization framework, where obtaining the extreme points themselves could take up a substantial portion of the computational budget [32]. The above discussion is visualized in Fig. 1b. Starting from the current ND front, we search for an extreme point along each objective using its surrogate model. Provided the models are good representations of the true functions, the extreme points on the PF are identified, and the estimation of the nadir and ideal points is improved. This process yields updated normalization bounds (Z_I (updated) and Z_N (updated)) and an updated reference point. It expands the search space of the infill search, and has the potential of uncovering more diverse solutions on the PF. Of course, it is anticipated that the exact extremities may not be identified in one shot; instead, the search may need to be invoked more than once during the overall optimization process to progressively approach the true values. Additionally, the surrogates are not likely to be accurate from the beginning for generic non-linear functions; rather, they start from a coarse approximation that becomes better over time as more solutions are evaluated. However, even with relatively coarse models, the surrogates may be able to capture the general global trend, leading the search towards globally optimal regions. Where this does not happen, the evaluation of the identified corner point still brings benefit by improving the surrogate models for subsequent iterations.
Also worth noting is another type of normalization scheme, based on the whole archive of solutions evaluated so far [23,36]. This strategy may be of interest when the overall number of evaluated solutions is small, a scenario consistent with expensive MOPs. The archive-based normalization bounds are less likely to be localized compared to the ND-based bounds. However, the scheme may suffer from a different issue. When the evaluated solutions are located far away from the true PF, solutions that are not near the PF start to contribute significantly to the HV. Consequently, HV maximization for infill search does not translate into the selection pressure required to emphasize diversity on the true PF itself. This situation is illustrated in Fig. 1c, where the infill search ends up with a significantly expanded search area beyond the PF. As a result, solutions that lie beyond the true PF boundaries could be suggested as infills, adversely affecting the convergence (and diversity) of the final PF approximation.
The proposed approach intends to overcome the above-mentioned limitations to yield more consistent performance across a range of problems. In our previous attempt [34], we applied the extreme point search on each objective function independently to adjust the bounds, which works well for bi-objective problems due to the fixed number (= 2) of extremities. Beyond two objectives, the definition of extreme points becomes somewhat subjective, and hence the implementation of the method needs further consideration. To make the approach more generic, we present here a modified version of the Pareto corner search [31] and introduce further improvements to obtain the adjusted bounds. In the following sections, we introduce the surrogate corner search that is used for normalization adjustment.

Corner points in multi-objective problems
A unique feature of bi-objective problems is that, for a given non-dominated front (including the PF), the extremities (also referred to as corners [31]) have a clear and intuitive geometric interpretation. One extremity has the lowest value of f1 and the highest value of f2 within the set, and the reverse holds for the other extremity. This situation is shown in Fig. 2a. Thus, the two extremities can in principle be identified by minimizing one objective at a time independently. The only caveat is that if an objective function has multiple global optima, the second objective needs to be subsequently considered to pick the correct extremity.
For MOPs with M ≥ 3, the above does not hold. Minimizing the objectives independently may not lead to unique points on the PF that can be considered as corners. Moreover, minimizing a given objective may not result in an extreme value for any of the remaining objectives. Consider, for example, the 3-objective PF shown in Fig. 2b, where minimizing f3 alone can lead to any of the solutions on the PF that lie in the f1-f2 plane. Out of these, only the two extremes can be intuitively perceived as geometric corners. Figure 2c, on the other hand, shows a case where there is a unique point corresponding to minimum f3 (and similarly for the other objectives), which can be considered a corner.
A formal, though restrictive, definition of corner points was provided in [31] in the context of dimensionality reduction. It suggested that if the minimization of a subset of k < M objectives leads to a single point, then that point can be considered a corner. Thus, a corner solution can minimize anywhere from 1 to M − 1 objectives, leading to a total of up to 2^M − 1 possible combinations. Evidently, as the number of objectives increases, searching for all combinations becomes tedious and impractical. Most common MOPs have far fewer corners; for example, there are M corners for most non-degenerate M-objective problems in the DTLZ and WFG benchmarks [8,16]. Considering this, two specific subsets of corners were deemed to be of interest in [31]: Type-1 corners, which minimize only one objective value (Fig. 2c), and Type-2 corners, which minimize M − 1 objectives simultaneously (Fig. 2b).
It is additionally worth noting that, theoretically, a shape with an arbitrary number of vertices (e.g., a star or a square) can be projected onto these PF surfaces, giving rise to potential PF shapes with any number of perceived "corners". Such cases are not covered by the definitions discussed above (and it is subjective whether such vertices should be considered corners). Nonetheless, since the surrogate corner search is primarily concerned with objective value bounds rather than the internal geometric features of the PF, these definitions still seem sufficient for the task at hand.

Corner search scheme
From the above discussion, it is clear that as the number of objectives grows, independent search along the objectives carries an increasing possibility of identifying solutions that are far from the intended corners. This will in turn lead to normalization bounds that are not suitable. Hence, the search for the corners needs to be conducted concurrently so that the resulting solutions are likely to be closer to the true PF extremities. Therefore, in this paper, we adapt the corner sort ranking from [31] to simultaneously search for corners using a single evolutionary algorithm (EA), the output of which can be used to establish the normalization bounds. An illustration is given shortly, but we point out an important change in the ranking compared to [31]: we do not consider the L2 norm of M − 1 objectives during the sorting process, which was originally intended to search for Type-2 corners. The reason is that our application context is expensive problems, where the evaluation budget needs to be spent carefully. Even though corner search has the potential to improve the normalization bounds, too many evaluations spent on it will leave far fewer for the infill search, reducing the anticipated benefits of the improved normalization bounds. To strike a balance between the true evaluations spent on corner evaluations and infill evaluations, we exclude the ranking based on Type-2 corners, thus targeting a maximum of M possible Type-1 corners. The details of the adapted corner search are presented below.
The general goal of the corner search discussed here is to identify Type-1 corners as defined in Sect. 2.2, i.e., solutions that minimize each objective individually. An intuitive approach could be to run a single-objective optimization sequentially for each objective; however, with such an approach, the search for one objective's optimum would have no knowledge of the points explored for the other objectives. Instead, the corner search aims to achieve this with a single population in a more efficient and scalable manner. The ranking scheme plays an essential role in enforcing the preference for corner points in this method. The ranking can be best explained with the illustrative example shown in Fig. 3.
In Fig. 3, the sample population for a 3-objective problem has six solutions, with their IDs marked as 1-6. The intent of the sorting is that, for each objective, a better solution is ranked higher in the final list. Since there are multiple objectives, this is done in a sequential manner while avoiding duplication. The left block shows the 6 solutions and their objective values. First, each objective is sorted in ascending order independently (minimization is assumed). The sorted IDs of the solutions are listed for each objective in the middle block. Next, the top solution for each objective is picked in turn (e.g., from left to right) to form the final sorted list. Note that each time a solution ID is picked based on a given objective, the ID is deleted from the sorted lists of the remaining objectives. The solutions selected following this sequence are marked with a circle in the middle block. From the sorted list for f1, solution #6 is selected, and for f2, solution #3 is selected. For objective f3, since solution #6 has already been picked (crossed out), the top among the remaining solutions, #1, is selected. The final sorted list now contains three solutions: #6, #3, #1. Next, we restart from objective f1. Since solutions #1 and #3 were already picked in the last round (crossed out), the available solutions start from #5, so solution #5 is selected for f1. For objective f2, the next best solution is #2 and it is still available, so it is selected. For objective f3, since #2 was picked in the last step, the available solutions start from #4, so #4 is selected. All solutions now have a unique rank (in the sequence they were picked), and the sorting procedure terminates. The final sorted order is shown in the right block of Fig. 3.
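The ranking procedure above can be sketched as follows (a minimal illustration; the objective matrix below is a hypothetical construction chosen to reproduce the per-objective sorted lists of the Fig. 3 example):

```python
import numpy as np

def corner_sort(F):
    """Rank solutions by cycling through the objectives and, for each
    objective in turn, appending the best not-yet-ranked solution."""
    n, m = F.shape
    per_obj = [list(np.argsort(F[:, j], kind="stable")) for j in range(m)]
    order, picked = [], set()
    while len(order) < n:
        for j in range(m):
            if len(order) == n:
                break
            # pick the top remaining solution for objective j
            for i in per_obj[j]:
                if i not in picked:
                    picked.add(i)
                    order.append(i)
                    break
    return order

# Hypothetical 6-solution, 3-objective population (IDs 1-6, minimization)
F = np.array([[2, 3, 2],   # ID 1
              [5, 2, 3],   # ID 2
              [3, 1, 5],   # ID 3
              [6, 4, 4],   # ID 4
              [4, 5, 6],   # ID 5
              [1, 6, 1]])  # ID 6
ids = [i + 1 for i in corner_sort(F)]  # → [6, 3, 1, 5, 2, 4], as in Fig. 3
```

Note that the procedure assigns every solution a unique rank in at most ceil(n/m) passes over the objectives, so its cost is dominated by the m initial argsorts.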
With the above as the ranking process, the corner-search EA based on the surrogate models is run with a population size N evolving over G generations (this part incurs no true evaluations). Then, from the final population, the corners need to be selected for true evaluations. The selection strategies are discussed in the next section.

Corner selection for true evaluation
The most straightforward way is to pick the top M solutions directly from the ND front of the final population. This baseline strategy consumes exactly M true evaluations. However, this number can be reduced. In some cases, the corners obtained by optimizing different objectives can turn out to be the same (or similar) solutions. For example, in Fig. 4, the red corner points can assist in preserving low f1 values or low f2 values. In this case, there is no need to evaluate corner solutions that are extremely close to each other. Instead, the ND front of the final corner-search EA population is analyzed using Silhouette analysis [28] to identify the optimal number of clusters in the set, with the clustering conducted using the K-means method [24]. In Silhouette analysis, we calculate the silhouette score for each candidate cluster number (from 1 to M). The silhouette score measures the separation between the resulting clusters, and the cluster number with the highest score is taken as optimal. This number of clusters determines how many solutions are extracted for true evaluation: in each cluster, the solution with the highest corner sort rank is selected. Thus, for the case shown in Fig. 4, even though M = 3, only two solutions, one from each of the clusters at the two ends of the PF, will be selected for evaluation.
Fig. 4 Reducing the number of corners using Silhouette analysis
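Libraries such as scikit-learn provide K-means and silhouette scoring directly; for a self-contained sketch, a minimal NumPy-only variant is shown below (helper names are ours; a deterministic farthest-point initialization replaces random seeding, and the scan starts at 2 clusters since the silhouette coefficient is undefined for a single cluster):

```python
import numpy as np

def kmeans_labels(X, k, iters=50):
    # deterministic farthest-point initialization, then Lloyd iterations
    centers = [X[0]]
    for _ in range(k - 1):
        d = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels

def mean_silhouette(X, labels):
    # average silhouette coefficient over all points
    n = len(X)
    D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    s = []
    for i in range(n):
        own = (labels == labels[i]) & (np.arange(n) != i)
        a = D[i, own].mean() if own.any() else 0.0
        b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
        s.append((b - a) / max(a, b))
    return float(np.mean(s))

def choose_num_corners(F, m):
    """Pick the cluster count in 2..m with the highest silhouette score."""
    best_k, best_s = 1, -1.0
    for k in range(2, m + 1):
        labels = kmeans_labels(F, k)
        if len(set(labels)) < k:  # degenerate clustering, skip
            continue
        sc = mean_silhouette(F, labels)
        if sc > best_s:
            best_k, best_s = k, sc
    return best_k
```

For an ND front forming two well-separated groups, the scan selects two clusters even when M = 3, so only two corner candidates are sent for true evaluation.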
Another approach to reduce the number of candidate solutions before evaluation is to observe their location with respect to the current reference point. Recall that the main objective of identifying these solutions is to adjust the normalization bounds to an appropriate range. Therefore, if the predicted values of a solution do not affect the current bounds enclosed by the reference point, its evaluation will not provide the intended benefit. This consideration is illustrated in Fig. 5. The surrogate corners may often be located close to and along the edge of the current normalization boundary, so we should consider the benefit they bring in terms of HV improvement, especially with a higher number of objectives. As shown in Fig. 5, a surrogate corner under the reference bound (black line) will have a much smaller HV contribution (purple shade) than a normal infill point (green shade); such a surrogate corner is therefore ignored. A surrogate corner beyond the reference bound, however, has better potential for adjusting the normalization bound, and only such corners are evaluated. Thus, in this case, out of the two potential corners identified via corner search, only the one outside the box enclosed by R is evaluated.

Algorithm implementation
The pseudo-code of the overall framework is presented in Algo. 1. The main algorithm starts with the generation of an initial set of designs using Latin Hypercube Sampling (LHS), a widely used method for design of experiments. The designs are then truly evaluated and form the initial archive A. Surrogate models are built for each objective function based on the archive of solutions. This archive A of all truly evaluated solutions is updated in each iteration as new solution(s) are evaluated. Corners are identified by the corner search outlined in Algo. 2. Importantly, note that the corner search is not needed in every iteration; it is only invoked after an infill point is found that is better than the current ideal point (Z_I) in any of the objectives, as indicated in Algo. 1, line 5. The obtained corners are then selectively evaluated using the components discussed above (we compare three variants in the numerical experiments). All evaluated solutions are then added to the archive. The maximum and minimum values among the existing non-dominated solutions are used as the normalization bounds. Using the reference point R = 1.1^M, an HV maximization is conducted to obtain the infill solution, which is then evaluated and added to the archive. The surrogate models, archive and number of evaluations are updated, and the process repeats until the evaluation budget is exhausted.
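Two bookkeeping steps of this loop can be sketched directly (hypothetical helper names): updating the normalization bounds from the current ND front, and the line-5 trigger that re-invokes corner search only when an infill solution improves the current ideal point.

```python
import numpy as np

def update_bounds(nd_front):
    """Normalization bounds: componentwise min/max over the current
    non-dominated set (estimates of Z_I and Z_N, minimization assumed)."""
    nd = np.asarray(nd_front, dtype=float)
    return nd.min(axis=0), nd.max(axis=0)

def corner_search_due(f_infill, z_ideal):
    """Algo. 1, line 5: corner search is re-invoked only when the newly
    evaluated infill is better than the current ideal in any objective."""
    return bool(np.any(np.asarray(f_infill, dtype=float) < z_ideal))
```

Gating the corner search this way keeps its true-evaluation overhead proportional to how often the estimated ideal point actually moves, rather than to the total number of infill iterations.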
In the above process, we have used Kriging [25] as the surrogate model for approximating the objective functions in Lines 3, 12 and 17 of Algo. 1. Kriging is one of the most widely used surrogate models for approximating generic non-linear functions. For the HV maximization via infill identification in Line 14 of Algo. 1, we have used differential evolution (DE) [33], consistent with the previous work [34]. For the corner search in Algo. 2, we use the same approach as in PCSEA [31], with simulated binary crossover (SBX) [9] and polynomial mutation (PM) [10] as the variation operators. The corner search proceeds as follows:

for g = 1 to G_I do
    Apply evolutionary operators to generate child population C_I of size N_I
    Apply corner sort on P_I ∪ C_I
    Select the top N_I solutions in the sorted list for the next generation P_I
end for
In the last population's ND front, apply Silhouette analysis to determine the number of clusters S; in each cluster, select the solution with the best corner sort rank
Return the selected corner points E_1, E_2, ..., E_S

For removing dominance-resistant solutions, a tolerance of 1e−5 is used for each objective, as done in [34]. The code for the proposed method can be obtained from the authors' website http://mdolab.net/research_resources.html

Benchmark problems used
To demonstrate the performance of the proposed approach, numerical experiments are conducted on a range of test problems widely used in the evolutionary multi-objective optimization domain. These include problems from the ZDT [37], DTLZ [8], WFG [16], MAF [6], inverted DTLZ [19] and minus DTLZ [18] series, covering a wide range of problem characteristics and PF shapes. The settings used for the experiments are listed in Table 1. In our preceding work [34], we only considered bi-objective problem suites, one of which was the ZDT series, which is not scalable beyond 2 objectives. Hence, we have added the MAF series to complement DTLZ and WFG for M ≥ 3. For consistency with our previous paper, we keep the 2-objective experiments on the same problem set (ZDT, DTLZ, WFG) and include the additional inverted and minus DTLZ series (iDTLZ and mDTLZ for short) problems. For the 3-objective and 5-objective problems, we have used the DTLZ (standard, inverted and minus series), WFG and MAF series. Since MAF1 is derived from inverted DTLZ1, to eliminate duplication, we exclude inverted DTLZ1 for 3 and 5 objectives and keep MAF1. Some adjustments are made to the ruggedness and bias parameters of the problems so that they are not too difficult for approximation, as the focus of this study is on normalization bounds rather than improving surrogate accuracy. Accordingly, we have set the ruggedness parameter to 2π and the scaling factor in g(x) to 1 for DTLZ1 and DTLZ3, and the bias parameter b_poly = 0.5 for WFG. For the MAF3 and MAF4 problems, the ruggedness parameter is reduced to π from 20π, and the scaling factor in g is reduced to 1 from 100. Moreover, the type of surrogate (Kriging) is kept the same for all compared strategies, so that the limitations in approximation, where applicable, affect all strategies in a similar manner. There are also three problems (DTLZ7, MAF2, MAF6) which are not extendable to 5 objectives and are therefore not considered in the 5-objective experiments.

Algorithm and experimental settings
The Kriging model used in this study is based on the implementation of the DACE toolbox [25] in Python. The regression function is set to a zero-order polynomial, while the correlation model is set to be Gaussian. The Kriging surrogate is chosen since it is extensively used in surrogate-assisted optimization studies. The infill strategy is based on the predicted values ('believer' approach) for all the compared strategies. Other infill strategies, such as expected improvement, are excluded from the comparisons as they involve additional factors (such as uncertainties) whose influence is not the subject of the current study. HV is used as the metric for performance comparison of the final solutions, computed in the objective space normalized by the true ideal and nadir points with the reference point set to 1.1^M. A total of 29 independent runs are conducted for each problem to observe the statistical behavior of the performance. The performance is compared across six different normalization methods. These include the two standard methods (ND- and archive-based) and our method from the preceding work [34]. For the proposed method, we consider three variants, obtained by gradually activating the selective evaluation and Silhouette analysis to ascertain their contributions to the overall performance. For reference, the six compared methods are listed below. The main comparison is how each of the four extreme point/corner point-based methods compares with the standard ND-based or full archive-based approach for identifying the normalization bounds.
- NormR-ND: normalization bounds based on the current non-dominated solutions.
- NormR-A: normalization bounds based on the full archive.

Results and discussion
The results obtained for the 2-, 3- and 5-objective problems are summarized in Tables 2, 3 and 4, respectively. In the following sections, we discuss the obtained results from three different perspectives: the proposed method compared with the conventional approaches (archive- and ND-based normalization); the comparison between the surrogate-based corner search and the extreme point search proposed in our previous conference paper; and the internal comparison between the three variants (S1-S3) of the proposed approach.

Comparison with conventional normalization methods (archive-and ND-based)
In our preliminary work [34], we had highlighted three typical problem characteristics that influenced the normalization performance: initialization position (i.e., whether the initial solutions are close to or far from the PF), coverage of the PF (i.e., whether the initial solutions already provide a good estimate of ideal by encompassing the PF range), and optimization behavior on the surrogate model (i.e., whether the search on the surrogate models led to a significantly better estimate of the extreme points). Both the conventional approaches, i.e., archive-based and ND-based normalization, face challenges on problems with certain characteristics.
Below, we look into whether the proposed surrogate corner search can provide more consistent and competitive performance on these problems across varying numbers of objectives.
• Comparison with archive-based normalization:
- The archive-based normalization typically faces challenges when the initialized solutions span a much larger range of objective values compared to the PF. As discussed earlier, the implication of using a far-away reference point is that solutions that are not on the PF may have a high contribution, which takes away the selection pressure needed to drive the solutions to a well-converged and diverse PF approximation. On the other hand, if the initialized solutions are close to the PF and span its range well, it has the advantage of identifying good nadir and ideal approximations without additional effort. We observed that the first situation is common in the DTLZ (except DTLZ7) and MAF problems, while representative cases of the latter scenario are common in the WFG, ZDT and DTLZ7 problems. From Tables 2, 3 and 4, we observe that the proposed NormR-NDC (S3) either outperforms or achieves equivalent results compared to the archive-based method (NormR-A) in nearly all cases. This verifies that, for the cases where the initial population was far away, the proposed approach was able to provide tighter bounds for the HV infill search by bringing the reference point much closer to the true Z_I and Z_N.
- To examine the case where NormR-A faces challenges, we observe the search behavior on DTLZ1 closely in Fig. 6. Figure 6a shows the initial solutions for the 2-objective DTLZ1, its normalization bounds (orange box), and the position of the reference point. Evidently, due to the distribution of the initial solutions, the reference point is placed far away from the PF. In contrast, Fig. 6b shows the normalization bounds and the location of the reference point suggested by the proposed strategy (NormR-NDC) using the same initial solutions. It is clear that the performance of the latter is likely to be better than that of the NormR-A strategy.
It is expected that the convergence of Norm_RA will be inferior in this case, which is indeed the case, as can be seen from Fig. 6c.
- The two scenarios discussed above are further visualized using 3-objective problems in Figs. 7 and 8. In Fig. 7, for the DTLZ7 problem, the initial population covers the PF range well and the solutions are not too far from the PF. Consequently, the archive-based normalization has the advantage of covering all four patches of the PF, since it can estimate the bounds well from the beginning. The proposed method has little scope to improve this performance significantly via corner search, but we can see that it achieves similar PF convergence and diversity (Fig. 7d-f). Notably, the ND-based normalization (Norm_RND) performs much worse here, which will be discussed shortly. The second scenario occurs for problem MAF6 in Fig. 8, where the initial solutions are observed to be far from the PF. The performance of the archive-based normalization method suffers in this case (for similar reasons as in Fig. 6), while the proposed method maintains its competitive performance.

Summary

Compared to Norm_RA(1):   ↑ 10 ↓ 3  |  ↑ 9 ↓ 5   |  ↑ 9 ↓ 2  |  ↑ 10 ↓ 1
Compared to Norm_RND(2):  ↑ 7  ↓ 6  |  ↑ 6 ↓ 10  |  ↑ 5 ↓ 5  |  ↑ 7  ↓ 5

(Columns: Norm_RNDE, Norm_RNDC(S1), Norm_RNDC(S2), Norm_RNDC(S3). The Wilcoxon rank-sum test is used to test significance at the 0.05 significance level; footnotes 1 and 2 indicate that the comparison is conducted against the archive-based and the ND-based normalization method, respectively.)
• Comparison with ND-based normalization:
- Though ND-based normalization generally performs well, it faces challenges when the initial non-dominated population is concentrated in a localized region rather than spanning a good range relative to the PF. In such cases, Norm_RND cannot easily expand the range to obtain other parts of the PF, and diversity is severely compromised. This is illustrated for DTLZ7 in Fig. 7b, where the final solutions cover only a part of the PF. Although the solutions in the initial archive span a wide range, the non-dominated solutions were concentrated around only two of the patches. The surrogate corner search in Fig. 7d is able to overcome this situation by supplementing the ND set with other corners spanning a larger range, which in turn improves its performance significantly.
- The performance of the proposed method is also very consistent compared to the ND-based normalization across different numbers of objectives, as evidenced by Tables 2 and 3. In Table 2, for 2 objectives, the last column has 7 problems where Norm_RNDC (S3) significantly outperforms Norm_RND, and 5 problems where the proposed method performs worse. However, where the proposed method performs worse than the ND-based normalization, the gap in performance is marginal. For example, for the DTLZ1 problem in Table 2, the median HV of the ND-based method is 0.7008, while that of the proposed method in the last column is 0.7004. In contrast, for the cases where Norm_RND faces challenges, such as DTLZ7 in Table 2, the gap is significantly larger.
- In addition, one can observe from Tables 3 and 4 that Norm_RNDC (S3) outperforms Norm_RND in only a few problems, while being equivalent in the others. However, this does not indicate that the performance of the proposed approach deteriorates with an increasing number of objectives. Rather, the main takeaway is that there are typical scenarios in which each of Norm_RA and Norm_RND suffers disadvantages. The proposed strategy is designed to mitigate these disadvantages in both scenarios and deliver stable performance across a range of problems, rather than to always outperform the other two strategies. In problems with a higher number of objectives, the performance is equivalent in the majority of instances.
Overall, it can be clearly seen that for the scenarios that are challenging for Norm_RND, the proposed approach is able to adapt and maintain its competitive performance.
- Among the problems with a higher number of objectives, fewer exhibit a biased initial ND front, unlike the ZDT problems in the 2-objective benchmarks (Table 2). Therefore, the performance of Norm_RND and Norm_RNDC (S3) tends to be similar. For the 5-objective test problems, Norm_RNDC (S3) performs better than Norm_RND in more (4) problems and worse in fewer (2) problems. For the 3-objective cases, it has the same number of better and worse cases (3 each). From the perspective of the experimental setting, the 5-objective problems had 200 solutions to train the surrogates, while the 3-objective problems had 150. Since the number of variables was 6 in both cases, we expect more accurate surrogates for the 5-objective case, and better surrogate accuracy is likely to benefit the surrogate-assisted corner search. A crucial thing to note, however, is that these numbers are small relative to the size of the problem set: for the majority of problems in both the 3-objective (19 out of 25) and 5-objective (16 out of 22) sets, the strategies perform similarly. This is consistent with the design intent of the experiments, since the proposed strategy aims to provide competent (best or close to best) performance when either Norm_RND or Norm_RA faces challenging scenarios such as those discussed above.
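The effect underlying these comparisons, namely that a distant reference point inflates the hypervolume contribution of poorly converged solutions, can be sketched for the 2-objective case. The following is an illustrative sketch under assumed point values, not the paper's actual infill computation:

```python
def hv_2d(points, ref):
    """Hypervolume (minimization) of a 2-D non-dominated set w.r.t. `ref`.
    Points beyond the reference point are clipped and contribute nothing."""
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(points):  # ascending f1 => descending f2 on a front
        hv += max(0.0, ref[0] - f1) * max(0.0, prev_f2 - f2)
        prev_f2 = min(prev_f2, f2)
    return hv

def exclusive_hv(points, ref, idx):
    """Exclusive HV contribution of points[idx]."""
    rest = points[:idx] + points[idx + 1:]
    return hv_2d(points, ref) - hv_2d(rest, ref)

# Three candidates: two well-converged points and one barely converged in f1.
cands = [(0.1, 0.9), (0.5, 0.5), (5.0, 0.05)]
near, far = (1.1, 1.1), (100.0, 100.0)

# With the near reference point, the poorly converged candidate contributes
# nothing, so selection pressure favors the well-converged points. With the
# far reference point, the same candidate's contribution is hugely inflated,
# overwhelming that of the well-converged middle point.
```

This is exactly why tighter normalization bounds, which bring the reference point close to the true Z^I and Z^N, preserve the selection pressure of the HV infill criterion.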

Proposed approach compared to previous extreme point search method
The method proposed in this paper is a more generalized version of the extreme point-based normalization (Norm_RNDE) proposed in our previous work [34]; in this section, we therefore compare the performance of the two. In general, the proposed Norm_RNDC (S3) is more consistent across different problem settings than Norm_RNDE. This can be observed from the number of cases where the two normalization adjustment methods perform worse than the ND-based normalization method Norm_RND. We choose Norm_RND as the baseline for this comparison because, for a higher number of objectives, fewer test problems pose challenges to it, and it performs better than the archive-based normalization method; the comparison therefore better illustrates the difference in performance. In Table 2, the two normalization adjustment methods perform similarly: they perform worse than the ND-based normalization in 6 (Norm_RNDE) and 5 (Norm_RNDC (S3)) problems, respectively, and better in 7 problems each. In the last row of Table 3, it can be seen that for 3-objective problems, Norm_RNDE performs worse than Norm_RND in 7 problems, while the proposed Norm_RNDC (S3) performs worse in only 3. For Table 4, these numbers are 8 and 2, respectively. This verifies that the proposed improvements make the method more scalable to a higher number of objectives and provide a better overall performance.

Internal comparison between the variants of proposed approach (S1-S3).
Following the comparisons with the previous approaches, in this section we examine the impact of selective evaluation and Silhouette analysis in the proposed approach. It can be observed from Tables 2, 3 and 4 that these mechanisms progressively improve the consistency of the proposed surrogate-assisted corner search based normalization. In Table 2, compared to the ND-based normalization (last row), adding selective evaluation reduces the number of worse cases from 10 (S1) to 5 (S2, S3). With an increasing number of objectives, the impact of the Silhouette analysis becomes more prominent: in Tables 3 and 4, the number of worse cases compared to the ND-based normalization is further reduced when the Silhouette analysis is active. For example, in Table 3, compared to Norm_RND, the number of worse cases reduces from 11 (S1) to 4 (S2) and then to 3 (S3); in Table 4, it reduces from 6 (S1) to 4 (S2) and then to 2 (S3). A similar trend can be observed for comparisons with the archive-based normalization method. Overall, these observations confirm the contribution of both modules to the general performance of the proposed method.
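Silhouette analysis itself is a standard clustering-quality measure. A minimal self-contained sketch (the point values and groupings below are illustrative assumptions, not the paper's actual corner candidates) shows how it can indicate whether candidate corner solutions form distinct groups, so that only one representative per group needs to be truly evaluated:

```python
from math import dist  # Euclidean distance (Python 3.8+)

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:                      # singleton cluster
            scores.append(0.0)
            continue
        a = sum(dist(p, q) for q in own) / len(own)        # intra-cluster cohesion
        b = min(                                           # separation from nearest cluster
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / sum(1 for l in labels if l == c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Four candidate corner points forming two tight groups in objective space.
corners = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 5.0)]
good = [0, 0, 1, 1]   # matches the natural grouping
bad = [0, 1, 1, 1]    # splits one group and merges across groups
```

A high silhouette score for a grouping suggests genuinely distinct corners, in which case evaluating one representative per cluster conserves the expensive evaluation budget.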

Convergence behavior for some representative problems
To supplement the cases discussed in the sections above, we present a few more representative problems to show the convergence behavior (mean HV convergence for selected test problems using all strategies). For the cases in Fig. 9a, b, Norm_RA provides good initial bounds (not too far from the true PF), and is therefore among the better strategies in terms of convergence. For Fig. 9c, d, on the other hand, the situation is reversed: the archive of initial solutions spans a much larger range in the objective space than the PF, hence the reference point is set far away, which adversely affects the performance. As for Norm_RND, it can be seen to struggle in Fig. 9a, where the initial set of ND solutions is biased and consequently could not provide a good reference point.

Concluding remarks
In this paper, we presented an approach for adjusting the normalization bounds for HV-infill based search to solve expensive MOPs. Towards this end, we incorporated a modified version of corner search on the surrogate models, followed by further selection mechanisms to reduce the computational budget directed towards corner evaluations while maintaining the associated benefits. The numerical experiments demonstrate the consistency of the proposed approach.
For the problems where the ND-based or archive-based schemes face challenges, it was observed that the proposed scheme can adapt and maintain its competitive performance.
Moreover, compared to the extreme point search approach proposed in our previous work [34], the performance of the proposed approach scales better with an increasing number of objectives. In the future, we plan to further improve the performance by incorporating other infill criteria and strategies to deal with problems with extremely convex and non-convex PF shapes.

Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.