Abstract
Single-objective continuous optimization can be challenging, especially when dealing with multimodal problems. This work sheds light on the effects that multi-objective optimization may have in the single-objective space. For this purpose, we examine the inner mechanisms of the recently developed sophisticated local search procedure SOMOGSA. This method solves multimodal single-objective continuous optimization problems by first expanding the problem with an additional objective (e.g., a sphere function) to the bi-objective domain and subsequently exploiting local structures of the resulting landscapes. Our study particularly focuses on the sensitivity of this multiobjectivization approach w.r.t. (1) the parametrization of the artificial second objective, as well as (2) the position of the initial starting points in the search space. As SOMOGSA is a modular framework for encapsulating local search, we integrate Nelder–Mead local search as the optimizer in the respective module and compare the performance of the resulting hybrid local search to its original single-objective counterpart. We show that the SOMOGSA framework can significantly boost local search by multiobjectivization. Hence, combined with more sophisticated local search methods and metaheuristics, this may help solve highly multimodal optimization problems in the future.
1 Introduction
The basic idea of multiobjectivization for single-objective (SO) optimization problems is simple (Knowles et al. 2001; Segura et al. 2016): instead of optimizing the desired objective alone, introduce a second objective and solve the resulting multi-objective (MO)—more precisely bi-objective—optimization problem using MO solvers, e.g., evolutionary multiobjective algorithms (EMOAs). There are multiple reasons for doing this. First and foremost, local optima are obstacles in the search space, which may lead to premature convergence of single-objective (local) search methods. Even for explicitly designed global methods, multimodality can be a curse (Preuss 2015). A first promising report on the benefits of transforming SO problems into the MO domain was provided by Knowles et al. (2001). The authors empirically showed that multiobjectivization can reduce the number of local optima in the search space. Follow-up studies showed that helper objectives can be beneficial (Jensen 2004), while others theoretically demonstrated that an additional objective can improve the search behavior of evolutionary algorithms (Neumann and Wegener 2008). In the global optimization domain, Dunlavy and O’Leary (2005) proposed a so-called homotopy optimization approach, which integrates a second (helper) objective into a homotopy function and minimizes different linear aggregations of both objectives.
A second reason for the multiobjectivization of single-objective problems is to exploit the power of EMOAs, which are known to be capable of solving many difficult problems (Tran et al. 2013). However, the result of multiobjectivization may be ambivalent and thus have positive and negative effects on an algorithm’s performance (Handl et al. 2008; Brockhoff et al. 2007). Nevertheless, a central and beneficial property of multiobjectivization remains: if we add another objective, more information often becomes available that may help in guiding the search. Some authors even report on plateau networks that may level out local optima and make it easier to avoid them (Garza-Fabre et al. 2015). Using new visual representations of MO landscapes (Kerschke and Grimme 2017; Grimme et al. 2019b), we were recently able to show that MO landscapes in fact comprise structures that can be exploited for escaping local optima (Steinhoff et al. 2020). In that work, we not only described the observations in the MO landscapes, but also integrated this behavior into a local search method and demonstrated the general working principle.
Given a single-objective problem \(f_1(x)\) with \(x\in {\mathbb {R}}^n\) that is to be optimized, we additionally introduce a sphere function \(f_2(x) = (x_1 - y^{*}_1)^2 + \cdots + (x_n - y^{*}_n)^2\) as second objective, with \(y^{*}\in {\mathbb {R}}^n\) denoting the position of the sphere’s center and thus the only optimum of \(f_2\). Then, the resulting bi-objective problem \(F(x)=(f_1(x),f_2(x))^T\) is solved by a multi-objective local search approach called multi-objective gradient sliding algorithm (MOGSA) (Grimme et al. 2019a), which essentially moves from a starting point in decision space towards the nearest locally efficient set (in the MO sense, see Sect. 2.1 for terminology), explores that locally efficient set, and moves on to the next (and dominating) locally efficient set of F(x). Thereby, it exploits the special structure of MO landscapes (for details, see Sect. 2). Tracing the search path from multiple starting points, we were able to show in Steinhoff et al. (2020) that in this process we pass better local optima for \(f_1\) than standard local search mechanisms (e.g., Nelder and Mead 1965) could possibly reach for many starting points. Depending on the structure of \(f_1\), those trajectories oftentimes even crossed the global optimum.
This paper extends previous work (Aspar et al. 2021), in which we presented work-in-progress results on developing the aforementioned MO search approach based on MOGSA towards an effective SO local search mechanism. By combining multiobjectivization and MOGSA, our modularized heuristic enables the boosting of classical single-objective local search mechanisms like Nelder–Mead (NM). Specifically, we extend our experimental setup compared to Aspar et al. (2021) and focus on the single-objective MOGSA search heuristic (SOMOGSA) that integrates Nelder–Mead local search.
To assess the benefits of our approach compared with the capabilities of single-objective local search, and to highlight the specific properties and challenges of this approach, we evaluate the method on the 24 BBOB problems proposed by Hansen et al. (2009) in two-dimensional search spaces. For those low-dimensional search spaces, we are able to investigate the positive effect of the proposed hybridization both visually and using new measures for quality gain. To complement the findings, we apply this approach to highly multimodal problem instances in five dimensions.
Our results show that the SO variant of MOGSA (SOMOGSA) significantly outperforms the considered Nelder–Mead local search. Our systematic study of starting points in the search space, as well as the examination of the approach’s sensitivity to different positions of the helper objective \(f_2\), suggests that further, possibly more advanced, SO local search mechanisms can easily be incorporated into the SOMOGSA framework. Alternatively, our algorithm could be used within global search procedures to support global convergence.
The remainder of this work is structured as follows: Sect. 2 provides the necessary terminology and notation before we explain the idea of SOMOGSA. Section 3 describes the experimental setup, the conducted experiments, and the evaluation strategy. Section 4 details the results, before Sect. 5 summarizes our findings and highlights perspectives for future work.
2 Single-objective optimization via multiobjectivization
Within this section, we first describe the fundamental notion of multi-objective optimization problems as well as some aspects of locality to lay a foundation for the further discussion. Then, we address multiobjectivization in general and detail our specific approach. Afterwards, the core components of our hybrid and modularized local search algorithm SOMOGSA (and its hybridization with Nelder–Mead local search) are outlined.
2.1 Multi-objective optimization and locality
We earlier described the general scheme of converting a single-objective optimization problem into a multi-objective one by adding an additional objective. Here, we formally introduce some of the necessary terminology, as summarized and extended by Grimme et al. (2021):
Definition 1
(Multi-objective optimization problem) A multi-objective optimization problem (MOP) is commonly denoted as a vector-valued function
$$\begin{aligned} \mathbf {f}: {\mathcal {X}} \rightarrow {\mathbb {R}}^m, \qquad \mathbf {f}(\mathbf {x}) = \left( f_1(\mathbf {x}), \dots , f_m(\mathbf {x})\right) ^T, \end{aligned}$$
with m real-valued single-objective functions \(f_i: {\mathcal {X}} \rightarrow {\mathbb {R}}\), \(i = 1, \dots , m\), which are to be optimized simultaneously.
As we focus on continuous multi-objective optimization, the MOP’s search space \({\mathcal {X}}\) is real-valued, i.e., \({\mathcal {X}} \subseteq {\mathbb {R}}^n\). Comparing solutions, however, is not as simple as in SO optimization. In contrast to the totally ordered set \(({\mathbb {R}}, \le )\) in SO optimization, with its natural total order \(\le \) on \({\mathbb {R}}\), the set of solution candidates of an MOP follows the weak Pareto order \(\preceq \) on \({\mathbb {R}}^m\), which is defined as follows:
Definition 2
(Pareto order or Pareto dominance) Let \(\mathbf {a} = (a_1, \dots , a_m) \in {\mathbb {R}}^m\) and \(\mathbf {b} = (b_1, \dots , b_m) \in {\mathbb {R}}^m\). We say \(\mathbf {a}\) weakly dominates \(\mathbf {b}\) (written as \(\mathbf {a} \preceq \mathbf {b}\)), if and only if \(a_i \le b_i\) for all \(i = 1, \dots , m\). Then the Pareto order \(\prec \) on \({\mathbb {R}}^m\) is defined as follows: \(\mathbf {a}\) dominates \(\mathbf {b}\) (\(\mathbf {a} \prec \mathbf {b}\)), if and only if \(\mathbf {a} \preceq \mathbf {b}\) and \(\mathbf {a} \not = \mathbf {b}\). The weak Pareto order can also be extended to sets of points: for \(A,B\subseteq {\mathbb {R}}^m\), A weakly dominates B, if and only if \(\forall \mathbf {b} \in B\, \exists \mathbf {a} \in A\) such that \(\mathbf {a}\preceq \mathbf {b}\). The order \(\prec \) can be generalized similarly: A dominates B, if and only if \(\forall \mathbf {b} \in B\, \exists \mathbf {a} \in A\) such that \(\mathbf {a}\prec \mathbf {b}\).
Thus, the solution set of an MOP, the Pareto set, consists of optimal trade-off solutions, i.e., nondominated solutions in decision space. The image of the Pareto set is called Pareto front:
Definition 3
(Pareto set and Pareto front) A point \(\mathbf {x} \in {\mathcal {X}}\) is denoted as globally efficient point of \({\mathcal {X}}\) (or of \(\mathbf {f}\)) if there is no point \(\tilde{\mathbf {x}} \in {\mathcal {X}}\) s.t. \(\mathbf {f}(\tilde{\mathbf {x}}) \prec \mathbf {f}(\mathbf {x})\). The set \({\mathcal {X}}_{\mathrm{E}}\) of all globally efficient points of \({\mathcal {X}}\) is called Pareto set (or globally efficient set) of \(\mathbf {f}\). The image of \({\mathcal {X}}_{\mathrm{E}}\) under \(\mathbf {f}\) is called the Pareto front of \(\mathbf {f}\) and denoted by \(\mathbf {f}({\mathcal {X}}_{\mathrm{E}})\).
Traditional multi-objective solvers specifically aim at approximating (or even exactly determining) the Pareto sets and Pareto fronts. However, as shown in Grimme et al. (2021), local structures in decision and objective space do not necessarily hinder algorithm success but can rather be efficiently exploited by sophisticated approaches. Thus, with the methodological concept of multiobjectivization in mind, we need to define locality and locally efficient sets, following the aforementioned work in terminology and notation (see Fig. 1 for a schematic visualization).
Definition 4
(Locally efficient point) A point \(\mathbf {x} \in {\mathcal {X}}\) is called a locally efficient point of \({\mathcal {X}}\) (or of \(\mathbf {f}\)) if there is an open set \(U \subseteq {\mathcal {X}}\) with \(\mathbf {x} \in U\) and there is no point \(\tilde{\mathbf {x}} \in U\) such that \(\mathbf {f}(\tilde{\mathbf {x}}) \prec \mathbf {f}(\mathbf {x})\). The set of all locally efficient points of \({\mathcal {X}}\) is denoted by \({\mathcal {X}}_{\mathrm{LE}}\).
Yet, as the consideration of locally efficient sets goes beyond the definition of locally efficient points, this additionally requires a notion of connectedness.
Definition 5
(Connectedness and connected component) The subset \(A \subseteq {\mathcal {X}}\) is called connected, if and only if there do not exist two open subsets \(U_1\) and \(U_2\) of \({\mathcal {X}}\) such that \(A \subseteq (U_1 \cup U_2)\), \((U_1 \cap A) \not = \emptyset \), \((U_2 \cap A) \not = \emptyset \), and \((U_1 \cap U_2 \cap A) = \emptyset \); or equivalently, there do not exist two nonempty subsets \(A_1\) and \(A_2\) of A which are open in the relative topology of A such that \((A_1 \cup A_2) = A\) and \((A_1 \cap A_2) = \emptyset \). Let B be a nonempty subset of \({\mathcal {X}}\). A subset C of B is a connected component of B, if and only if C is nonempty, connected, and there exists no strict superset of C that is connected.
Now it is possible to define the locally efficient set as a structure in search space, together with its corresponding image in objective space.
Definition 6
(Locally efficient set and locally efficient front) A subset \(A \subseteq {\mathcal {X}}\) is a locally efficient set of \(\mathbf {f}\), if A is a connected component of \({\mathcal {X}}_{\mathrm{LE}}\). The image \(\mathbf {f}(A)\) of A under \(\mathbf {f}\) is called a locally efficient front of \(\mathbf {f}\).
Clearly, solutions in locally efficient sets that are not dominated by any other solution are contained in the Pareto set. This implies \({\mathcal {X}}_{\mathrm{E}} \subseteq {\mathcal {X}}_{\mathrm{LE}}\). Following the defined distinction between globally and locally efficient points, we can also speak of a globally efficient set of solutions and a globally efficient front of solutions instead of a Pareto set or a Pareto front, respectively. Examples of both globally and locally efficient sets are shown in Fig. 1.
To further sharpen the notion of localness, we adopt the definition of the \(\varepsilon \)-neighborhood of a set, in analogy to the \(\varepsilon \)-ball that surrounds a single solution (refer to Definition 4).
Definition 7
(\(\varepsilon \)-neighborhood of a set) Let \(A \subseteq {\mathcal {X}}\) and \(\varepsilon >0\). The set \(A^{(\varepsilon )} := \{ x \in {\mathcal {X}}\, \vert \, \exists a \in A \text{ with } \Vert x - a \Vert < \varepsilon \}\) is the \(\varepsilon \)-neighborhood of A, where \(\Vert \cdot \Vert \) is the Euclidean norm.
Then, a strict locally efficient set is completely surrounded by an \(\varepsilon \)-neighborhood for some \(\varepsilon \) (see also Fig. 1, set \({\mathcal {X}}_{L,2}\)).
Definition 8
(Strict, locally efficient set) Let \(C \subseteq {\mathcal {X}}_{\text {LE}}\) be a locally efficient set. Then C is a strict, locally efficient set, if and only if \(\exists \varepsilon > 0\) such that \(C\, \preceq \, C^{(\varepsilon )}\).
For a schematic example of how a locally efficient set fails to be a strict locally efficient set, see Fig. 1 again. Therein, the set \({\mathcal {X}}_{L,1}\) is a half-open segment of a straight line, whose leftmost end point is excluded (indicated by an open circle). Moreover, this end point is an accumulation point of \({\mathcal {X}}_E^{(\varepsilon _0)}\), i.e., for a fixed \(\varepsilon _0 >0\) there exists \(\varepsilon _1 > 0\) such that none of the points of \({\mathcal {X}}_E^{(\varepsilon _0)} \cap {\mathcal {X}}_{L,1}^{(\varepsilon _1)}\) (the blue intersecting area) is dominated by \({\mathcal {X}}_{L,1}\).
The non-strict locally efficient sets are of special interest here. Their \(\varepsilon \)-neighborhood is—as demonstrated visually—superposed by the \(\varepsilon \)-neighborhood of a dominating efficient set. We now consider the \(\varepsilon \)-neighborhood of a locally efficient set as a subset of the basin of attraction of that set. Here, we define a basin of attraction as the set of points for which a multi-objective gradient-based descent strategy (w.l.o.g. for minimization problems) leads to the same locally efficient set. This multi-objective gradient descent can be considered as a movement from one point in decision space to a (neighboring) dominating point in decision space, following the common direction of the normalized gradients of all objectives.
The superposition which makes a locally efficient set non-strict thus opens up a path along the efficient set into the superposing (i.e., dominating) basin of attraction. This can be exploited algorithmically and is the motivation for the method presented and investigated in this paper.
2.2 Concept of multiobjectivization
For this work, we consider the optimization of box-constrained single-objective continuous optimization problems as our main goal. These problems are formally given by \(\min _{x \in [l, u]} f(x)\) with \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\), and \(l,u \in {\mathbb {R}}^n\) being the problem’s (lower and upper) box constraints.
However, instead of optimizing the single-objective problem directly—w.l.o.g. we denote this objective \(f_1\)—we transform it into a bi-objective problem \(F = (f_1, f_2)\) by introducing a second objective \(f_2\), which w.l.o.g. shall be minimized as well. This simple concept is known as multiobjectivization (Knowles et al. 2001; Jensen 2004; Neumann and Wegener 2008; Steinhoff et al. 2020). Its motivation is that the resulting bi-objective problem landscape possesses structural properties which can potentially be exploited by a sophisticated multi-objective optimizer.
Among these characteristics are the previously defined locally efficient sets (see Def. 6), i.e., the multi-objective counterparts of local optima. By definition, each of these sets is a connected set of locally efficient points.
Interestingly, in multi-objective optimization, the existence of locally efficient sets can be very beneficial for local solvers. In contrast to their SO counterparts, these optima are often not traps for local solvers, but instead guide the way to more promising regions of the search space (Grimme et al. 2019b; Kerschke and Grimme 2021; Schäpermeier et al. 2022), sometimes even towards the globally efficient set.
This is rooted in superpositions of basins of attraction, as described in Sect. 2.1 and exemplarily visualized in Fig. 2. This figure presents the sequential construction of a multi-objective problem from two single-objective functions. The left-hand image shows a heatmap-based landscape plot of the highly multimodal Rastrigin problem (Hoffmeister and Bäck 1991), which we consider as \(f_1\). The middle image shows the corresponding plot of the sphere function, which we consider as \(f_2\). The right-hand plot shows a heatmap-based gradient field plot (Kerschke and Grimme 2017) of the multi-objective problem \(F=(f_1,f_2)^T\), where the color denotes proximity to the nearest locally efficient set, from red (far) to blue (close). In a nutshell, the gradient heatmap of a multi-objective problem depicts by color, for each point in a discretized decision space, the distance (number of necessary steps via neighboring points) towards a locally efficient point when following the normalized multi-objective gradient direction. More detailed information on this visualization technique, related approaches, and visualization tools can be found in Schäpermeier et al. (2020, 2021). Obviously, the multi-objective landscape of the combined problems exposes many locally efficient sets (visible as green-yellow lines) surrounded by red colored basins of attraction. However, in contrast to the classical view on the Rastrigin problem with its many local traps (and local basins of attraction), the multi-objective basins superpose each other: basins of one set directly cut the locally efficient sets of superposed basins. This innovative structural view allows for new ideas on how to roam the search space in a directed way.
Formally, our observation means the following: if we are not trapped in a strict locally efficient set (see Def. 8), then dominating basins of attraction overlap dominated basins. As shown schematically in Fig. 1 and exemplarily in Fig. 2, it is then possible to move from the dominated basin towards the dominating one along the efficient set. Thereby, we definitely reach a “better” multi-objective locally efficient set—and possibly also a better area of the single-objective search space.
By making use of the Fritz John necessary condition (John 1948), potentially locally efficient points can be identified easily during the multi-objective downhill movement in the search space: let \(x \in {\mathbb {R}}^n\) be a locally efficient point and let all objective functions of F, i.e., \(f_1\) and \(f_2\), be continuously differentiable in \({\mathbb {R}}^n\). Then, there is a weight vector \(v \in [0,1]^m\) with \(\sum _{i=1}^m v_i=1\), such that
$$\begin{aligned} \sum _{i=1}^m v_i \nabla f_i(x) = 0. \end{aligned}$$
In the bi-objective case, at a locally efficient point the (single-objective) gradients of \(f_1\) and \(f_2\) cancel each other out given a suitable weight vector \(v\in {\mathbb {R}}^2\). This property provides a useful foundation for understanding and solving multi-objective optimization problems and (as will be shown within this work) single-objective problems as well. For instance, it can be used for visualizing MO landscapes (Kerschke and Grimme 2017; Schäpermeier et al. 2020), and it is essential for the recently proposed multi-objective gradient sliding algorithm (MOGSA) (Grimme et al. 2019a, b). As shown by Steinhoff et al. (2020), multiobjectivization enables the aforementioned multi-objective algorithm to even optimize single-objective problems. In the following subsection, we describe this single-objective variant of MOGSA—dubbed SOMOGSA—in more detail.
2.3 The modularized search heuristic SOMOGSA
Contrary to previous research in which global methods were applied to multiobjectivized problems—for an extensive review, we refer to Segura et al. (2013, 2016)—we will use a deterministic MO local search algorithm. It efficiently exploits properties of MO landscapes, which we previously identified using a recently developed visualization method based on gradient field heatmaps (Grimme et al. (2019b), see Fig. 2 for an example). In our setting, the multi-objective problem (MOP) degenerates to a bi-objective problem \(F(x)=(f_1(x), f_2(x))^T\in {\mathbb {R}}^2\). To ensure the simplicity of our approach and maximal comprehensibility, we used a simple sphere function \(f_2(x) = \sum _{i = 1}^n (x_i - y^{*}_i)^2\) as second objective, with \(x,y^{*}\in {\mathbb {R}}^n\). This comes with two benefits: (1) the resulting bi-objective problem is simple enough to avoid unwanted distractions, but still capable of guiding an MO algorithm, and (2) it allows for the direct, analytical determination of the corresponding gradient (which is utilized by SOMOGSA). As such, this choice seems cost-effective and logical for this study. Note that, in principle, a different and more complex choice of \(f_2\) is possible. A clever choice of \(f_2\) could help in creating specific MO descent structures that guide SOMOGSA directly along a desired path. It is still unclear, however, what effort has to be put into finding such a structure of \(f_2\), and how the interaction of \(f_1\) and \(f_2\) can be predicted for generating a beneficial MO structure, specifically if \(f_1\) is a gray-box or even a black-box problem. In this study, we rely on Occam’s principle of choosing the simplest model for evaluation.
The general idea of our proposed hybrid and modularized search heuristic SOMOGSA is given schematically in Fig. 3. While SOMOGSA enables the exploitation of the multi-objective landscape structures, it can encapsulate an arbitrary local search for refining the \(f_1\) results. In Sect. 3, we will demonstrate this using the Nelder–Mead local search algorithm. The SOMOGSA framework is described in more detail in Algorithm 1. Apart from adding a sphere as second objective (line 1), SOMOGSA essentially (repeatedly) performs the following steps:

1.
Perform an MO gradient descent (i.e., ‘slide down’) into the vicinity of the attracting locally efficient set (lines 5 and 6).

2.
Once an (almost) locally efficient point has been reached, SOMOGSA performs a single-objective local search towards the corresponding (single-objective) local optimum of \(f_1\) (line 7). Here, any local search strategy of the user’s choice can be plugged in.

3.
As the local optimum defines one end of the locally efficient set, SOMOGSA can now traverse that set towards the neighboring (better) attraction basin (lines 10–12) by performing a single-objective gradient descent in the direction of the second objective (\(f_2\)).
Note that the criterion for stopping the multi-objective downhill movement is based on the Fritz John condition, which is only a necessary criterion and may lead to a premature termination of step 1. This case is not explicitly handled by our approach. Instead, we heuristically assume that the subsequent single-objective local search w.r.t. \(f_1\) will lead the search near a locally efficient solution again, from where step 3 will continue as planned.
The aforementioned steps are repeated until either (a) SOMOGSA has reached the sphere’s optimum (line 3), or (b) the objective value of \(f_1\) worsened between two subsequently found local optima of \(f_1\) (line 8). In the latter case, we interrupt the optimization, as SOMOGSA would no longer improve w.r.t. \(f_1\) but instead waste the remaining budget on eventually reaching the optimum of \(f_2\). Note that the gradient of the original objective function \(f_1\) usually needs to be computed numerically, e.g., by two-sided finite differences, while the gradient of the helper objective \(f_2\) may be known analytically.
As described above, SOMOGSA essentially provides a modularized framework, which allows one to make use of the strengths of MOGSA. However, the performance of SOMOGSA obviously depends on its configuration (e.g., which problem is used as objective \(f_2\), where the optimum of \(f_2\) is located, which local search strategy is used, which starting point is chosen, etc.). Therefore, in the following, we will investigate the sensitivity of our method w.r.t. different parameters and module choices. In addition, we experimentally demonstrate that SOMOGSA can significantly boost the potential of simple single-objective local search algorithms.
3 Experiments
3.1 Setup
In this work, we expand the test environment used in Aspar et al. (2021) and consider the 24 noiseless single-objective BBOB problems from the COCO platform (Hansen et al. 2009) in the two-dimensional decision space. For SOMOGSA, these functions take the role of objective \(f_1\) in the multiobjectivized problem. As second objective, we consider \(f_2(x) = \sum _{i = 1}^2 (x_i - y^{*}_i)^2\) with \(x,y^{*}\in {\mathbb {R}}^2\), as explained above. Besides investigating the principal benefits of applying SOMOGSA together with a classical SO local search, we specifically focus on the sensitivity of the approach w.r.t. the starting position of the search, as well as the location of the optimum of \(f_2\). We thus discretize the decision space using a regular grid X for generating \(N = d \times d\) starting points. We choose \(d = 50\), as (empirically) larger values of d do not provide additional insights and lower values do not reveal all desired structures. The methods to be assessed are executed from all grid-cell centers of X. Ten positions \(y^{*}\) of the \(f_2\) optimum were generated by using Latin Hypercube Sampling (McKay et al. 1979) and subsequently mapping those points to their nearest neighbors in X. The (default) parameters mentioned in Algorithm 1 were set to \(t_\angle = 170\) as the angle denoting sufficient proximity to a potentially locally efficient set, \(\sigma _{MO} = 0.05\) as the step size for the multi-objective descent, and \(\sigma _{SO} = 0.1\) as the step size for the fast traversal along the approximated gradient of \(f_2\) for reaching the next basin of attraction. In Aspar et al. (2021), Gradient Search (GS) and Nelder–Mead (NM) (Nelder and Mead 1965) were used as local search mechanisms inside the SOMOGSA framework, resulting in two variants of SOMOGSA: SOMOGSA+GS and SOMOGSA+NM. As a baseline, we also evaluated GS and NM as standalone methods on each considered SO problem.
To prevent possible infinite cycling of the local search (specifically, NM can run into cyclic behavior), the number of local search steps was restricted to a maximum of 400. Since the method SOMOGSA+NM consistently outperformed SOMOGSA+GS, in this work we solely focus on SOMOGSA+NM and on Nelder–Mead as the corresponding standalone method. Furthermore, our previous research revealed that both algorithms barely require more than 2000 function evaluations for \(n=2\) until one of the internal termination criteria is reached. Therefore, we set the optimizer budget to \(n \times 1000 = 2000\) function evaluations. An optimizer run is considered successful if it comes sufficiently close to the optimum, i.e., if it stagnates in an \(\varepsilon \)-neighborhood of the optimum of \(f_1\) (using a precision of \(p = \varepsilon = 0.01\)). This is a common experimental approach within the BBOB benchmark setting (Hansen et al. 2009). Moreover, we report the number of function evaluations the optimizers actually need. For Nelder–Mead, we assume an average number of 1.5 function evaluations per iteration due to varying simplex generation procedures depending on the local landscape structure.
To ensure that the search stays within the boundaries of our box-constrained problems, we integrate the following approach: as soon as Nelder–Mead spans the next simplex beyond any boundary, this simplex is clipped at the crossed boundary and the best solution reached so far is selected. This kind of constraint handling is advantageous, but may lead to early stopping in some cases, especially for the standalone version of Nelder–Mead. For starting points near a boundary, the algorithm often crosses the closest boundary rapidly and thus stops early.
As a proof-of-concept study, we investigate how our findings generalize to higher decision space dimensions and expand our experiments to the two highly multimodal BBOB functions 21 and 22 (Gallagher’s Gaussian 101-me and 21-hi Peaks functions) in the five-dimensional decision space. Likewise, these problems are evaluated for ten different sphere positions and 2500 starting points. Both the center of the sphere function and the starting points are generated via Latin Hypercube Sampling (McKay et al. 1979). The precision value remains at \(p = 0.01\). We assigned the optimizers a fixed budget of \(n \times 1000\) function evaluations as well (with \(n = 5\)).
3.2 Evaluation concept
We investigate the sensitivity of SOMOGSA regarding (a) the parametrization of the second objective, (b) the position of the starting point, and (c) the local search strategy. We define specific metrics for capturing the performance of each variant in all considered settings (see Fig. 4).
As the first metric, we define the gain (a value we want to maximize) as
$$\begin{aligned} g_{LS}(x_s) = \frac{f(x_s) - f(x_b)}{f(x_s) - f(x^{*})}, \end{aligned}$$
where \(x_s\) is a starting point, \(x_b\) the best local search result, and \(x^{*}\) the known global optimum w.r.t. \(f_1\). This measure quantifies the performance gain achieved by the applied local search. Therefore, we compute the ratio of the gap between \(f(x_s)\) and \(f(x^{*})\) which is closed by applying the considered search heuristic. As a (global) performance measure, we also compare the gap (a value we want to minimize) that cannot be closed by the local search, relative to all starting points. This is quantified by
$$\begin{aligned} G_{LS}(x_s) = \frac{f(x_b) - f(x^{*})}{\max _{x \in X} f(x) - f(x^{*})}. \end{aligned}$$
This measure expresses the gap between \(f(x_b)\) and \(f(x^{*})\) relative to the maximum gap between the function values of all starting points in X and the optimal value. For a schematic representation of both measures, refer to Fig. 4.
Both measures are evaluated for all combinations of the N starting points and the considered benchmark problems. The overall aggregated performance of each method can be statistically analyzed and visualized in the decision space, see, e.g., Fig. 5.
Furthermore, we assess the algorithm's capability to reach the optimal solution (of \(f_1\)) from any of the starting points of the (discretized) search space \(X\) by determining the relative frequency of success (to be maximized by the algorithm), denoted as (success) ratio
\[
r_N(LS) = \frac{H_N(LS)}{N},
\]
where \(H_N(LS)\) is the absolute frequency of the \(N\) algorithm runs that successfully converged to the optimum of \(f_1\) w.r.t. the precision \(p = 0.01\).
4 Results
In the following, the results of our experiments are analyzed on the basis of the aforementioned performance indicators. Our investigations have shown that the results of the \(G_{LS}\) measure qualitatively confirm those of \(g_{LS}\). Thus, in this work, we focus on presenting only results for \(g_{LS}\) and the (success) ratio \(r_N\).
Figure 5 shows heatmaps of the \(g_{LS}\) measures for the considered search space of Gallagher's Gaussian 101-me Peaks function (BBOB function 21), Gallagher's Gaussian 21-hi Peaks function (BBOB function 22), and the Lunacek bi-Rastrigin function (BBOB function 24). The idea behind this visualization is to assign the \(g_{LS}(x_s)\) value of the SOMOGSA variant or of the standalone LS run, respectively, to each starting point \(x_s\) (i.e., pixel of the corresponding heatmap) of the two-dimensional search space. This technique leads to a heatmap in which blue denotes zero or little gain, while red denotes maximum gain. Note that the top left heatmap for each function merely shows the single-objective problem landscape itself, while the second heatmap from the left depicts the \(g_{LS}\) values for any starting point when applying the standalone version of NM. The remaining heatmaps show the results for each of the ten considered positions of the optimum of \(f_2\).
Considering the heatmaps illustrating the BBOB functions F21 and F22 (i.e., the first four rows), we observe that the heatmaps of the original (NM) local search show that local optima are often traps for these approaches. In our hybridized method SOMOGSA+NM, this turns out differently: trapping areas (visible as blue or green-yellowish structures for NM) vanish from the heatmaps and become red. Such changes indicate that for those starting points, the SOMOGSA framework realizes a significant gain and is clearly superior. As we observe the entire decision space here, we can state that multiobjectivization, and specifically SOMOGSA, has the effect of removing (or opening up) traps for the single-objective optimizer. Although we observe different structures for varying locations of \(f_2\), the general behavior coincides for these multiobjectivized problems.
The heatmaps of the Lunacek bi-Rastrigin function (BBOB function 24), see Fig. 5, reveal a slightly different situation. Differences in performance w.r.t. the sphere position (i.e., the position of the additional objective's optimum) become apparent. Placing the sphere optimum in one of the locations 1, 2, 3, 4, 8, or 10 helps the optimizer avoid local optima, or even entire regions, that act as traps. In contrast, placing the sphere optimum in location 5, 7, or 9 has the opposite effect. Despite a shift in the values, we can assume performance comparable to the original method for position 6. Thus, for this specific problem, a correlation between the sphere optimum's position and the results is directly visible. This leads to the assumption that the choice of the sphere optimum's position is irrelevant for some function classes, while for other problems it should be chosen appropriately.
This is also underlined by Fig. 6, which displays the \(g_{LS}\) values as boxplots. Therein, we compare the original NM local search (red boxes) to the hybridized SOMOGSA+NM method (blue boxes) for each of the 24 BBOB functions. The results are split according to the five function groups provided by Hansen et al. (2009). Examining the individual groups in more detail, functions 3 and 4 stand out within the group of separable functions (F1–F5, see Fig. 6a): an adequately chosen sphere optimum can significantly boost the performance on these problems. Moreover, function 7 stands out (see Fig. 6b). Contrary to all other problems, here even our hybridized method is hindered from making progress, which we attribute to the specific problem structure: given its numerous plateaus, this function contains many areas in which the gradient is zero. Since SOMOGSA+NM is a gradient-based method, premature convergence occurs. However, an improvement can be seen for the group of functions with high conditioning and unimodal shape (F10–F14, see Fig. 6a). It appears that these functions have structures that are not significantly affected by the position of the sphere center. In addition, the performance fluctuates with the sphere optimum for the multimodal functions with adequate global structure (F15–F19, see Fig. 6b) as well, which is particularly visible for problems 16, 17, and 18. Again, we see an improvement in performance through multiobjectivization for the BBOB functions 21 and 22, and varying values for the BBOB function 24, see Fig. 6. To further consolidate our observations, we performed a pairwise Wilcoxon rank-sum test between the standalone NM and each bi-objective problem variant to which we apply SOMOGSA+NM (significance level \(\alpha = 5\%\)). We indicate the significantly better-performing variant by a blue *.
In general, the standalone method does not outperform our hybridized approach for any bi-objective problem regarding the ten global sphere optima.
Figure 7 also confirms the aforementioned results. Note that this figure displays the results of our 2-dimensional test environment and of the 5-dimensional study (third row, second column). The red boxplots represent the ratio values achieved by the hybridized SOMOGSA+NM method, resulting in 10 ratio values per function (one for each sphere position). The blue line indicates the success ratio of the standalone NM method applied to the single-objective problem \(f_1\). For nearly all considered positions of \(f_2\), the multiobjectivization approach reaches the global optimum more often than the pure local search. Furthermore, considering the results in the 5-dimensional decision space, it becomes apparent that there is potential for a benefit from multiobjectivization as well. For this dimension, SOMOGSA+NM was consistently more successful than NM for both considered functions (i.e., F21 and F22).
In addition to allocating a fixed budget, we recorded the number of function evaluations each algorithm actually used and display them in Fig. 8 using boxplots. Clearly, SOMOGSA+NM, as a more sophisticated hybrid local search technique, requires a higher number of function evaluations than its original counterpart Nelder–Mead. However, this is also rooted in the exploratory steps of SOMOGSA along multi-objective efficient sets for escaping basins that would have been traps for standard local search. As such, SOMOGSA proceeds further and attains better results than standard local search. Additionally, keeping the large gain in performance in mind, a sophisticated hybridization that combines a global optimizer with a landscape-based determination of the most appropriate combination of sphere optimum and starting position constitutes a very promising line of future research.
5 Conclusion
This paper shows the usefulness of transforming multimodal single-objective problems into multi-objective counterparts by artificially adding a second objective. Specifically, we investigate the performance of our algorithm SOMOGSA+NM, which hybridizes the sophisticated multi-objective MOGSA solver with Nelder–Mead local search, on the noiseless BBOB benchmark set.
The single-objective BBOB functions are complemented by a sphere function with varying center position. Specifically developed performance indicators reveal that SOMOGSA+NM consistently outperforms the standalone version of Nelder–Mead in the two-dimensional decision space. However, it also requires a considerably larger number of function evaluations. Sensitivity regarding the sphere position is quite low for unimodal functions, but increases for multimodal functions. Therefore, an individual yet automated configuration of this parameter based on landscape characteristics of the original single-objective problem, i.e., numerical features derived by means of Exploratory Landscape Analysis (see, e.g., Kerschke and Trautmann 2019), is a very promising perspective to reach optimal SOMOGSA+NM performance.
Of course, SOMOGSA itself remains a local search mechanism, even after hybridization with Nelder–Mead (or other local search approaches). The proximity of individual runs to the optimum depends on the starting position within the search space. Next steps will thus include the integration of SOMOGSA+NM into a metaheuristic that can determine suitable starting points in a sophisticated manner, including systematic restarts. Similar to the aforementioned automated determination of the sphere's optimum, Exploratory Landscape Analysis techniques can be highly supportive in identifying promising starting positions. Moreover, based on first very promising studies, we will extend our analysis to higher-dimensional search spaces in order to construct high-performing hybrid metaheuristics in the multiobjectivized problem space.
Notes
LHS implementation of the pyDOE package: https://pythonhosted.org/pyDOE/.
The ten considered sphere positions are: \((3.5, 1.5)\), \((1.5, 0.5)\), \((0.5, 2.5)\), \((2.5, 2.5)\), \((4.5, 0.5)\), \((2.5, 3.5)\), \((1.5, 3.5)\), \((4.5, 4.5)\), \((3.5, 4.5)\), \((0.5, 1.5)\).
The NM Python procedure only reports the total number of iterations used.
References
Aspar P, Kerschke P, Steinhoff V, et al (2021) Multi\(^3\): optimizing multimodal single-objective continuous problems in the multi-objective space by means of multiobjectivization. In: Proceedings of the 11th international conference on evolutionary multi-criterion optimization (EMO). Springer, pp 311–322. https://doi.org/10.1007/978-3-030-72062-9_25
Brockhoff D, Friedrich T, Hebbinghaus N, et al (2007) Do additional objectives make a problem harder? In: Proceedings of the 9th annual conference on genetic and evolutionary computation (GECCO), pp 765–772
Dunlavy DM, O’Leary DP (2005) Homotopy optimization methods for global optimization. Technical report, Sandia National Laboratories (SNL), Albuquerque, NM, and Livermore, CA
GarzaFabre M, ToscanoPulido G, RodriguezTello E (2015) Multiobjectivization, fitness landscape transformation and search performance: a case of study on the HP model for protein structure prediction. Eur J Oper Res (EJOR) 243(2):405–422
Grimme C, Kerschke P, Emmerich MTM, et al (2019a) Sliding to the global optimum: how to benefit from non-global optima in multimodal multi-objective optimization. In: AIP conference proceedings. AIP Publishing, pp 020052-1–020052-4
Grimme C, Kerschke P, Trautmann H (2019b) Multimodality in multiobjective optimization—more boon than bane? In: Proceedings of the 10th international conference on evolutionary multicriterion optimization (EMO). Springer, pp 126–138
Grimme C, Kerschke P, Aspar P et al (2021) Peeking beyond peaks: challenges and research potentials of continuous multimodal multi-objective optimization. Comput Oper Res 136:105489. https://doi.org/10.1016/j.cor.2021.105489
Handl J, Lovell SC, Knowles J (2008) Multiobjectivization by decomposition of scalar cost functions. In: Proceedings of the 10th international conference on parallel problem solving from nature (PPSN X). Springer, pp 31–40
Hansen N, Finck S, Ros R, et al (2009) Real-parameter black-box optimization benchmarking 2009: noiseless functions definitions. Research Report RR-6829, INRIA. https://hal.inria.fr/inria-00362633
Hoffmeister F, Bäck T (1991) Genetic algorithms and evolution strategies: similarities and differences. In: Parallel problem solving from nature (PPSN). Springer, pp 455–469
Jensen MT (2004) Helper-objectives: using multi-objective evolutionary algorithms for single-objective optimisation. J Math Model Algorithms 3(4):323–347
John F (1948) Extremum problems with inequalities as subsidiary conditions, studies and essays presented to R. Courant on his 60th Birthday, January 8, 1948
Kerschke P, Grimme C (2017) An expedition to multimodal multiobjective optimization landscapes. In: Proceedings of the 9th international conference on evolutionary multicriterion optimization (EMO). Springer, pp 329–343
Kerschke P, Grimme C (2021) Lifting the multimodality-fog in continuous multi-objective optimization. In: Metaheuristics for finding multiple solutions. Springer, pp 89–111
Kerschke P, Trautmann H (2019) Comprehensive feature-based landscape analysis of continuous and constrained optimization problems using the R-package flacco. In: Applications in statistical computing. Springer, pp 93–123. https://doi.org/10.1007/978-3-030-25147-5_7
Knowles JD, Watson RA, Corne DW (2001) Reducing local optima in single-objective problems by multiobjectivization. In: Proceedings of the international conference on evolutionary multi-criterion optimization (EMO). Springer, pp 269–283
McKay MD, Beckman RJ, Conover WJ (1979) A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21(2):239–245
Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313
Neumann F, Wegener I (2008) Can single-objective optimization profit from multiobjective optimization? In: Multiobjective problem solving from nature. Springer, pp 115–130
Preuss M (2015) Multimodal optimization by means of evolutionary algorithms. Springer, Berlin
Schäpermeier L, Grimme C, Kerschke P (2020) One PLOT to show them all: visualization of efficient sets in multiobjective landscapes. In: International conference on parallel problem solving from nature (PPSN). Springer, pp 154–167
Schäpermeier L, Grimme C, Kerschke P (2021) To boldly show what no one has seen before: a dashboard for visualizing multi-objective landscapes. In: Proceedings of the 11th international conference on evolutionary multi-criterion optimization (EMO). Springer, pp 632–644. https://doi.org/10.1007/978-3-030-72062-9_50
Schäpermeier L, Grimme C, Kerschke P (2022) MOLE: digging tunnels through multimodal multiobjective landscapes. In: Proceedings of the 24th annual conference on genetic and evolutionary computation (GECCO). ACM
Segura C, Coello Coello CA, Miranda G et al (2013) Using multi-objective evolutionary algorithms for single-objective optimization. 4OR 11(3):201–228
Segura C, Coello Coello CA, Miranda G et al (2016) Using multi-objective evolutionary algorithms for single-objective constrained and unconstrained optimization. Ann Oper Res 240(1):217–250
Steinhoff V, Kerschke P, Aspar P, et al (2020) Multiobjectivization of local search: single-objective optimization benefits from multi-objective gradient descent. In: Proceedings of the IEEE symposium series on computational intelligence (SSCI), pp 2445–2452. https://doi.org/10.1109/SSCI47803.2020.9308259
Tran TD, Brockhoff D, Derbel B (2013) Multiobjectivization with NSGA-II on the noiseless BBOB testbed. In: Proceedings of the 15th annual conference on genetic and evolutionary computation (GECCO) companion. ACM, pp 1217–1224
Acknowledgements
The authors acknowledge support by the European Research Center for Information Systems (ERCIS).
Funding
Open Access funding enabled and organized by Projekt DEAL.
Cite this article
Aspar, P., Steinhoff, V., Schäpermeier, L. et al. The objective that freed me: a multi-objective local search approach for continuous single-objective optimization. Nat Comput 22, 271–285 (2023). https://doi.org/10.1007/s11047-022-09919-w