1 Introduction

Black-box optimization (BBO) is an approach for optimizing an objective function without any information regarding the analytic form and gradient of the objective. This approach has been applied to important real-world problems, including automatic tuning of machine learning models [2, 18] and evacuation route planning [21]. In such real-world problems, it is generally important to obtain a promising solution with a limited evaluation budget because their objective functions usually involve computationally expensive operations, such as training of deep neural networks and crowd evacuation simulations.

The Nelder–Mead (NM) method [17] is a simplex-based local search heuristic that has been used to solve BBO problems for more than half a century. Because it converges quickly with a relatively small number of function evaluations, it can perform well on computationally expensive problems such as the automatic tuning of machine learning models [18]. However, it is empirically known that the search performance of the NM method strongly depends on its initialization, that is, on how the initial simplex is generated. Proper initialization is therefore crucial for good optimization results, yet it has received very limited attention so far.

Recently, Wessing investigated how initialization affects the search performance of the NM method using the Sphere function [26]. After arguing for the necessity of determining a proper initial simplex, Wessing proposed generating an initial simplex that is as large as the normalized search space. However, it is currently unclear whether these results generalize to a variety of computationally expensive problems, because they are based solely on an analysis of the simple Sphere benchmark function in the large-evaluation-budget case. Therefore, further empirical assessments are required for practitioners to identify the best practice for initializing the NM method on computationally expensive problems.

Additionally, because the NM method was originally designed for unconstrained optimization, practitioners must decide how to handle infeasible solutions when tackling constrained problems, even in the simplest box-constrained case. The choice of constraint handling method significantly affects search performance in BBO [25]. Therefore, its effect on the proper initialization of the NM method should also be investigated.

Motivated by the above discussion, in this study, we extensively investigate how initialization affects the NM method on a well-established benchmark suite, namely Black-Box Optimization Benchmarking (BBOB) [8]. The main contributions and important findings of this study are summarized as follows:

  • We empirically find that the search performance of the NM method depends strongly not only on the size of the initial simplex but also on its shape.

  • Based on the experimental results, we also present a practical initialization heuristic that aims to maximize the performance of the NM method in the limited-evaluation-budget case: normalize the search space to the unit hypercube and generate a regular simplex that is as large as possible, regardless of the constraint handling method employed.

2 Background

In this section, we describe the NM method, its initialization methods, and the methods for handling box constraints.

2.1 Nelder–Mead method

Algorithm 1 Nelder–Mead method for minimization

Algorithm 1 shows the NM method for minimization. To optimize an n-dimensional objective function, the NM method requires a simplex composed of \(n+1\) affinely independent vertices in the n-dimensional search space. For example, a two-dimensional simplex is a triangle, and a three-dimensional simplex is a tetrahedron. Based on the objective function values at the vertices, the NM method iteratively transforms the simplex using five major operations: Reflection, Expansion, Outside Contraction, Inside Contraction, and Shrinkage. Figure 1 illustrates these five operations in two-dimensional space. In this figure, \({\varvec{y}}^0\), \({\varvec{y}}^1\), and \({\varvec{y}}^2\) are the vertices of the simplex before the operation, ordered such that \(f({\varvec{y}}^0)< f({\varvec{y}}^1) < f({\varvec{y}}^2)\). For the coefficients of the NM method in Algorithm 1, we use the following standard settings [3]:

$$\begin{aligned} \delta ^\mathrm {r}=1, \; \delta ^\mathrm {e}=2, \; \delta ^\mathrm {oc}=\frac{1}{2}, \; \delta ^\mathrm {ic}=-\frac{1}{2}, \; \gamma =\frac{1}{2}. \end{aligned}$$
(1)
Fig. 1 Simplex transformations by the NM method: Reflection (\({\varvec{y}}^\mathrm {r}\)), Expansion (\({\varvec{y}}^\mathrm {e}\)), Outside Contraction (\({\varvec{y}}^\mathrm {oc}\)), Inside Contraction (\({\varvec{y}}^\mathrm {ic}\)), and Shrinkage (\({\varvec{y}}^\mathrm {s1}\) and \({\varvec{y}}^\mathrm {s2}\))
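To show how the five operations and the coefficients in Eq. (1) interact, the following is a minimal Python/NumPy sketch of the NM loop. It assumes the common parametrization in which every trial point is \(\bar{{\varvec{y}}} + \delta (\bar{{\varvec{y}}} - {\varvec{y}}^n)\), where \(\bar{{\varvec{y}}}\) is the centroid of the n best vertices, together with the usual acceptance rules. Algorithm 1 may differ in details such as tie-breaking, and the function name `nelder_mead` and its `max_evals` budget parameter are illustrative.

```python
import numpy as np

# Coefficients from Eq. (1).
DELTA_R, DELTA_E, DELTA_OC, DELTA_IC, GAMMA = 1.0, 2.0, 0.5, -0.5, 0.5


def nelder_mead(f, simplex, max_evals=400):
    """Minimize f from an initial simplex given as an (n+1) x n array.

    The budget counts every call to f, including the n+1 initial ones; the
    final iteration may overshoot it by at most n+1 evaluations.
    """
    y = np.array(simplex, dtype=float)
    fy = np.array([f(v) for v in y], dtype=float)
    evals = len(y)
    while evals < max_evals:
        # Sort vertices so that f(y^0) <= f(y^1) <= ... <= f(y^n).
        order = np.argsort(fy, kind="stable")
        y, fy = y[order], fy[order]
        centroid = y[:-1].mean(axis=0)   # centroid of the n best vertices
        step = centroid - y[-1]          # direction away from the worst vertex

        y_r = centroid + DELTA_R * step  # Reflection
        f_r = f(y_r)
        evals += 1
        if fy[0] <= f_r < fy[-2]:
            y[-1], fy[-1] = y_r, f_r
            continue
        if f_r < fy[0]:
            y_e = centroid + DELTA_E * step  # Expansion
            f_e = f(y_e)
            evals += 1
            y[-1], fy[-1] = (y_e, f_e) if f_e < f_r else (y_r, f_r)
            continue
        if f_r < fy[-1]:
            y_c = centroid + DELTA_OC * step  # Outside Contraction
            f_c = f(y_c)
            accepted = f_c <= f_r
        else:
            y_c = centroid + DELTA_IC * step  # Inside Contraction
            f_c = f(y_c)
            accepted = f_c < fy[-1]
        evals += 1
        if accepted:
            y[-1], fy[-1] = y_c, f_c
            continue
        # Shrinkage: move every vertex halfway toward the best vertex y^0.
        y[1:] = y[0] + GAMMA * (y[1:] - y[0])
        fy[1:] = [f(v) for v in y[1:]]
        evals += len(y) - 1
    best = int(np.argmin(fy))
    return y[best], fy[best], evals
```

Because every trial point is an affine combination of the current vertices, the choice of initial simplex fixes where and at what scale the search starts, which is exactly the sensitivity studied in this paper.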

2.2 Initialization methods for the Nelder–Mead method

Fig. 2 Examples of two-dimensional simplices generated by the Pfeffer, Nash, Han, Varadhan, and Std basis methods

A previous study [26] investigated five initialization methods: Pfeffer [7], Nash [16], Han [9], Varadhan [22, 23], and Std basis [24]. Examples of the simplices generated using these methods are shown in Fig. 2. With a few exceptions, the shapes of the generated simplices can be classified into two types: “regular” and “standard.” A regular simplex has all side lengths equal, whereas a standard simplex has vertices that correspond to the standard basis vectors. In Fig. 2, the Han and Varadhan methods generate regular simplices, whereas the Nash and Std basis methods generate standard simplices. For the Pfeffer method, the diagonally placed simplices are standard, but the remaining ones are sharper.
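The two shape classes can be told apart numerically from pairwise vertex distances. The sketch below uses illustrative helper names; it tests whether a simplex is regular and builds a standard simplex as a base point plus h times each standard basis vector, which is one common way Std-basis-type initializations are written. The exact step sizes used by the Nash and Std basis methods are not reproduced here.

```python
import itertools

import numpy as np


def side_lengths(simplex):
    """Lengths of all edges between pairs of vertices."""
    v = np.asarray(simplex, dtype=float)
    return np.array([np.linalg.norm(a - b) for a, b in itertools.combinations(v, 2)])


def is_regular(simplex, rtol=1e-9):
    """True if all side lengths are equal up to a relative tolerance."""
    d = side_lengths(simplex)
    return bool(d.max() - d.min() <= rtol * d.max())


def std_basis_simplex(x0, h):
    """Vertices x0, x0 + h*e_1, ..., x0 + h*e_n (a 'standard' simplex)."""
    x0 = np.asarray(x0, dtype=float)
    return np.vstack([x0, x0 + h * np.eye(len(x0))])
```

For example, an equilateral triangle is regular, while `std_basis_simplex([0, 0], 1)` has sides 1, 1, and \(\sqrt{2}\) and is therefore not.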

2.3 Handling methods for box constraints

Fig. 3 Objective landscapes of the Attractive Sector function [8] (\(D = [-5, 5]^2\)) with each handling method. The gray area indicates \(+\infty \)

In BBO, one of the most frequently appearing constraints is the box constraint, wherein a variable has specific lower and upper bounds. Several methods are available to handle box constraints, including the Extreme Barrier [1], Projection [10, 12,13,14], Reflection [25], and Wrapping [20] methods (see Fig. 3). All of these approaches transform constrained problems into unconstrained problems, to which the NM method can be applied.

Assume \(f : {\mathbb {R}}^n \rightarrow {\mathbb {R}}\) and \(D = [l_0, u_0] \times [l_1,u_1] \times \cdots \times [l_{n-1}, u_{n-1}] \subset {\mathbb {R}}^n\). We consider the minimization problem \(\min _{{\varvec{y}} \in {\mathbb {R}}^n} f({\varvec{y}})\) subject to \({\varvec{y}} \in D\). The Extreme Barrier approach defines a penalty function \(f_\mathrm {E}: {\mathbb {R}}^n \rightarrow {\mathbb {R}} \cup \{+\infty \}\) that assigns the penalty value \(+\infty \) to every infeasible solution as follows:

$$\begin{aligned}&\text {(Extreme Barrier)}\nonumber \\&f_\mathrm {E}({\varvec{y}}) = {\left\{ \begin{array}{ll} f({\varvec{y}}) &\quad {\varvec{y}} \in D \\ +\infty &\quad {\varvec{y}} \not \in D. \\ \end{array}\right. }&&\end{aligned}$$
(2)

Subsequently, we minimize the penalty function \(f_\mathrm {E}\) instead of the original objective function f to solve the target problem. The Projection, Reflection, and Wrapping approaches instead define repair functions \(f_\mathrm {P}: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\), \(f_\mathrm {R}: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\), and \(f_\mathrm {W}: {\mathbb {R}}^n \rightarrow {\mathbb {R}}\), respectively, which map each infeasible solution to a specific feasible solution by a simple rule and return the objective function value of that feasible solution:

$$\begin{aligned}&\text {(Projection)} \nonumber \\&f_\mathrm {P}({\varvec{y}}) = f([T_{\mathrm {P}_0}(y_0), \dots , T_{\mathrm {P}_{n-1}}(y_{n-1})]),&&\end{aligned}$$
(3)
$$\begin{aligned}&T_{\mathrm {P}_i}(y) = {\left\{ \begin{array}{ll} y &\quad l_i \le y \le u_i \\ u_i &\quad y > u_i \\ l_i &\quad y < l_i. \end{array}\right. } \end{aligned}$$
(4)
$$\begin{aligned}&\text {(Reflection)} \nonumber \\&f_\mathrm {R}({\varvec{y}}) = f([T_{\mathrm {R}_0}(y_0), \dots , T_{\mathrm {R}_{n-1}}(y_{n-1})]),&&\end{aligned}$$
(5)
$$\begin{aligned}&T_{\mathrm {R}_i}(y) = {\left\{ \begin{array}{ll} y &\quad l_i \le y \le u_i \\ T_{\mathrm {R}_i}(u_i + (u_i - y)) &\quad y > u_i \\ T_{\mathrm {R}_i}(l_i + (l_i - y)) &\quad y < l_i. \end{array}\right. } \end{aligned}$$
(6)
$$\begin{aligned}&\text {(Wrapping)} \nonumber \\&f_\mathrm {W}({\varvec{y}}) = f([T_{\mathrm {W}_0}(y_0), \dots , T_{\mathrm {W}_{n-1}}(y_{n-1})]),&&\end{aligned}$$
(7)
$$\begin{aligned}&T_{\mathrm {W}_i}(y) = {\left\{ \begin{array}{ll} y &\quad l_i \le y \le u_i \\ T_{\mathrm {W}_i}(y - (u_i - l_i)) &\quad y > u_i \\ T_{\mathrm {W}_i}(y + (u_i - l_i)) &\quad y < l_i. \end{array}\right. } \end{aligned}$$
(8)

In these equations, \(T_{\mathrm {P}_i}: {\mathbb {R}} \rightarrow [l_i, u_i]\), \(T_{\mathrm {R}_i}: {\mathbb {R}} \rightarrow [l_i, u_i]\), and \(T_{\mathrm {W}_i}: {\mathbb {R}} \rightarrow [l_i, u_i]\) \((i = 0, \dots , n - 1)\) are the auxiliary mapping functions. Similar to the Extreme Barrier approach, we minimize \(f_\mathrm {P}\), \(f_\mathrm {R}\), and \(f_\mathrm {W}\) instead of f to solve the target problem using these approaches.
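The four handling methods in Eqs. (2)–(8) can be sketched as wrappers around an objective f. In this NumPy sketch (function names are illustrative), the recursive maps \(T_{\mathrm {R}_i}\) and \(T_{\mathrm {W}_i}\) are written in closed form: Reflection folds the coordinate with period \(2(u_i - l_i)\), and Wrapping shifts it by the smallest number of multiples of \(u_i - l_i\) that makes it feasible, which reproduces the recursion including the boundary cases.

```python
import numpy as np


def project(y, l, u):
    """T_P in Eq. (4): clip each coordinate to [l_i, u_i]."""
    return np.clip(y, l, u)


def reflect(y, l, u):
    """T_R in Eq. (6), written in closed form instead of recursively."""
    y, l, u = (np.asarray(a, dtype=float) for a in (y, l, u))
    w = u - l
    t = np.mod(y - l, 2 * w)             # position within one fold period
    t = np.where(t > w, 2 * w - t, t)    # mirror the second half of the period
    return np.where((y >= l) & (y <= u), y, l + t)


def wrap(y, l, u):
    """T_W in Eq. (8): shift by multiples of (u_i - l_i) until feasible."""
    y, l, u = (np.asarray(a, dtype=float) for a in (y, l, u))
    w = u - l
    k_down = np.ceil((y - u) / w)        # shifts needed when y > u
    k_up = np.ceil((l - y) / w)          # shifts needed when y < l
    return np.where(y > u, y - k_down * w, np.where(y < l, y + k_up * w, y))


def handle_constraints(f, l, u, method):
    """Return f_E, f_P, f_R, or f_W from Eqs. (2)-(8) as a plain callable."""
    l, u = np.asarray(l, dtype=float), np.asarray(u, dtype=float)
    if method == "extreme_barrier":
        def f_e(y):
            y = np.asarray(y, dtype=float)
            return f(y) if np.all((l <= y) & (y <= u)) else np.inf
        return f_e
    repair = {"projection": project, "reflection": reflect, "wrapping": wrap}[method]
    return lambda y: f(repair(np.asarray(y, dtype=float), l, u))
```

For \(D = [0, 1]\), the infeasible coordinate 1.3 is mapped to 1.0 by Projection, 0.7 by Reflection, and 0.3 by Wrapping.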

3 Investigating the effect of initialization

In this section, we empirically investigate the effect of the initial simplex of the NM method through comprehensive experiments. We focus on how the size and shape of the initial simplex, as well as the method employed to handle box constraints, affect search performance in the limited-evaluation-budget case. Our research questions are as follows:

Q.1:

Is it better to generate a larger initial simplex, as Wessing [26] previously reported?

Q.2:

Which initial simplex shape is better: regular or standard?

Q.3:

Is the proper initial simplex dependent on the constraint handling method that is employed?

The simplex size can be quantified by its volume [1, 3]; therefore, we evaluate initial simplices with different volumes. The volume of the n-dimensional simplex \({\varvec{Y}}=\{{\varvec{y}}^0, {\varvec{y}}^1, \dots , {\varvec{y}}^n\}\) is defined as

$$\begin{aligned} \mathrm {vol}({\varvec{Y}})=\frac{|\det ({\varvec{L}})|}{n!} \end{aligned}$$
(9)

where \({\varvec{L}}\) denotes a matrix such that:

$$\begin{aligned} {\varvec{L}}&= [({\varvec{y}}^1-{\varvec{y}}^0), ({\varvec{y}}^2-{\varvec{y}}^0), \dots ,({\varvec{y}}^n-{\varvec{y}}^0)]. \end{aligned}$$
(10)
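Eqs. (9) and (10) translate directly into a few lines of NumPy; the function name below is illustrative. A volume of zero signals an affinely dependent (degenerate) simplex: since every NM trial point is an affine combination of the current vertices, the search would then stay confined to a lower-dimensional subspace.

```python
import math

import numpy as np


def simplex_volume(simplex):
    """vol(Y) = |det(L)| / n!, with L = [y^1 - y^0, ..., y^n - y^0] (Eqs. (9)-(10))."""
    y = np.asarray(simplex, dtype=float)
    n = y.shape[1]
    edges = (y[1:] - y[0]).T   # columns are y^i - y^0, giving an n x n matrix
    return abs(np.linalg.det(edges)) / math.factorial(n)
```

For instance, the triangle with vertices (0, 0), (1, 0), and (0, 1) has volume 1/2, and the tetrahedron formed by the origin and the three standard basis vectors has volume 1/6.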

Regarding the effect of the simplex shape, we evaluate regular and standard simplices because these shapes are widely adopted by existing initialization methods (cf. Sect. 2.2). Finally, for the effect of the constraint handling methods, we evaluate the Extreme Barrier, Projection, Reflection, and Wrapping approaches (cf. Sect. 2.3).

3.1 Experimental setup

Algorithm 2 Regular simplex generation [19]
Algorithm 3 Standard simplex generation

In our experiments, following Wessing [26], the search space was always normalized in advance to the n-dimensional unit hypercube \([0, 1]^n\). To generate regular and standard simplices, we used Algorithm 2 [19] and Algorithm 3, respectively. Algorithm 2 takes as input the dimension n, the L2 norm \(\gamma \) that determines the size of the generated simplex, and the centroid \({\varvec{p}}\) of the simplex; the larger \(\gamma \) is, the larger the volume of the generated simplex. Algorithm 3 takes as input the dimension n, a criterion simplex volume v, which allows us to generate a standard simplex with the same volume as the regular simplex it is compared with, and the centroid \({\varvec{p}}\) of the simplex. Figure 4 shows examples of two-dimensional simplices whose centroids are \({\varvec{0.5}}\), i.e., \((0.5, 0.5)\).
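Algorithms 2 and 3 themselves are not reproduced here, so the following is only a sketch of generators with the same inputs. It assumes that \(\gamma \) is the Euclidean distance from the centroid to every vertex (the circumradius), which is one plausible reading of "the L2 norm \(\gamma \)". The regular simplex is obtained by a standard construction (centering the \(n+1\) standard basis vectors of \({\mathbb {R}}^{n+1}\) and expressing them in an orthonormal basis of the hyperplane they span), not necessarily the one in [19]. The standard simplex is a base vertex plus h times each standard basis vector, with h chosen so that \(h^n/n! = v\). All function names are hypothetical.

```python
import math

import numpy as np


def regular_simplex(n, gamma, p):
    """n+1 vertices of a regular simplex with centroid p and circumradius gamma."""
    # Centered standard basis of R^{n+1}: rows are mutually equidistant, equidistant
    # from the origin, and all lie in the hyperplane sum(x) = 0.
    e = np.eye(n + 1) - 1.0 / (n + 1)
    # Orthonormal basis of that hyperplane (right-singular vectors with nonzero
    # singular value); expressing the rows in it preserves all distances.
    _, _, vt = np.linalg.svd(e)
    vertices = e @ vt[:n].T
    vertices *= gamma / np.linalg.norm(vertices[0])   # rescale circumradius to gamma
    return vertices + np.asarray(p, dtype=float)


def standard_simplex(n, v, p):
    """n+1 vertices x0, x0 + h*e_1, ..., x0 + h*e_n with volume v and centroid p."""
    h = (v * math.factorial(n)) ** (1.0 / n)          # h^n / n! = v
    base = np.vstack([np.zeros(n), h * np.eye(n)])    # centroid at h/(n+1) * ones
    return base - h / (n + 1) + np.asarray(p, dtype=float)


def simplex_volume(simplex):
    """|det(L)| / n! as in Eqs. (9)-(10); edge vectors stored as rows (same |det|)."""
    y = np.asarray(simplex, dtype=float)
    return abs(np.linalg.det(y[1:] - y[0])) / math.factorial(y.shape[1])
```

A volume-matched pair is then `reg = regular_simplex(n, gamma, p)` followed by `standard_simplex(n, simplex_volume(reg), p)`.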

Fig. 4 Examples of two-dimensional simplices. The values 0.2 and 0.45 indicate the input \(\gamma \) values of the regular simplices. The red points indicate the centroids. The volume of (c) is the same as that of (a), and the volume of (d) is the same as that of (b)

We evaluated the search performance of the NM method initialized with a variety of simplices on 24 benchmark functions, described in detail below, employing each of the four constraint handling methods introduced in Sect. 2.3. Using the simplex generation algorithms, we generated 200 initial simplices for each benchmark function instance, with centroids placed randomly in \([0.1, 0.9]^n\). First, 100 regular simplices were generated using Algorithm 2 with the L2 norm \(\gamma = 0.45 \times 0.01, 0.45 \times 0.02, \dots , 0.45 \times 1.00\). The remaining 100 standard simplices were generated using Algorithm 3 with criterion volumes equal to the volumes of the 100 regular simplices.
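Under the assumption (ours, not stated in the text) that \(\gamma \) is the circumradius of the regular simplex, i.e., the distance from its centroid to every vertex, the criterion volume passed to Algorithm 3 has a closed form. A regular n-simplex with circumradius \(\gamma \) has edge length a, and equating its volume with the volume \(h^n/n!\) of a standard simplex whose n edges from the base vertex have length h gives the matched h:

$$\begin{aligned} a&= \gamma \sqrt{\frac{2(n+1)}{n}}, \qquad v(\gamma ) = \frac{a^n}{n!}\sqrt{\frac{n+1}{2^n}} = \frac{\gamma ^n}{n!}\,\frac{(n+1)^{(n+1)/2}}{n^{n/2}}, \\ h&= \left( n!\, v(\gamma ) \right) ^{1/n} = \gamma \, \frac{(n+1)^{(n+1)/(2n)}}{\sqrt{n}}. \end{aligned}$$

For example, for \(n = 2\) and \(\gamma = 0.45\), this gives \(v \approx 0.263\) and \(h \approx 0.725\).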

Table 1 List of BBOB functions [8]

As a benchmark suite, we employed BBOB [8], which is one of the most popular benchmarks for evaluating the performance of BBO algorithms. BBOB contains 24 artificial functions with the box-constrained search space \([-5, 5]^n\). As shown in Table 1, these functions are classified into five groups based on their landscape features: 1. separable (#1–5), 2. low or moderate conditioning (#6–9), 3. high conditioning and unimodal (#10–14), 4. multimodal with an adequate global structure (#15–19), and 5. multimodal with a weak global structure (#20–24). All BBOB functions are parameterized; that is, different instances of the same function are available (e.g., low/high-dimensional, translated, and shifted versions) [8]. We considered three dimensions for each benchmark function: \(n = 5, 10, 15\). The evaluation budget was set to 400 evaluations (including initialization) because, in this study, we assumed that the problems were computationally expensive. Note that, with the Extreme Barrier approach, evaluations of solutions outside the search space were not counted because they required no actual objective function evaluation (i.e., their computational cost was negligible) [4]. We evaluated each setting on 100 translated-and-shifted versions of each benchmark function to obtain the average performance of the setting and the corresponding \(95\%\)-confidence interval. In summary, we collected 24 (benchmark functions) \(\times \) 3 (dimensions) \(\times \) 100 (translated-and-shifted versions) \(\times \) 200 (100 regular \(+\) 100 standard initial simplices) \(\times \) 4 (constraint handling methods) \(= 5,760,000\) optimization results.
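The budget rule for the Extreme Barrier approach can be made concrete with a small wrapper that charges an evaluation only when the point is feasible. The class name and interface below are hypothetical; the sketch only illustrates the counting rule described above.

```python
import numpy as np


class BudgetedBarrier:
    """Extreme Barrier objective (Eq. (2)) that counts only feasible evaluations."""

    def __init__(self, f, lower, upper, budget=400):
        self.f = f
        self.lower = np.asarray(lower, dtype=float)
        self.upper = np.asarray(upper, dtype=float)
        self.budget = budget
        self.used = 0

    def exhausted(self):
        return self.used >= self.budget

    def __call__(self, y):
        y = np.asarray(y, dtype=float)
        if np.any(y < self.lower) or np.any(y > self.upper):
            return np.inf      # penalty value; no real evaluation, so not counted
        self.used += 1         # the expensive objective is actually called here
        return self.f(y)
```

With this rule, an NM run under the Extreme Barrier approach may propose more than 400 points in total, while still calling the expensive objective at most 400 times.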

3.2 Results and discussion

Fig. 5 Results: mean achieved objective value versus initial simplex volume for \(n = 15\). The shadings represent the \(95\%\)-confidence intervals computed by bootstrapping [5] with the bias-corrected and accelerated (BCa) method [6]
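The shaded intervals in Fig. 5 are BCa bootstrap intervals of a mean. The following NumPy-only sketch computes such an interval for a sample mean, for example the mean achieved objective value over the 100 function instances. The resample count, seed, and function name are illustrative choices; SciPy's `scipy.stats.bootstrap` with `method='BCa'` offers a library implementation.

```python
from statistics import NormalDist

import numpy as np


def bca_mean_interval(x, n_boot=10_000, confidence=0.95, seed=0):
    """Sample mean and its BCa bootstrap confidence interval."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    rng = np.random.default_rng(seed)
    theta = x.mean()
    boot = x[rng.integers(0, n, size=(n_boot, n))].mean(axis=1)
    normal = NormalDist()

    # Bias correction z0: how far the bootstrap distribution is shifted from theta.
    frac_below = np.clip(np.mean(boot < theta), 1.0 / n_boot, 1.0 - 1.0 / n_boot)
    z0 = normal.inv_cdf(float(frac_below))

    # Acceleration from the jackknife (leave-one-out) means.
    jack = (x.sum() - x) / (n - 1)
    dev = jack.mean() - jack
    denom = 6.0 * np.sum(dev ** 2) ** 1.5
    accel = float(np.sum(dev ** 3) / denom) if denom > 0 else 0.0

    def adjusted_level(q):
        z = z0 + normal.inv_cdf(q)
        return normal.cdf(z0 + z / (1.0 - accel * z))

    alpha = 1.0 - confidence
    lo, hi = np.quantile(boot, [adjusted_level(alpha / 2), adjusted_level(1 - alpha / 2)])
    return theta, float(lo), float(hi)
```

When the bootstrap distribution is unbiased and symmetric (\(z_0 \approx 0\), \(a \approx 0\)), the adjusted levels reduce to \(\alpha /2\) and \(1-\alpha /2\), so BCa falls back to the plain percentile interval.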

Fig. 6 Results: regular shape vs. standard shape based on the Wilcoxon rank sum test (\(\alpha = 0.05\)) for \(n = 15\). The color of each square indicates the shape that is statistically significantly better. Gray indicates that the performance difference is not statistically significant. The horizontal and vertical axes denote the volume of the initial simplex and the BBOB function number, respectively

First, we discuss the effect of the initial simplex volume. Figure 5 shows a subset of the experimental results on this effect (\(n = 15\)); we focus on these results because those for the remaining problems and dimensions show similar trends. All the experimental results are available in the Supplementary Material. Five problems (#1 Sphere, #6 Attractive Sector, #10 Ellipsoidal, #15 Rastrigin, and #20 Schwefel) are selected, one as a representative of each group. We observe almost consistently that a larger initial simplex volume results in better search performance, regardless of the benchmark function, the shape of the initial simplex, and the constraint handling method employed. Therefore, regarding research question Q.1, we conclude that a larger initial simplex is preferable and that the previous results obtained by Wessing [26] can be generalized to a wide range of problems in the limited-evaluation-budget case.

We next discuss the effect of the initial simplex shape. We performed the Wilcoxon rank sum test [15] (\(\alpha = 0.05\)) to evaluate which shape, regular or standard, is preferable. Figure 6 shows the results of the statistical tests for \(n = 15\). The results for \(n = 5\) and 10 were similar to those for \(n = 15\) and are available in the Supplementary Material. We find that, in many cases, a regular shape is statistically significantly better than a standard shape, regardless of the benchmark function, the volume of the initial simplex, and the constraint handling method employed. This tendency is particularly apparent for unimodal functions and in higher dimensions. This result indicates that the performance of the NM method depends strongly not only on the size of the initial simplex but also on its shape. Therefore, regarding research question Q.2, we conclude that a regular initial simplex is preferable for the NM method.
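A single cell of Fig. 6 can be sketched with the large-sample (normal approximation) form of the Wilcoxon rank sum statistic, using average ranks for ties and no tie or continuity correction, which is the same statistic reported by `scipy.stats.ranksums`. Whether the study used exact p-values or corrections is not stated, and reading a negative z as "the first sample tends to be smaller, i.e., better for minimization" is our labeling convention; all function names are illustrative.

```python
from statistics import NormalDist

import numpy as np


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1   # mean of ranks i+1, ..., j+1
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank sum test (normal approximation): returns (z, p)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(np.concatenate([a, b]))
    expected = n1 * (n1 + n2 + 1) / 2
    std = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (ranks[:n1].sum() - expected) / std
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return float(z), float(p)


def better_shape(regular, standard, alpha=0.05):
    """Label one cell from achieved objective values; lower values are better."""
    z, p = rank_sum_test(regular, standard)
    if p >= alpha:
        return "no significant difference"
    return "regular" if z < 0 else "standard"
```

Here `regular` and `standard` would hold the achieved objective values over the 100 instances for one function, volume, and constraint handling method.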

Finally, we discuss the effect of the constraint handling method. As observed in Figs. 5 and 6, the effects of the initial simplex volume and shape show nearly the same tendencies regardless of the constraint handling method employed. Therefore, regarding research question Q.3, we conclude that the proper initial simplex for the NM method does not depend strongly on the constraint handling method employed.

Based on the above discussion, we present a practical initialization heuristic for the NM method in the limited-evaluation-budget case. To maximize the search performance of the NM method, an initial simplex satisfying the following conditions should be employed:

  • The size of the initial simplex is as large as possible in the normalized search space.

  • The shape of the initial simplex is regular.

We consider this to be the current empirical best practice for practitioners.
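Put together, the heuristic can be sketched as follows: run NM on the objective rescaled to \([0, 1]^n\), starting from a regular simplex centered at 0.5. The sketch assumes that the size parameter is the circumradius and defaults it to 0.45, the largest \(\gamma \) evaluated in Sect. 3.1. Any radius up to 0.5 keeps every vertex inside \([0, 1]^n\), because no coordinate of a vertex can differ from 0.5 by more than the vertex's distance to the centroid. The function names are hypothetical.

```python
import numpy as np


def regular_unit_simplex(n, radius=0.45):
    """Regular simplex in [0, 1]^n, centered at 0.5 * ones, with circumradius `radius`."""
    e = np.eye(n + 1) - 1.0 / (n + 1)   # centered standard basis vectors of R^{n+1}
    _, _, vt = np.linalg.svd(e)
    v = e @ vt[:n].T                    # the same simplex expressed in R^n
    return 0.5 + v * (radius / np.linalg.norm(v[0]))


def normalized_problem(f, lower, upper, radius=0.45):
    """Return (g, simplex, to_original) for running NM in [0, 1]^n."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    def to_original(z):
        # Undo the normalization: map [0, 1]^n back to the box [lower, upper].
        return lower + np.asarray(z, dtype=float) * (upper - lower)

    def g(z):
        return f(to_original(z))

    return g, regular_unit_simplex(len(lower), radius), to_original
```

Any of the constraint handling wrappers from Sect. 2.3 can then be applied to `g` with bounds \([0, 1]^n\), and the best point found is mapped back with `to_original`.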

4 Conclusion

In this study, we have empirically investigated the effect of initialization on the NM method in the limited-evaluation-budget case. Our experimental results demonstrated that both the size and the shape of the initial simplex significantly affect the performance of the NM method. Based on these results, we also identified a best practice for initialization, which, as the numerical results indicate, does not depend strongly on the constraint handling method.

A possible future direction of this study is to find a practical initialization method for multi-start and restart settings [11]. In these settings, it may be necessary to generate a variety of simplices, rather than a set of large regular simplices, to achieve good performance.

We believe that our findings will help practitioners to address real-world problems more efficiently and effectively than previously possible.