1 Introduction

Many problems in science and engineering can be stated as derivative-free optimization problems [5, 18], such as decision-making [42], engineering design [43], molecular biology [8], system and database design [19], power generation [1], surgery [44], and astronomy [4]. In this paper, we focus on the global optimization problem in the following form:

$$\begin{aligned} \min _{\varvec{x}\in {\mathcal {X}}}f(\varvec{x}), \end{aligned}$$
(1.1)

where \({\mathcal {X}}=[\varvec{a},\varvec{b}]=\{\varvec{x}\in {\mathbb {R}}^d:a_i\le x_i\le b_i,i=1,2,\dots ,d\}\) is the region of interest and \(f:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) is a deterministic function whose derivatives are neither symbolically nor numerically available. There are two classes of algorithms for solving problem (1.1): stochastic heuristic algorithms, such as the genetic algorithm [11], simulated annealing [17], and particle swarm optimization [16]; and deterministic algorithms, such as partition-based methods [14] and space-filling-curve-based methods [35]. In general, stochastic heuristic algorithms are flexible, but the quality of their solutions cannot be guaranteed. By contrast, deterministic algorithms provide general methods for obtaining a global or approximately global optimum.

[15] and [13] proposed a well-known deterministic algorithm called DIRECT (DIviding RECTangles). The DIRECT algorithm partitions \({\mathcal {X}}\) into several hyper-rectangles \({\mathcal {R}}_i\) and evaluates \(f(\varvec{x}_i)\), where \(\varvec{x}_i\) is the center of \({\mathcal {R}}_i\). In each iteration, DIRECT selects some potentially optimal regions (POR) and partitions them into smaller ones. The simplicity and efficiency of the DIRECT algorithm have attracted considerable interest from the optimization community [39]. Since the original algorithm was presented, many scholars have modified or extended DIRECT in various ways. In this paper, we summarize only part of these DIRECT-type algorithms in the following three aspects; a more detailed review can be found in the recent article [14]. First, some studies have focused on different methods for partitioning regions [20, 29, 31, 34]. Partition methods are the focus of this paper and will be addressed in detail in Sect. 2. Second, some articles have focused on modifying the POR selection scheme [9, 21, 22, 24, 25, 30, 32, 34, 41]. They make the algorithm search more locally or more globally, or switch the search state regularly, by redefining the set of PORs. Third, hybrid methods have been proposed to accelerate DIRECT's convergence [20, 23, 32]. These researchers combined DIRECT with local optimizers or meta-model techniques to overcome DIRECT's low efficiency when high dimensionality or high accuracy is required.

In this paper, we propose the SCABALL (scattering balls) algorithm, a new DIRECT-type algorithm with a novel partition method. The remainder of this article is organized as follows. Section 2 summarizes the schemes of DIRECT and DIRECT-type algorithms with different partitions. We then explain the idea, process, and properties of SCABALL in Sect. 3. In Sect. 4, we elaborate on the implementation and contraction of partitions in SCABALL. Section 5 presents numerical experiments, including parameter tuning and comparison results. Finally, the main conclusions and further discussion are given in Sect. 6.

2 Scheme of DIRECT-type algorithms

We summarize the main procedure of DIRECT-type algorithms in Fig. 1. The Partition, Sample, and Evaluate steps are merged in some articles; we separate them here to compare different algorithms clearly. It is worth noting that the selection and partition of PORs depend on the Sample and Evaluate steps in most DIRECT-type algorithms; thus, the first sampling and evaluation are included in the Initialization.

Fig. 1 Flow chart of DIRECT-type algorithms

The key idea of DIRECT is the selection of POR, which is based on \(f({\mathcal {R}}_i)\) and \(d({\mathcal {R}}_i)\), where

$$\begin{aligned} f({\mathcal {R}}_i)\triangleq f(\varvec{x}_i), \quad d({\mathcal {R}}_i)\triangleq \sup _{\varvec{x}\in {\mathcal {R}}_i}\Vert \varvec{x}_i-\varvec{x}\Vert , \end{aligned}$$
(2.1)

where \(\Vert \cdot \Vert \) denotes the Euclidean norm. Assume f is Lipschitz continuous:

$$\begin{aligned} |f(\varvec{x}_1)-f(\varvec{x}_2)|\le K\Vert \varvec{x}_1-\varvec{x}_2\Vert ,\quad \forall \varvec{x}_1,\varvec{x}_2\in {\mathcal {X}}, \end{aligned}$$
(2.2)

where \(K\ge 0\) is the Lipschitz constant. Then we get a lower bound of f in \({\mathcal {R}}_i\):

$$\begin{aligned} f(\varvec{x})\ge f({\mathcal {R}}_i)-K\cdot d({\mathcal {R}}_i),\quad \forall \varvec{x}\in {\mathcal {R}}_i. \end{aligned}$$
(2.3)
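
For instance, if \(K=2\), \(f({\mathcal {R}}_i)=1\), and \(d({\mathcal {R}}_i)=0.3\), then Eq. (2.3) guarantees \(f(\varvec{x})\ge 1-2\times 0.3=0.4\) for every \(\varvec{x}\in {\mathcal {R}}_i\).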

Since we cannot use the derivatives of f, the Lipschitz constant K is assumed to be unknown. We define the PORs as those \({\mathcal {R}}_i\) whose lower bound is minimal under some \({\hat{K}}\). That is to say, \({\mathcal {R}}_i\) is a POR if there exists a constant \({\hat{K}}\) such that

$$\begin{aligned} f({\mathcal {R}}_i)-{\hat{K}}\cdot d({\mathcal {R}}_i)\le f({\mathcal {R}}_j)-{\hat{K}}\cdot d({\mathcal {R}}_j),\quad \forall j\ne i. \end{aligned}$$
(2.4)

To prevent the algorithm from focusing on a trivial solution, [15] added another condition:

$$\begin{aligned} f({\mathcal {R}}_i)-{\hat{K}}\cdot d({\mathcal {R}}_i)\le f_{\min }-\epsilon |f_{\min }|, \end{aligned}$$
(2.5)

where \(f_{\min }\) is the minimum evaluation so far and \(\epsilon \) is a balance parameter. The benefit of Eq. (2.1) is that both \(d({\mathcal {R}}_i)\) and \(f(\varvec{x}_i)\) can be taken as properties of \({\mathcal {R}}_i\). Every \({\mathcal {R}}_i\) can then be represented by a dot in a two-dimensional diagram (see Fig. 2), where the horizontal and vertical axes represent \(d({\mathcal {R}})\) and \(f({\mathcal {R}})\), respectively. Note that there is an extra dot at \((0,f_{\min }-\epsilon |f_{\min }|)\) that does not represent any region but is determined by condition (2.5). The PORs satisfying conditions (2.4) and (2.5) are those on the lower-right portion of the convex hull of the dot cloud. The upper-right \({\mathcal {R}}_i\) are larger, which favors global exploration, while the lower-left \({\mathcal {R}}_i\) have better responses, which favors local exploitation. This scheme is an effective trade-off between global exploration and local exploitation. Notably, the \({\mathcal {R}}_i\) with the best evaluation is always selected [9]. As the number of iterations approaches infinity, \(\max _id({\mathcal {R}}_i)\) must approach zero. As a result, the points sampled by DIRECT are "everywhere dense" [14].
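
To make the selection concrete, the following Python sketch identifies the PORs from the pairs \((d({\mathcal {R}}_i),f({\mathcal {R}}_i))\) by checking, for each region, whether some \({\hat{K}}\ge 0\) satisfies conditions (2.4) and (2.5). This is a minimal illustration written for this section, not the DIRECTGO code.

```python
import numpy as np

def select_por(d, fv, eps=1e-4):
    """Indices of potentially optimal regions: R_i is a POR if some
    K-hat >= 0 satisfies conditions (2.4) and (2.5)."""
    d, fv = np.asarray(d, float), np.asarray(fv, float)
    fmin = fv.min()
    thresh = fmin - eps * abs(fmin)          # right-hand side of (2.5)
    por = []
    for i in range(len(d)):
        smaller, larger = d < d[i], d > d[i]
        ties = (d == d[i]) & (np.arange(len(d)) != i)
        if (fv[ties] < fv[i]).any():
            continue                         # beaten by an equally sized region
        lo = (fv[i] - thresh) / d[i]         # smallest K-hat meeting (2.5)
        if smaller.any():                    # K-hat must exceed slopes to smaller regions
            lo = max(lo, ((fv[i] - fv[smaller]) / (d[i] - d[smaller])).max())
        hi = np.inf
        if larger.any():                     # K-hat must stay below slopes to larger regions
            hi = ((fv[larger] - fv[i]) / (d[larger] - d[i])).min()
        if lo <= hi:
            por.append(i)
    return por
```

A candidate is a POR exactly when the interval of feasible \({\hat{K}}\) is nonempty, which reproduces the lower-right convex hull of Fig. 2.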

Fig. 2 Identifying potentially optimal hyper-rectangles in DIRECT

In this paper, we focus on the following five variants with different partitions: the revised DIRECT [13], adaptive diagonal curves (ADC) [34], dividing simplices at vertices (DISIMPL-V) and dividing simplices at centers (DISIMPL-C) [29, 30], and bisecting rectangles (BIRECT) [31, 32]. DIRECT and ADC both divide the region of interest into three hyper-rectangles; DIRECT then evaluates f at the center of each hyper-rectangle, while ADC does so at two vertices of the main diagonal. DISIMPL-V and DISIMPL-C divide the region of interest into simplices, then evaluate f at the vertices and the centroid of each simplex, respectively. BIRECT divides the region of interest into two rectangles, then evaluates f at the two points that trisect a diagonal.

Although the geometric shapes of \({\mathcal {R}}_i\) in DIRECT-type algorithms differ, they all select PORs according to \(f({\mathcal {R}}_i)\) and \(d({\mathcal {R}}_i)\). We summarize the definitions of \(f({\mathcal {R}}_i)\) and \(d({\mathcal {R}}_i)\) as follows. For \(f({\mathcal {R}}_i)\), DIRECT and DISIMPL-C allocate only one point \(\varvec{x}_i\) in each \({\mathcal {R}}_i\); therefore, \(f({\mathcal {R}}_i)=f(\varvec{x}_i)\). Conversely, there are multiple points in each \({\mathcal {R}}_i\) when ADC, DISIMPL-V, or BIRECT is used. ADC defines \(f({\mathcal {R}}_i)=[f(\varvec{v}_{i1})+f(\varvec{v}_{i2})]/2\), where \(\varvec{v}_{i1}\) and \(\varvec{v}_{i2}\) are the vertices of the main diagonal of \({\mathcal {R}}_i\). DISIMPL-V and BIRECT define \(f({\mathcal {R}}_i)=\min _{\varvec{x}_j\in {\mathcal {R}}_i}f(\varvec{x}_j)\) to obtain the lower bound of f. For \(d({\mathcal {R}}_i)\), DIRECT and DISIMPL-C share a definition similar to Eq. (2.1). ADC defines \(d({\mathcal {R}}_i)=\Vert \varvec{v}_{i1}-\varvec{v}_{i2}\Vert /2\), which is virtually identical to the definition in DIRECT. BIRECT defines \(d({\mathcal {R}}_i)=2\Vert \varvec{v}_{i1}-\varvec{v}_{i2}\Vert /3\), and DISIMPL-V defines \(d({\mathcal {R}}_i)\) as the largest side length of \({\mathcal {R}}_i\).

All these DIRECT-type algorithms follow the scheme in Fig. 1. For clarity, the main steps of these algorithms are listed in columns 2–6 of Table 1. The numbers in parentheses in the Initialize row give the quantity of initial evaluations, and those in the Evaluate row give the quantity of evaluations in each POR. The fourth iteration of each DIRECT-type algorithm is demonstrated in Fig. 3a–e, where the contours correspond to the Branin function [38]. The regions bordered in red represent the selected PORs, the red dots denote the newly sampled points, and the red dashed lines display the partitions of sub-regions. In this article, we focus on the influence of different partitions on the algorithm; therefore, the PORs were selected uniformly using Eqs. (2.4) and (2.5) in all of the above algorithms.

Fig. 3 Demonstrations of different DIRECT-type algorithms at the 4th iteration

Table 1 Main steps of DIRECT-type algorithms

Fig. 4 Examples of the Voronoi tessellation and ball coverage

3 SCABALL algorithm

Let us start with the basic idea of SCABALL. Note that all the partition methods introduced in Sect. 2 follow two principles [34]:

$$\begin{aligned} {\mathcal {X}}=\cup _{i=1}^n{\mathcal {R}}_i,\quad {\mathcal {R}}_i\cap {\mathcal {R}}_j=\partial {\mathcal {R}}_i\cap \partial {\mathcal {R}}_j,\ i\ne j, \end{aligned}$$
(3.1)

where \(\partial \) denotes the boundary. Namely, the \({\mathcal {R}}_i\) form \({\mathcal {X}}\) exactly, and different \({\mathcal {R}}_i\) intersect only at their boundaries. SCABALL is a new partition method based on the Voronoi tessellation of the sampled points. The Voronoi tessellation divides \({\mathcal {X}}\) into sub-regions \({\mathcal {R}}_i\), where \({\mathcal {R}}_i\) consists of the points closer to \(\varvec{x}_i\) than to any other sample. The left panel of Fig. 4 shows the Voronoi tessellation of 5 random samples. Clearly, the Voronoi tessellation follows the principles (3.1). Using the Voronoi tessellation of the search domain is not a new idea: this technique is often used in interpolation [2], simulation-based optimization [20], and linearly constrained optimization [3]. However, due to the high computational costs involved, such methods are suitable only for low-dimensional problems. To address this, we first note that the selection of PORs is not related to the shape of \({\mathcal {R}}_i\) but only to \(d({\mathcal {R}}_i)\); we therefore relax the principles (3.1) to

$$\begin{aligned} {\mathcal {X}}\subseteq \cup _{i=1}^n{\mathcal {R}}_i. \end{aligned}$$
(3.2)

The proposed SCABALL algorithm does not concentrate on dividing \({\mathcal {X}}\) into regions of a specific geometry, but rather scatters several balls to cover the region of interest; see the right panel of Fig. 4. This reduces the computation and makes it possible to solve problems of (relatively) higher dimension. The complete description of SCABALL is given in Algorithm 1. The rest of this section elaborates on the process of SCABALL and its convergence.

Algorithm 1

3.1 Overall scheme

As a DIRECT-type algorithm, SCABALL follows the scheme summarized in Fig. 1. To clarify the iteration process, we denote the i-th region at the k-th iteration by \({\mathcal {R}}_i^{(k)}\), the set of points sampled in \({\mathcal {R}}_i^{(k)}\) by \(S_i^{(k)}\), and the index set of the points in \(S_i^{(k)}\) by \(I_i^{(k)}\).

During initialization, we first normalize \({\mathcal {X}}\) to the unit hyper-cube \(\bar{{\mathcal {X}}}\triangleq [\varvec{0},\varvec{1}]\). Let \({\mathcal {R}}_0^{(0)}\triangleq \bar{{\mathcal {X}}}\), let \(S_0^{(0)}\) denote the points sampled during initialization, and let \({\mathcal {B}}(\varvec{x},r)\) denote the closed ball with center \(\varvec{x}\) and radius r. According to the principle (3.2),

$$\begin{aligned} {\mathcal {R}}_0^{(0)}\triangleq \bar{{\mathcal {X}}}\subseteq \bigcup _{\varvec{x}_i\in S_0^{(0)}}{\mathcal {B}}(\varvec{x}_i,r_i^{(1)}) \triangleq \bigcup _{i\in I_0^{(0)}}{\mathcal {R}}_i^{(1)}. \end{aligned}$$
(3.3)

We evaluate \(f(\varvec{x}_i)\), and define

$$\begin{aligned} f({\mathcal {R}}_i^{(k)})\triangleq f(\varvec{x}_i),\quad d({\mathcal {R}}_i^{(k)})\triangleq \sup _{\varvec{x}\in {\mathcal {R}}_i^{(k)}}\Vert \varvec{x}_i-\varvec{x}\Vert =r_i^{(k)}. \end{aligned}$$
(3.4)

With the definition (3.4), the selection of PORs can be carried out according to conditions (2.4) and (2.5). Assume \({\mathcal {R}}_{j^*}^{(k)}\) is one of the PORs selected at the k-th iteration. In the partition step, several balls are scattered to cover \({\mathcal {R}}_{j^*}^{(k)}\), i.e.,

$$\begin{aligned} {\mathcal {R}}_{j^*}^{(k)}\subseteq \bigcup _{\varvec{x}_i\in S_{j^*}^{(k)}}{\mathcal {B}}(\varvec{x}_i,r_i^{(k+1)}) \triangleq \bigcup _{i\in I_{j^*}^{(k)}}{\mathcal {R}}_i^{(k+1)}. \end{aligned}$$
(3.5)

Since \(\varvec{x}_{j^*}\) already lies in \({\mathcal {R}}_{j^*}^{(k)}\), we set \(\varvec{x}_{j^*}\in S_{j^*}^{(k)}\) to obtain better coverage. Although other \(\varvec{x}_j\) may also lie in \({\mathcal {R}}_{j^*}^{(k)}\), we ignore them for simplicity. We then sample \(S_{j^*}^{(k)}\) in \({\mathcal {R}}_{j^*}^{(k)}\), evaluate \(f(\varvec{x}_i)\) at all points in \(S_{j^*}^{(k)}\) except \(\varvec{x}_{j^*}\), and update \(f({\mathcal {R}}_i^{(k+1)})=f(\varvec{x}_i),\ d({\mathcal {R}}_i^{(k+1)})=r_i^{(k+1)},\ \forall i\in I_{j^*}^{(k)}\). For those \({\mathcal {R}}_j^{(k)}\) that are not PORs, we set

$$\begin{aligned} {\mathcal {R}}_j^{(k+1)}={\mathcal {R}}_j^{(k)},\ S_j^{(k)}=\{\varvec{x}_j\},\ d({\mathcal {R}}_j^{(k+1)})=d({\mathcal {R}}_j^{(k)}). \end{aligned}$$
(3.6)

This completes the overall scheme of SCABALL. For comparison with the other algorithms, the main steps of SCABALL are tabulated in the last column of Table 1, and a demonstration of SCABALL is shown in Fig. 3f.

3.2 Partitions in initialization and iterations

The most important part of SCABALL is the partition described by Eqs. (3.3) and (3.5). The \(\varvec{x}_i\) and \(r_i\) significantly influence the efficiency of the SCABALL algorithm; thus, they are not arbitrary and require careful design. To explain this, the lower bound of f in SCABALL can also be expressed as

$$\begin{aligned} f(\varvec{x})\ge f({\mathcal {R}}_i)-K\cdot d({\mathcal {R}}_i),\quad \forall \varvec{x}\in {\mathcal {R}}_i, \end{aligned}$$
(3.7)

which is consistent with Eq. (2.3). To obtain a larger, and hence more favorable, lower bound of f in \({\mathcal {R}}_i\), a smaller \(d({\mathcal {R}}_i)\) is preferred. To achieve this, we introduce the minimax (mM) design.

Let \(X_n=\{\varvec{x}_i,i=1,2,\dots ,n\}\) be a set of n points in a convex region \({\mathcal {R}}\subseteq \bar{{\mathcal {X}}}\), which represents a design on \({\mathcal {R}}\). The mM design is a well-known space-filling design proposed by [12]. The mM-distance criterion of a design \(X_n\) in \({\mathcal {R}}\) is defined as follows:

$$\begin{aligned} d_{\mathrm{{mM}}}(X_n)\triangleq d_{\mathrm{{mM}}}(X_n,{\mathcal {R}})=\max _{\varvec{x}\in {\mathcal {R}}}d(\varvec{x},X_n) =\max _{\varvec{x}\in {\mathcal {R}}}\min _{i=1,\dots ,n}\left\| \varvec{x}-\varvec{x}_i\right\| . \end{aligned}$$
(3.8)

\(d_{\mathrm{{mM}}}(X_n)\) corresponds to the Hausdorff distance between \(X_n\) and \({\mathcal {R}}\), and is also called the dispersion of \(X_n\) [27]. \(d_{\mathrm{{mM}}}(X_n)\) can also be described as:

$$\begin{aligned} d_{\mathrm{{mM}}}(X_n)=\inf \left\{ r\ge 0\mid {\mathcal {R}}\subseteq \cup _{i=1}^n{\mathcal {B}}(\varvec{x_i},r) \right\} . \end{aligned}$$
(3.9)

This means that the balls centered at the \(\varvec{x}_i\) with the common radius \(d_{\mathrm{{mM}}}(X_n)\) cover \({\mathcal {R}}\) exactly. \(X_n^*\) is called the mM design on \({\mathcal {R}}\) if

$$\begin{aligned} d_{\mathrm{{mM}}}(X_n^*)=\min _{X_n\in {\mathcal {R}}^n}d_{\mathrm{{mM}}}(X_n). \end{aligned}$$
(3.10)

Namely, the mM design covers the region of interest with equal and minimal radii, which makes it a suitable partition for SCABALL.
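
Evaluating Eq. (3.8) exactly is hard, but a simple Monte Carlo estimate is often sufficient for intuition. The following Python sketch, an illustration written for this section rather than the construction used in the paper, approximates \(d_{\mathrm{{mM}}}(X_n)\) on the unit cube by the max–min distance over random test points:

```python
import numpy as np

def dispersion(X, n_mc=20_000, seed=0):
    """Monte Carlo estimate of d_mM(X) on [0,1]^d, Eq. (3.8): the largest
    distance from any point of the cube to its nearest design point."""
    X = np.asarray(X, float)
    R = np.random.default_rng(seed).random((n_mc, X.shape[1]))   # test points
    dists = np.linalg.norm(R[:, None, :] - X[None, :, :], axis=2)
    return dists.min(axis=1).max()                               # max-min distance

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # the 4 cube vertices
print(dispersion(X))   # close to sqrt(0.5) ~ 0.707, attained near the centre
```

The estimate is biased low because random test points rarely hit the exact worst-case location; this is one reason the candidate set is augmented with boundary points in Sect. 4.1.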

Note that there are two differences between the partitions (3.3) and (3.5): one is the geometry, and the other is the structure. First, \({\mathcal {R}}_{0}^{(0)}\) is a hyper-cube, but all \({\mathcal {R}}_{j^*}^{(k)}\) are hyper-balls. Second, \(S_0^{(0)}\) is completely free in \({\mathcal {R}}_{0}^{(0)}\), but \(S_{j^*}^{(k)}\) must take into account its relationship with \(\varvec{x}_{j^*}\). Because of these differences, we need two kinds of mM design. One is the initial design \(X_{n_{\textrm{ini}}}^{\textrm{ini}}\), which is freely designed on \({\mathcal {R}}^{\textrm{ini}}\triangleq \bar{{\mathcal {X}}}\). The other is the sequential design \(X_{n_{\textrm{seq}}}^{\textrm{seq}}\), which is designed on \({\mathcal {R}}^{\textrm{seq}}\triangleq {\mathcal {B}}(\varvec{1}/2,1/2)\subset \bar{{\mathcal {X}}}\) with one point fixed at \(\varvec{1}/2\). The subscripts \(n_{\textrm{ini}}\) and \(n_{\textrm{seq}}\) indicate the number of points in the design, and the superscripts ini and seq indicate the geometry and structure of the design. Figure 5 shows examples of \(X_{10}^{\textrm{ini}}\) and \(X_{8}^{\textrm{seq}}\) in 2 dimensions. The construction of mM designs is detailed in Sect. 4.1; we next explain how to sample and update using them.

Fig. 5 Examples of the mM design in 2 dimensions

3.3 Sampling and updating

For convenience, we define a linear mapping \(T:[\varvec{0},\varvec{1}]\mapsto [\varvec{a},\varvec{b}]\) as follows:

$$\begin{aligned} T(\varvec{x})\triangleq T(\varvec{x};\varvec{a},\varvec{b})=\varvec{a}+\varvec{x}\circ (\varvec{b}-\varvec{a}), \end{aligned}$$

where \(\circ \) denotes the Hadamard product and \(T(X_n)\triangleq \{T(\varvec{x}_1),T(\varvec{x}_2),\dots ,T(\varvec{x}_n)\}\). We design \(X_{n_{\textrm{ini}}}^{\textrm{ini}}\) and \(X_{n_{\textrm{seq}}}^{\textrm{seq}}\) only once beforehand; all sample points \(S_{i}^{(k)}\) can then be obtained by T with the proper parameters. Since the mM designs are response-free, they can be constructed in advance and loaded during preparation (see Algorithm 1, Line 1). Noting that \({\mathcal {R}}_{0}^{(0)}=\bar{{\mathcal {X}}}\), we obtain samples as follows:

$$\begin{aligned} S_0^{(0)}=X_{n_{\textrm{ini}}}^{\textrm{ini}},\quad S_{j^*}^{(k)}=T(X_{n_{\textrm{seq}}}^{\textrm{seq}};\varvec{a}_{j^*}^{(k)},\varvec{b}_{j^*}^{(k)}), \end{aligned}$$
(3.11)

where

$$\begin{aligned} \varvec{a}_{j^*}^{(k)}=\varvec{x}_{j^*}^{(k)}-\varvec{1}\cdot d({\mathcal {R}}_{j^*}^{(k)})/2,\quad \varvec{b}_{j^*}^{(k)}=\varvec{x}_{j^*}^{(k)}+\varvec{1}\cdot d({\mathcal {R}}_{j^*}^{(k)})/2. \end{aligned}$$
(3.12)

Furthermore, \(d({\mathcal {R}}_i^{(k)})\) can be easily updated owing to the properties of \(d_{\textrm{mM}}\). According to Eq. (3.9), all sub-regions in an mM design have the same radius \(d_{\textrm{mM}}\). Since \(\bar{{\mathcal {X}}}=[\varvec{0},\varvec{1}]\), \(d_{\textrm{mM}}\) can also be viewed as half the size ratio of the sub-regions to the original region. Letting \(d_{\textrm{mM}}^{\textrm{ini}}\triangleq d_{\mathrm{{mM}}}(X_{n_{\textrm{ini}}}^{\textrm{ini}})\) and \(d_{\textrm{mM}}^{\textrm{seq}}\triangleq d_{\mathrm{{mM}}}(X_{n_{\textrm{seq}}}^{\textrm{seq}})\), we have

$$\begin{aligned} \begin{aligned} d({\mathcal {R}}_i^{(1)})=r_i^{(1)}&=d_{\textrm{mM}}^{\textrm{ini}},\ \forall i\in I_0^{(0)},\\ d({\mathcal {R}}_i^{(k+1)})=r_i^{(k+1)}&=2d_{\textrm{mM}}^{\textrm{seq}}\cdot d({\mathcal {R}}_{j^*}^{(k)}),\ \forall i\in I_{j^*}^{(k)}. \end{aligned} \end{aligned}$$
(3.13)

In this way, for any region at the k-th iteration, \(d({\mathcal {R}}_i^{(k)})\) has the form

$$\begin{aligned} d({\mathcal {R}}_i^{(k)})=d_{\textrm{mM}}^{\textrm{ini}}(2d_{\textrm{mM}}^{\textrm{seq}})^{k^-}, \end{aligned}$$
(3.14)

where \(k^-\le k\) indicates the number of times that \({\mathcal {R}}_i^{(k)}\) was partitioned.
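
Putting Sects. 3.1–3.3 together, one SCABALL run on the normalized domain can be sketched in a few lines of Python. This is a simplified sketch, not the authors' MATLAB implementation: the inputs X_ini, X_seq and their mM-distances d_ini, d_seq are assumed to be precomputed (e.g., by the BPCH method of Sect. 4.1, with the first row of X_seq fixed at \(\varvec{1}/2\)), and select_por is the helper sketched after Eq. (2.5).

```python
import numpy as np

def scaball(f, X_ini, d_ini, X_seq, d_seq, n_iter=50, eps=1e-4):
    """Minimal SCABALL sketch on [0,1]^d with precomputed mM designs."""
    centers = np.asarray(X_ini, dtype=float)       # ball centers x_i
    radii = np.full(len(centers), d_ini)           # d(R_i) = r_i, Eq. (3.3)
    fvals = np.array([f(x) for x in centers])
    for _ in range(n_iter):
        for j in select_por(radii, fvals, eps):    # conditions (2.4)-(2.5)
            a = centers[j] - radii[j] / 2          # Eq. (3.12)
            b = centers[j] + radii[j] / 2
            new = a + X_seq[1:] * (b - a)          # Eq. (3.11); row 0 maps onto x_j
            new = np.clip(new, 0.0, 1.0)           # project escapees back into the cube
            r_new = 2 * d_seq * radii[j]           # Eq. (3.13)
            radii[j] = r_new                       # x_j keeps a shrunken ball
            centers = np.vstack([centers, new])
            radii = np.append(radii, np.full(len(new), r_new))
            fvals = np.append(fvals, [f(x) for x in new])
    k = np.argmin(fvals)
    return centers[k], fvals[k]
```

The np.clip step implements the replacement \(\varvec{x}_i'=\arg \min _{\varvec{x}\in \bar{{\mathcal {X}}}}\Vert \varvec{x}_i-\varvec{x}\Vert \) discussed below, since the Euclidean projection onto a box simply clips each coordinate.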

The initialization and first two iterations of SCABALL are illustrated in Fig. 6. The red solid lines denote the PORs selected at the current stage, which are a hyper-cube during initialization and hyper-balls during the iterations. The red dashed lines denote the sub-regions partitioned by SCABALL, and the red dots denote the newly sampled points in the PORs. It is worth noting that some sampled points may satisfy \(\varvec{x}_i\notin \bar{{\mathcal {X}}}\); see the lower right corner of Iteration 2 in Fig. 6. In this case, we choose \(\varvec{x}_i'=\arg \min _{\varvec{x}\in \bar{{\mathcal {X}}}}\Vert \varvec{x}_i-\varvec{x}\Vert \) to replace \(\varvec{x}_i\). This replacement is detailed in the last panel of Fig. 6, where the black star and the black dashed line denote \(\varvec{x}_i'\) and \({\mathcal {R}}_i'={\mathcal {B}}(\varvec{x}_i',r_i)\), respectively. It is easy to prove that \({\mathcal {R}}_i\cap \bar{{\mathcal {X}}}\subseteq {\mathcal {R}}_i'\cap \bar{{\mathcal {X}}}\) since \(\bar{{\mathcal {X}}}\) is convex and closed. Thus, this replacement does not break the coverage in Eq. (3.18), which is important to the following proof of convergence.

Fig. 6 Illustration of the partition method of SCABALL in 2 dimensions

3.4 Global convergence

Let \(S^{(0)}\triangleq S_0^{(0)}\), and let \(I^{(k)}\) denote the index set of the points in \(S^{(k)}\), where \(S^{(k)}=\bigcup _{i\in I^{(k-1)}}S_i^{(k)}\) denotes the set of all points sampled by SCABALL after k iterations. Clearly, \(I^{(k)}=\bigcup _{i\in I^{(k-1)}}I_i^{(k)}.\) Since the selection of PORs is similar to DIRECT, SCABALL always selects the largest region with the best evaluation. This guarantees the density of \(S^{(k)}\), which in turn guarantees the global convergence of SCABALL under the assumption that f is continuous. We state this property formally as a theorem.

Theorem 3.1

If \(d_{\textrm{mM}}^{\textrm{seq}}<1/2\), then \(\forall \varvec{x} \in \bar{{\mathcal {X}}}\)

$$\begin{aligned} d(\varvec{x},S^{(k)})=\min _{\varvec{x}_i\in S^{(k)}}d(\varvec{x},\varvec{x}_i)\rightarrow 0,\quad \text {as}\quad k\rightarrow \infty . \end{aligned}$$

Proof

According to Eq. (3.3)

$$\begin{aligned} \bar{{\mathcal {X}}}\subseteq \bigcup _{i\in I^{(0)}}{\mathcal {R}}_i^{(1)}. \end{aligned}$$
(3.15)

From Eqs. (3.5) and (3.6), we have

$$\begin{aligned} {\mathcal {R}}_{j}^{(k)}\subseteq \bigcup _{i\in I_{j}^{(k)}}{\mathcal {R}}_i^{(k+1)},\ \forall j\in I^{(k-1)}, \end{aligned}$$
(3.16)

which indicates that

$$\begin{aligned} \bigcup _{j\in I^{(k-1)}}{\mathcal {R}}_{j}^{(k)}\subseteq \bigcup _{i\in I^{(k)}}{\mathcal {R}}_i^{(k+1)}. \end{aligned}$$
(3.17)

By recursion, we obtain

$$\begin{aligned} \bar{{\mathcal {X}}}\subseteq \bigcup _{i\in I^{(k-1)}}{\mathcal {R}}_i^{(k)},\ \forall k\ge 1. \end{aligned}$$
(3.18)

Let \(d^{(k)}=\max _{i\in I^{(k-1)}}d({\mathcal {R}}_i^{(k)})\) denote the largest radius of \({\mathcal {R}}_i^{(k)}\) at the k-th iteration, then

$$\begin{aligned} d(\varvec{x},S^{(k)})\le d^{(k)},\ \forall \varvec{x}\in \bar{{\mathcal {X}}}, k\ge 1. \end{aligned}$$
(3.19)

Let \(I_{\textrm{max}}^{(k)}=\{i\mid d({\mathcal {R}}_i^{(k)})=d^{(k)}\}\) denote the index set of the largest regions at the k-th iteration, and let \(I_*^{(k)}=\arg \min _{i\in I_{\textrm{max}}^{(k)}}f({\mathcal {R}}_i^{(k)})\). For the \({\mathcal {R}}_i^{(k)},i\in I_*^{(k)}\), there is always a \({\hat{K}}\) large enough that conditions (2.4) and (2.5) are satisfied, which means they are PORs at the k-th iteration. According to Eq. (3.13), a POR is partitioned into smaller regions at a rate of \(2d_{\textrm{mM}}^{\textrm{seq}}<1\). Since \(I_{\textrm{max}}^{(k)}\) is finite, \(d^{(k)}\rightarrow 0\) as \(k\rightarrow \infty \), which completes the proof. \(\square \)

For the sequential design \(X_1^{\textrm{seq}}=\{\varvec{1}/2\}\) on \({\mathcal {B}}(\varvec{1}/2,1/2)\), we have \(d_{\textrm{mM}}(X_1^{\textrm{seq}})=1/2\). Since \(d_{\textrm{mM}}(\cdot )\) is a non-increasing set function, \(d_{\textrm{mM}}(X_{n_{\textrm{seq}}}^{\textrm{seq}})\le 1/2\). Therefore, the condition \(d_{\textrm{mM}}^{\textrm{seq}}<1/2\) in Theorem 3.1 is not difficult to achieve if \(n_{\textrm{seq}}\) is large enough.

4 Implementation and contraction of partition

4.1 Construction of mM design

The theoretical mM design \(X_n^*\) in Eq. (3.10) is extremely difficult to construct, because evaluating \(d_{\mathrm{{mM}}}(X_n)\) in Eq. (3.8) requires maximizing \(d(\varvec{x},X_n)\) with respect to \(\varvec{x}\in {\mathcal {R}}\). There is a geometric method based on the Voronoi tessellation to evaluate \(d_{\mathrm{{mM}}}(X_n)\) [6, 7], and a corresponding algorithm to obtain \(X_n^*\) [33]; the mM designs in Fig. 5 were constructed by this geometric algorithm. However, this kind of method requires calculating the Voronoi tessellation and its Chebyshev centers at each iteration, which is computationally expensive and difficult to apply when \({\mathcal {R}}\) is a hyper-ball. In this article, we introduce a fast, approximate, and random method to obtain \(X_{n_{\textrm{ini}}}^{\textrm{ini}}\) and \(X_{n_{\textrm{seq}}}^{\textrm{seq}}\).

The fully sequential space-filling design in [37] was constructed iteratively by greedily maximizing \(D_\beta (\varvec{x}, X_k)\), where \(D_\beta (\varvec{x}, X_k)\) is defined by

$$\begin{aligned} D_\beta (\varvec{x}, X_k)\triangleq \min \{d(\varvec{x}, X_k),\beta \cdot d(\varvec{x}, \partial {\mathcal {R}})\},\quad \varvec{x}\in {\mathcal {R}}. \end{aligned}$$
(4.1)

The term \(d(\varvec{x}, X_k)\) in \(D_\beta (\varvec{x}, X_k)\) corresponds to the coffee-house design criterion in [26], and \(\beta \cdot d(\varvec{x}, \partial {\mathcal {R}})\) reflects an aversion to the boundary. The design constructed using the above method is denoted the boundary-phobic coffee-house (BPCH) design.

Algorithm 2

The pseudo-code of the BPCH design is summarized in Algorithm 2. \(X^{\textrm{ori}}\) is the original design needed to start the algorithm, \(n_{\textrm{max}}\) is large enough to obtain the desired design, and \({\mathcal {R}}_N\) is a subset of N points in \({\mathcal {R}}\), with \(N\gg n_{\textrm{max}}\), which is well spread over \({\mathcal {R}}\). In this paper, we set \(\beta =2\sqrt{2d}\) (chosen in [37] by trial and error), \(d(\varvec{x},\partial {{\mathcal {R}}^{\textrm{ini}}})=\min \{\Vert \varvec{x}\Vert _\infty ,\Vert \varvec{1}-\varvec{x}\Vert _\infty \}\), and \(d(\varvec{x},\partial {\mathcal {R}}^{\textrm{seq}})=1/2-\Vert \varvec{x}-\varvec{1}/2\Vert \), where \(\Vert \cdot \Vert _\infty \) denotes the infinity norm. \(X^{\textrm{ori}}\) is set to \(\{\varvec{1}/2\}\) in both the initial and sequential designs. In addition, for the initial design, \(X^{\textrm{ori}}\) can be set to user-defined starting points, which is not permitted in most DIRECT-type algorithms [14]. Because the \(\max _{\varvec{x}\in {\mathcal {R}}}d(\varvec{x},X_n)\) in Eq. (3.8) is always attained at some \(\varvec{x}\in \partial {\mathcal {R}}\), we set \({\mathcal {R}}_N={\mathcal {R}}_{\textrm{U}}\cup {\mathcal {R}}_{\textrm{C}}\) to better represent \({\mathcal {R}}\), where \({\mathcal {R}}_{\textrm{U}}\) is uniformly sampled in \({\mathcal {R}}\) and \({\mathcal {R}}_{\textrm{C}}\) consists of complementary points on \(\partial {\mathcal {R}}\). For the uniformity of \({\mathcal {R}}_N\), \({\mathcal {R}}_{\textrm{C}}\) should contain far fewer points than \({\mathcal {R}}_{\textrm{U}}\). In this paper, we set \({\mathcal {R}}^{\textrm{ini}}_{\textrm{C}}\) to the \(2^d\) vertices of \({\mathcal {R}}^{\textrm{ini}}\) and \({\mathcal {R}}^{\textrm{seq}}_{\textrm{C}}\) to \(10^4\) uniform samples on the hyper-sphere \(\partial {\mathcal {R}}^{\textrm{seq}}\), and set \({\mathcal {R}}^{\textrm{ini}}_{\textrm{U}}\) and \({\mathcal {R}}^{\textrm{seq}}_{\textrm{U}}\) to \(10^5\) uniform samples in \({\mathcal {R}}^{\textrm{ini}}\) and \({\mathcal {R}}^{\textrm{seq}}\), respectively.
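
A minimal Python sketch of the greedy construction on \({\mathcal {R}}^{\textrm{ini}}=[\varvec{0},\varvec{1}]^d\) is given below; it follows Eq. (4.1) and the boundary distance defined above but, as a simplification of Algorithm 2, omits the complementary boundary set \({\mathcal {R}}_{\textrm{C}}\).

```python
import numpy as np

def bpch_design(n_max, dim, n_cand=100_000, seed=0):
    """Greedy BPCH sketch on [0,1]^dim: repeatedly add the candidate
    maximizing D_beta(x, X) = min{d(x, X), beta * d(x, boundary)}, Eq. (4.1)."""
    rng = np.random.default_rng(seed)
    beta = 2 * np.sqrt(2 * dim)                    # beta = 2*sqrt(2d), as above
    cand = rng.random((n_cand, dim))               # R_U: uniform candidates
    # d(x, dR^ini) = min{||x||_inf, ||1 - x||_inf}, the definition used above
    bdry = np.minimum(cand.max(axis=1), (1 - cand).max(axis=1))
    X = [np.full(dim, 0.5)]                        # X^ori = {1/2}
    mind = np.linalg.norm(cand - X[0], axis=1)     # d(x, X) for every candidate
    for _ in range(n_max - 1):
        k = np.argmax(np.minimum(mind, beta * bdry))
        X.append(cand[k])                          # greedy coffee-house step
        mind = np.minimum(mind, np.linalg.norm(cand - cand[k], axis=1))
    return np.array(X)

X_ini = bpch_design(13, dim=2)
```

The nested-prefix property mentioned below comes for free: with the same seed and candidate set, the first \(n_1\) points of a size-\(n_2\) run are exactly the size-\(n_1\) design.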

Figure 7 shows two-dimensional examples of \(X_{n_{\textrm{ini}}}^{\textrm{ini}}\) and the resulting \(d_{\textrm{mM}}^{\textrm{ini}}\) in sequence, and Fig. 8 shows the counterparts for \(X_{n_{\textrm{seq}}}^{\textrm{seq}}\). Compared with Fig. 5, the BPCH designs have larger \(d_{\textrm{mM}}\) than the theoretical ones for the same number of points. Nevertheless, the BPCH design has many advantages. First, it generates acceptable designs in a relatively short time, even in high dimensions. Second, setting \(X^{\textrm{ori}}\) provides the flexibility to arrange points. Third, the nested structure (\(X_{n_1}^{\textrm{BPCH}}\subseteq X_{n_2}^{\textrm{BPCH}},n_1\le n_2\)) permits choosing a suitable prefix of \(X_{n_{\textrm{max}}}^{\textrm{BPCH}}\). We next give some empirical guidance for choosing \(n_{\textrm{ini}}\) and \(n_{\textrm{seq}}\).

Fig. 7 Examples of initial designs in 2 dimensions

In the right panel of Fig. 7, we can see steep drops in \(d_{\textrm{mM}}\) at \(n_{\textrm{ini}}=5,9,13\). We plotted \(X^{\textrm{ini}}_5\), \(X^{\textrm{ini}}_9\), and \(X^{\textrm{ini}}_{13}\) with different markers in the left panel of Fig. 7 and found that the designs of these particular sizes exhibit relative symmetry. Furthermore, there are steep drops in \(d_{\textrm{mM}}\) from \(n_{\textrm{seq}}=5\) to 8 in the right panel of Fig. 8, and \(d_{\textrm{mM}}\) remains unchanged for many integers after \(n_{\textrm{seq}}=8\); similar behavior occurs at \(n_{\textrm{seq}}=13\). \(X^{\textrm{seq}}_5\), \(X^{\textrm{seq}}_8\), and \(X^{\textrm{seq}}_{13}\) are plotted with different markers in the left panel of Fig. 8, and we believe that \(X^{\textrm{seq}}_{8}\) and \(X^{\textrm{seq}}_{13}\) are more symmetrical than \(X^{\textrm{seq}}_{5}\). Based on the above analyses, we suggest choosing \(n_{\textrm{ini}}\) and \(n_{\textrm{seq}}\) to meet one of the following two empirical conditions: (i) \(d_{\textrm{mM}}\) has a steep drop at \(n_{\textrm{ini}}\) (\(n_{\textrm{seq}}\)), or (ii) \(d_{\textrm{mM}}\) is unchanged for several integers after \(n_{\textrm{ini}}\) (\(n_{\textrm{seq}}\)). We analyze the influence of \(n_{\textrm{ini}}\) and \(n_{\textrm{seq}}\) in Sect. 5.

Fig. 8 Examples of sequential designs in 2 dimensions

4.2 Covering and overlapping rates

From the demonstration of SCABALL in Fig. 3f, we observe the following three facts: (i) sub-regions scattered from the same region, namely \({\mathcal {R}}_i^{(k+1)},i\in I_{j^*}^{(k)}\), may overlap; (ii) sub-regions scattered from different regions, namely \({\mathcal {R}}_i^{(k)}\) and \({\mathcal {R}}_j^{(k)}\), may overlap; and (iii) the union of the sub-regions exceeds the original region, namely \(\bigcup _{i\in I_{j^*}^{(k)}}{\mathcal {R}}_i^{(k+1)}\nsubseteq {\mathcal {R}}_{j^*}^{(k)}\). Owing to these three facts, \({\mathcal {R}}_i^{(k+1)}\) can be contracted to obtain a more favorable lower bound in Eq. (3.7). We will show that appropriate contraction can accelerate the convergence of SCABALL with little influence on its global convergence.

For a design \(X_n\) in \({\mathcal {R}}\), we define the covering rate function of r as

$$\begin{aligned} \textrm{CR}(r)\triangleq \textrm{CR}(r;X_n,{\mathcal {R}}) =\frac{\textrm{vol}(\bigcup _{\varvec{x}_i\in X_n}{\mathcal {B}}(\varvec{x}_i,r)\cap {\mathcal {R}})}{\textrm{vol}({\mathcal {R}})}. \end{aligned}$$
(4.2)

Clearly, \(\textrm{CR}(r)\) is non-decreasing with r. Define the \(\alpha \)-distance by

$$\begin{aligned} d_\alpha (X_n)\triangleq d_\alpha (X_n;{\mathcal {R}})=\textrm{CR}^{-1}(\alpha )=\inf \{r|\textrm{CR}(r)\ge \alpha \}. \end{aligned}$$
(4.3)

\(d_\alpha (X_n)\) is the minimum radius r such that \(\bigcup _{\varvec{x}_i\in X_n}{\mathcal {B}}(\varvec{x}_i,r)\) covers at least \(100\alpha \%\) of \({\mathcal {R}}\); obviously, \(d_1(X_n)=d_{\textrm{mM}}(X_n)\). We denote \(d_\alpha ^{\textrm{ini}}\triangleq d_\alpha (X_{n_{\textrm{ini}}}^{\textrm{ini}})\) and \(d_\alpha ^{\textrm{seq}}\triangleq d_\alpha (X_{n_{\textrm{seq}}}^{\textrm{seq}})\); \(d_{0.99}^{\textrm{ini}}\) and \(d_{0.99}^{\textrm{seq}}\) are plotted in the right panels of Figs. 7 and 8, and in 2 dimensions they are extremely close to \(d_{\textrm{mM}}^{\textrm{ini}}\) and \(d_{\textrm{mM}}^{\textrm{seq}}\). However, \(d_{0.99}^{\textrm{ini}}\) (\(d_{0.99}^{\textrm{seq}}\)) is much smaller than \(d_{\textrm{mM}}^{\textrm{ini}}\) (\(d_{\textrm{mM}}^{\textrm{seq}}\)) in higher dimensions (see Figs. 9 and 10). This can be interpreted as \(d_{\textrm{mM}}\) being significantly reduced by ignoring extreme points in high dimensions. This property is also termed "do not try to cover the vertices" in [45]. Additionally, the numerical study of [28] confirmed that \(X_n^{\textrm{BPCH}}\) with a proper \(\beta \) has strong properties when measured by \(d_\alpha \).
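
Since \(\textrm{CR}(r)\) is the fraction of \({\mathcal {R}}\) whose nearest design point lies within r, \(d_\alpha \) is simply the \(\alpha \)-quantile of the nearest-center distances. A Monte Carlo sketch, again an illustration for the unit cube rather than the paper's code:

```python
import numpy as np

def d_alpha(X, alpha=0.99, n_mc=20_000, seed=0):
    """Monte Carlo sketch of Eq. (4.3) on [0,1]^d: the smallest radius r
    whose balls B(x_i, r) cover at least a fraction alpha of the cube."""
    X = np.asarray(X, float)
    R = np.random.default_rng(seed).random((n_mc, X.shape[1]))
    nearest = np.linalg.norm(R[:, None, :] - X[None, :, :], axis=2).min(axis=1)
    # CR(r) = P(nearest <= r), so CR^{-1}(alpha) is the alpha-quantile
    return np.quantile(nearest, alpha)
```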

Fig. 9 \(d_{\textrm{mM}}\), \(d_{0.99}\), and \(d_{*}\) of \(X_{n_{\textrm{ini}}}^{\textrm{ini}}\) and \(X_{n_{\textrm{seq}}}^{\textrm{seq}}\) in 5 dimensions

Fig. 10 \(d_{\textrm{mM}}\), \(d_{0.99}\), and \(d_{*}\) of \(X_{n_{\textrm{ini}}}^{\textrm{ini}}\) and \(X_{n_{\textrm{seq}}}^{\textrm{seq}}\) in 10 dimensions

In practice, we take the sub-region \({\mathcal {R}}_i={\mathcal {B}}(\varvec{x}_i,d_\alpha )\) for some \(\alpha <1\). On the one hand, \(d_\alpha \) is smaller than \(d_{\textrm{mM}}\), which provides a better lower bound and accelerates the algorithm. On the other hand, the global convergence in Theorem 3.1 may no longer hold. It is therefore important to choose an appropriate \(\alpha \) to balance efficiency and global convergence. To do so, we define the overlapping rate function of r for a design \(X_n\) in \({\mathcal {R}}\) by

$$\begin{aligned} \textrm{OR}(r)\triangleq \textrm{OR}(r;X_n,{\mathcal {R}}) =\frac{\textrm{vol}(\bigcup _{\varvec{x}_i,\varvec{x}_j\in X_n,\varvec{x}_j\ne \varvec{x}_i}({\mathcal {B}}(\varvec{x}_i,r)\cap {\mathcal {B}}(\varvec{x}_j,r)) \cap {\mathcal {R}})}{\textrm{vol}({\mathcal {R}})}. \end{aligned}$$
(4.4)

\(\textrm{OR}(r)\) is also non-decreasing and \(\textrm{OR}(r)\le \textrm{CR}(r)\). To obtain an appropriate contraction, we define the star-distance as

$$\begin{aligned} d_*(X_n)\triangleq d_*(X_n;{\mathcal {R}})=\inf \{r|\textrm{OR}(r)=1-\textrm{CR}(r)\}. \end{aligned}$$
(4.5)

We denote \(d_*^{\textrm{ini}}\triangleq d_*(X_{n_{\textrm{ini}}}^{\textrm{ini}})\) and \(d_*^{\textrm{seq}}\triangleq d_*(X_{n_{\textrm{seq}}}^{\textrm{seq}})\), and plot them in Figs. 7, 8, 9, and 10. \(d_*\) corresponds to the radius at which the non-covering rate equals the overlapping rate. If the uncovered region were exactly compensated by the overlapped region, which cannot happen, \(d_*\) would be the best choice of contracted radius; in this sense, \(d_*\) is an underestimate of the best contracted radius. For flexibility, we introduce a parameter \(\gamma \) and replace the updating in Eq. (3.13) with

$$\begin{aligned} \begin{aligned} d({\mathcal {R}}_i^{(1)})=r_i^{(1)}&=d_*^{\textrm{ini}},\ \forall i\in I_0^{(0)},\\ d({\mathcal {R}}_i^{(k+1)})=r_i^{(k+1)}&=2\gamma d_*^{\textrm{seq}}\cdot d({\mathcal {R}}_{j^*}^{(k)}),\ \forall i\in I_{j^*}^{(k)}. \end{aligned} \end{aligned}$$
(4.6)
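
Estimating \(d_*\) numerically is straightforward: a Monte Carlo point is covered by at least one ball of radius r when its nearest-center distance is at most r, and overlapped when its second-nearest distance is at most r, so Eq. (4.5) can be solved by bisection on the non-decreasing gap \(\textrm{OR}(r)-(1-\textrm{CR}(r))\). A minimal sketch, again assuming the unit cube:

```python
import numpy as np

def d_star(X, n_mc=20_000, tol=1e-4, seed=0):
    """Monte Carlo sketch of the star-distance, Eq. (4.5): the radius at which
    the overlapping rate equals the non-covering rate (needs >= 2 points)."""
    X = np.asarray(X, float)
    R = np.random.default_rng(seed).random((n_mc, X.shape[1]))
    dist = np.sort(np.linalg.norm(R[:, None, :] - X[None, :, :], axis=2), axis=1)
    near, second = dist[:, 0], dist[:, 1]      # 1st- and 2nd-nearest centers
    lo, hi = 0.0, np.sqrt(X.shape[1])          # bracket for the bisection
    while hi - lo > tol:
        r = (lo + hi) / 2
        if np.mean(second <= r) >= np.mean(near > r):   # OR(r) >= 1 - CR(r)
            hi = r
        else:
            lo = r
    return hi
```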

We analyze the influence of \(n_{\textrm{ini}}\), \(n_{\textrm{seq}}\), and \(\gamma \) in the next section.

5 Numerical results

In this section, we first present some numerical analyses of the parameters in SCABALL and then compare it with other DIRECT-type algorithms. We use the GKLS-generator [10] to provide a large number of random test functions. The parameters of the GKLS-generator include (i) the problem dimension d; (ii) the global minimum value \(f^*\); (iii) the number of local minima m; (iv) the radius of the attraction region of the global minimum \(\rho ^*\); and (v) the distance from the global minimum to the quadratic function vertex \(r^*\). For more details about the GKLS-generator and related analysis, please refer to [10, 36]. For each dimension, we generated 100 multi-modal and non-differentiable functions for testing. The stopping criteria are as follows: (i) the minimum achieves the specified relative error (\(\delta =|(f_{\textrm{min}}-f^*)/f^*|\le 0.01\)); and (ii) the number of function evaluations exceeds the limit (\(N_{\textrm{max}}=10^4d\)). When criterion (ii) is met, we consider the problem unsolved within the evaluation budget.

5.1 Choice of parameters

To study the influence of \(n_{\textrm{ini}}\), \(n_{\textrm{seq}}\), and \(\gamma \) on SCABALL, we divided them into three levels each and performed a full factorial experiment. As analyzed in Sect. 4, we set \(n_{\textrm{ini}}=5,9,13\), \(n_{\textrm{seq}}=5,8,13\), and \(\gamma =0.9,1.1,d_{\textrm{mM}}/d_*\), respectively, for \(d=2\). The level \(\gamma =d_{\textrm{mM}}/d_*\) corresponds to the uncontracted version of SCABALL, and \(d_{\textrm{mM}}/d_*>1.1\) for all the designs. The parameter \(\epsilon \) in Eq. (2.5) is a general parameter in DIRECT-type algorithms that avoids excessive refinement of local minima. The influence of \(\epsilon \) is not the focus of our work; thus, we fixed \(\epsilon =10^{-4}\) for all DIRECT-type algorithms in this paper (see [14] for more details on \(\epsilon \)). For each of the 27 parameter combinations, we used SCABALL to solve the 100 problems generated by GKLS with parameters

$$\begin{aligned} f^*=-1,\ m=5d,\ \rho ^*=1/3,\ r^*=2/3. \end{aligned}$$
(5.1)

The number of evaluations (\(N_{\textrm{eva}}\)) and the number of solved problems (\(N_{\textrm{sol}}\)) were recorded. We performed a profile analysis of each parameter; all the results were classified into profiles according to parameter level. As a result, there are 9 profiles, each containing 900 results.

For \(d=2\), we made box-plots of \(N_{\textrm{eva}}\) and line-plots of \(N_{\textrm{sol}}\) in Fig. 11a; the outliers of \(N_{\textrm{eva}}\) are omitted for clarity. It can be seen that \(n_{\textrm{ini}}=13\) and \(n_{\textrm{seq}}=8\) are good choices for \(d=2\), considering that they yield both a smaller \(N_{\textrm{eva}}\) and a larger \(N_{\textrm{sol}}\). Notably, \(N_{\textrm{eva}}=2\times 10^4\) is enough for \(d=2\) and \(\delta =0.01\), as can be verified by the magnitude of \(N_{\textrm{eva}}\) in the solved problems. Therefore, the unsolved problems in SCABALL are caused by the loss of global convergence. For the choice of \(\gamma \), we should balance efficiency and global convergence. On the one hand, \(N_{\textrm{eva}}\) increases with \(\gamma \), which means that a small \(\gamma \) accelerates convergence. On the other hand, \(N_{\textrm{sol}}\) decreases significantly when \(\gamma \) is too small. To balance efficiency and global convergence, we chose \(\gamma =1.1\); note that the \(N_{\textrm{sol}}\) of the profile \(\gamma =1.1\) is 897. This illustrates that appropriate contraction can accelerate the convergence of SCABALL with little influence on global convergence.

Fig. 11 Profile analyses of \(n_{\textrm{ini}}\), \(n_{\textrm{seq}}\), and \(\gamma \) in SCABALL

Regrettably, mM designs are not nested across dimensions, which means that \(X_{n_{\textrm{ini}}}^{\textrm{ini}}\) (\(X_{n_{\textrm{seq}}}^{\textrm{seq}}\)) must be constructed for each dimension, and the above analysis repeated likewise. The profile analyses for the other dimensions are shown in Fig. 11. We summarize some general rules as follows. The sensitivity of the SCABALL algorithm to \(n_{\textrm{ini}}\), \(n_{\textrm{seq}}\), and \(\gamma \) increases in that order. The effect of \(n_{\textrm{ini}}\) and \(n_{\textrm{seq}}\) on \(N_{\textrm{sol}}\) is not obvious, while they have some effect on \(N_{\textrm{eva}}\); this means that the number of design points only influences the convergence rate. The best \(n_{\textrm{ini}}\) and \(n_{\textrm{seq}}\) we found grow with the dimension, and \(n_{\textrm{ini}}\) increases faster than \(n_{\textrm{seq}}\). Conversely, \(\gamma \) has a significant influence on both \(N_{\textrm{eva}}\) and \(N_{\textrm{sol}}\). In general, both \(N_{\textrm{eva}}\) and \(N_{\textrm{sol}}\) increase with \(\gamma \), but an excessively large \(N_{\textrm{eva}}\) may lead to a decrease in \(N_{\textrm{sol}}\); this phenomenon begins to occur at \(d=5\) and persists in higher dimensions. In high dimensions, the contraction of SCABALL is necessary to reduce the computation. In addition, too small a \(\gamma \) makes the spread of \(N_{\textrm{eva}}\) more extreme, which means the algorithm becomes unstable. Therefore, \(\gamma \) must be chosen carefully. Fortunately, \(\gamma =1\sim 1.2\) is a good choice over a fairly wide range of dimensions, which confirms that the \(d_*\) defined in Eq. (4.5) is a good underestimate of the best contracted radius. Considering that the construction of mM designs is stochastic, we provide empirical formulae for choosing \(n_{\textrm{ini}}\), \(n_{\textrm{seq}}\), and \(\gamma \), as follows:

$$\begin{aligned} n_{\textrm{ini}}=5d^2-7,\quad n_{\textrm{seq}}=d^2+5d-6,\quad \gamma =1.1. \end{aligned}$$
(5.2)

Since the SCABALL algorithm is not sensitive to \(n_{\textrm{ini}}\) and \(n_{\textrm{seq}}\), one can choose suitable numbers of design points near those of the formulae (5.2) by referring to the two suggestions at the end of Sect. 4.1. Although constructing \(X_{n_{\textrm{ini}}}^{\textrm{ini}}\) and \(X_{n_{\textrm{seq}}}^{\textrm{seq}}\) is time consuming, it is performed only once and consequently does not influence the efficiency of SCABALL.
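
For instance, the formulae (5.2) give \(n_{\textrm{ini}}=13\) and \(n_{\textrm{seq}}=8\) for \(d=2\), matching the levels found best above, and \(n_{\textrm{ini}}=118\) and \(n_{\textrm{seq}}=44\) for \(d=5\).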

5.2 Comparisons with original partition methods

In this subsection, we first compare the performance of SCABALL with that of DIRECT-type algorithms using other kinds of partitions. The DIRECT, ADC, DISIMPL-V, DISIMPL-C, and BIRECT algorithms are under consideration; we refer to these as the original partition methods. The details of the original partition methods were introduced in Sect. 2, and the parameters of SCABALL follow Eq. (5.2). We used these algorithms to solve the problems generated by GKLS with the parameters in Eq. (5.1) and made line-plots of \(N_{\textrm{eva}}\) versus \(N_{\textrm{sol}}\), also called operational characteristics, in Fig. 12. Operational characteristics are a commonly used visual comparison of deterministic algorithms; more details can be found in Chapter 3 of [42] and in [36]. All computations were performed on Intel(R) Core(TM) i7-9750H 2.60 GHz processors running Matlab R2017b. The DIRECT-type algorithms mentioned here and later are all implemented using the dynamic version of the MATLAB toolbox in DIRECTGO v1.0.0 [40].

Fig. 12 Operational characteristics of original partition methods on simple class GKLS problems

For \(d=2\sim 5\), the efficiency of the SCABALL algorithm is superior to that of the other DIRECT-type algorithms with different partitions. Most random problems can be solved by SCABALL with the minimum \(N_{\textrm{eva}}\), but SCABALL is less efficient at solving extreme problems. This is reflected at the top of the \(N_{\textrm{eva}}\)–\(N_{\textrm{sol}}\) line, which is dragged to the right when \(N_{\textrm{sol}}\) is close to 100, most obviously for \(d=3,4\). It means that the SCABALL algorithm needs more evaluations than the other algorithms to solve those extreme problems.

To further analyze the efficiency of SCABALL, we use GKLS to generate a new class of test functions with hard parameters:

$$\begin{aligned} f^*=-1,\ m=5d,\ \rho ^*=0.2,\ r^*=0.8. \end{aligned}$$
(5.3)

The parameters in Eq. (5.3) have a smaller \(\rho ^*\) and a larger \(r^*\) compared with Eq. (5.1), which makes it harder for an algorithm to locate the global minimum. For convenience, we refer to the functions generated by GKLS with parameters (5.1) and (5.3) as the simple class and the hard class, respectively. We performed similar experiments with the hard class of functions, and the results are displayed in Fig. 13.

Fig. 13 Operational characteristics of original partition methods on hard class GKLS problems

It turns out that the hard class problems need more evaluations to solve, and the SCABALL algorithm is not outstanding in solving them. We believe there are two reasons for this phenomenon. One is that we chose the parameters of SCABALL from experiments on the simple class problems; these parameters may not be suitable for the hard ones. To achieve outstanding efficiency, the parameters of SCABALL must be selected for a specific class of problems; otherwise, the efficiency becomes mediocre. The other reason is that DIRECT-type algorithms are greatly influenced by the grouping of the \(d({\mathcal {R}}_i)\). According to the updating in Eq. (3.14), the \(d({\mathcal {R}}_i)\) in SCABALL take few distinct values. This perspective is illustrated in Fig. 14: we ran all the DIRECT-type algorithms until the number of \({\mathcal {R}}_i\) reached 100 and created scatter diagrams of \(d({\mathcal {R}}_i)\) and \(f({\mathcal {R}}_i)\). It is obvious that SCABALL decreases the number of \(d({\mathcal {R}}_i)\)-based groups of \({\mathcal {R}}_i\), and this reduction biases SCABALL towards faster convergence to local minima [9]. This also explains the good performance of SCABALL on the relatively simple class problems with limited evaluations.

Fig. 14 \(d({\mathcal {R}})\)–\(f({\mathcal {R}})\) diagrams of DIRECT-type algorithms

5.3 Comparisons with improved and hybrid methods

Next, we chose some variants of DIRECT-type algorithms to analyze their efficiency in higher dimensions, including the DIRECT-l [9], PLOR [25], DIRECT-rev [13], Gb-BIRECT, and BIRMIN [32] algorithms. These algorithms perform well among DIRECT-type algorithms; see the numerical comparisons in Sect. 4.1 of [40]. The DIRECT-l, PLOR, and Gb-BIRECT algorithms change the selection of PORs to make the algorithms more efficient; we call these improved methods. DIRECT-l decreases the number of \(d({\mathcal {R}}_i)\)-based groups by using the infinity norm; similar to SCABALL, it is locally biased. Gb-BIRECT introduces a phase that constrains the selection of PORs to large sub-regions, which is globally biased. PLOR balances the global and local search by choosing only the two regions with minimal and maximal \(d({\mathcal {R}}_i)\). We chose these improved methods to compare the influence of locally and globally biased strategies. The DIRECT-rev and BIRMIN algorithms incorporate local optimizers to accelerate convergence; we call these hybrid methods. They invoke fmincon when some improvement in the best current solution is obtained, and DIRECT-rev also includes a revised partition scheme. We chose these hybrid methods to analyze the influence of a local optimizer's assistance. For better comparison, the corresponding original partition methods, DIRECT and BIRECT, are also under consideration.

The comparison results for the high dimensional GKLS functions of the simple class are shown in Fig. 15. The high dimensional GKLS functions of the hard class are too difficult to solve within the evaluation budget; in fact, none of the selected algorithms can solve even half of them, so we do not display those results. It can be seen that none of the DIRECT-type algorithms, even those combined with local optimizers, is immune to the "curse of dimensionality": they cannot solve all the simple problems within the evaluation budget in higher dimensions. Among all these methods, the SCABALL algorithm performs well on this class of problems; in this experiment, it is even comparable with the two hybrid methods. DIRECT, DIRECT-rev, and BIRMIN also perform well, while DIRECT-l and PLOR are the least efficient. We believe that the local strategies of DIRECT-l and PLOR are not suited to this class of functions.

Fig. 15 Operational characteristics of improved and hybrid methods on simple class GKLS problems

To increase the variety of test functions, we present the comparison results of the DIRECT-type algorithms on the box-constrained problems in DIRECTLib v1.3 [38]. DIRECTLib contains a large number of diverse objective functions, including uni-modal and multi-modal, convex and non-convex problems, with dimensionality ranging from 2 to 10. The key characteristics of these test problems are listed in Table 2. Figure 16 displays the operational characteristics on all 129 problems; the horizontal axes are logarithmic for better illustration.

It turns out that the hybrid methods are the most efficient, especially for the difficult problems needing more evaluations; obviously, the local optimizer greatly improves the efficiency of DIRECT-type algorithms. The influence of POR selection is also shown in Fig. 16. DIRECT-l is a local version of DIRECT; in the general trend, the \(N_{\textrm{eva}}\)–\(N_{\textrm{sol}}\) line of DIRECT-l passes through that of DIRECT from the left, meaning that the local strategy helped DIRECT-l solve more of the simple problems with fewer evaluations. On the contrary, Gb-BIRECT is a global version of BIRECT, and its \(N_{\textrm{eva}}\)–\(N_{\textrm{sol}}\) line passes through that of BIRECT from the bottom, meaning that the global phase helped Gb-BIRECT solve more problems in total than BIRECT. PLOR is a balanced and simplified version of DIRECT, and it is one of the most efficient algorithms apart from the hybrid methods; this result differs from the previous experiments, indicating that the strategy of PLOR is efficient but lacks stability. Among all the original partition and improved methods, the efficiency of the SCABALL algorithm is mediocre, but it solves the most DIRECTLib problems within the evaluation budget. Given the variety of the DIRECTLib problems, this demonstrates the robustness of SCABALL to some extent.

6 Conclusion and further discussion

In this paper, we introduced the new SCABALL algorithm for derivative-free optimization problems. SCABALL is a DIRECT-type algorithm with a new partition method: it does not focus on dividing the region of interest into regions of a specific geometry, but rather scatters several balls to cover it. To achieve better coverage, approximate mM designs were constructed by the boundary-phobic coffee-house method, and the partition radii were contracted by analyzing the covering and overlapping rates. The parameters of SCABALL were analyzed numerically, and empirical choices were given by formulae. Finally, we performed extensive numerical experiments to compare the efficiency of SCABALL with that of other DIRECT-type algorithms. The numerical results show that the SCABALL algorithm is locally biased; it solves most simple problems efficiently but is not outstanding at solving hard problems. In addition, the SCABALL algorithm is robust to some extent.

Table 2 Key characteristics of the DIRECTlib test problems

There are also weaknesses in SCABALL. One is that its performance is greatly influenced by the mM designs; therefore, the construction of \(X_{n_{\textrm{ini}}}^{\textrm{ini}}\) and \(X_{n_{\textrm{seq}}}^{\textrm{seq}}\) and the choice of \(n_{\textrm{ini}}\), \(n_{\textrm{seq}}\), and \(\gamma \) are important. The other is that mM designs are not nested across dimensions; both kinds of mM design must be constructed for each dimension, which makes it cumbersome to start the SCABALL algorithm.

Fig. 16 Operational characteristics of improved and hybrid methods on DIRECTlib problems

Furthermore, several directions remain for further study. One is improved schemes and hybrid versions of SCABALL: since SCABALL is locally biased, a global phase could be introduced as in Gb-BIRECT, and a local optimizer could also be combined with SCABALL. Second, adaptive parameter tuning of SCABALL is worth studying; its efficiency could be improved if \(\gamma \) and \(\epsilon \) were tuned during the iterations. Third, the flexibility of SCABALL should be exploited: as mentioned in Sect. 4.1, SCABALL can start from user-defined points, which makes it possible to optimize the objective function with the help of priors.