Efficient unconstrained black box optimization

For the unconstrained optimization of black box functions, this paper introduces a new randomized algorithm called VRBBO. In practice, VRBBO matches the quality of other state-of-the-art algorithms for finding, in small and large dimensions, a local minimizer with reasonable accuracy. Although our theory guarantees only local minimizers, our heuristic techniques turn VRBBO into an efficient global solver. In very thorough numerical experiments, we found in most cases either a global minimizer or, where this could not be checked, at least a point of similar quality to that of the best competing global solvers. For smooth, everywhere defined functions, it is proved that, with probability arbitrarily close to 1, a basic version of our algorithm finds with $\mathcal{O}(n\varepsilon^{-2})$ function evaluations a point whose unknown exact gradient 2-norm is below a given threshold $\varepsilon>0$, where $n$ is the dimension.
In the smooth convex case, this number improves to $\mathcal{O}(n\log \varepsilon^{-1})$ and in the smooth (strongly) convex case to $\mathcal{O}(n\log n\varepsilon^{-1})$. This matches known recent complexity results for reaching a slightly different goal, namely that the expected unknown exact gradient 2-norm is below a given threshold $\varepsilon>0$. Numerical results show that VRBBO is effective and robust in comparison with state-of-the-art local and global solvers on the unconstrained CUTEst test problems of Gould et al. (Comput Optim Appl 60:545–557, 2014) for optimization and the test problems of Jamil and Yang (Int J Math Model Numer Optim 4:150, 2013) for global optimization with 2–5000 variables.


Mathematics Subject Classification Primary 90C56

1 Introduction
In this paper we consider unconstrained black box optimization (BBO) or derivative-free optimization (DFO); see, e.g., [3,12,56]. The labels BBO and DFO are used in practice synonymously, though with slightly different emphasis. Algorithms for BBO/DFO repeatedly call an oracle (a black box, to which BBO refers) that returns for any given x ∈ R^n a real function value f(x) uniquely determined by x, possibly also ∞ or NaN (not a number). In this way, they try, using no other information about f (such as continuity, Lipschitz constants, differentiability, or derivative information, to which DFO refers), to find a minimizer of the underlying function f. However, although no such information is used for executing the algorithms, the motivation and analysis of the algorithms always assume, at least informally, that the function f has reasonable mathematical properties. To be able to give performance guarantees, such properties are essential.

Related work
There is a huge amount of literature on BBO problems and how to solve them. We mention only a few pointers. A thorough survey of derivative-free optimization methods was given by Larson et al. [39]. Another useful paper, by Rios & Sahinidis [56], discusses the practical behaviour of derivative-free optimization software packages. The techniques for solving BBO problems fall into two classes, deterministic and randomized methods. We mainly discuss the randomized case; for deterministic methods see, e.g., the book by Conn et al. [12] and its many references. Randomized methods for BBO, going back to Rastrigin [55], Polyak [49], and van Laarhoven & Aarts [58], were later discussed especially in the framework of evolutionary optimization [6,28,57]. There are also randomized BBO algorithms for deterministic problems, e.g., Bandeira et al. [7] and Holland [30]. For deterministic global BBO see, e.g., Hansen [27], and for stochastic global BBO see, e.g., Zhigljavsky [62]. Other useful BBO references are Audet & Hare [3], Moré & Wild [42], and Müller & Woodbury [43].
Previous BBO software of the optimization group at the University of Vienna includes the deterministic algorithms GRID [20,21] and MCS [31] and the randomized algorithms SnobFit [32] and VXQR [46]. Software by many others is mentioned in Sect. 7.3.

Known complexity results
This section discusses known complexity results in the deterministic and randomized cases.
Throughout the paper, we use a scaled 2-norm ‖p‖ and its dual norm ‖g‖_* of p, g ∈ R^n, defined in terms of a positive scaling vector s ∈ R^n by

‖p‖ := (∑_i p_i²/s_i²)^{1/2},   ‖g‖_* := (∑_i s_i² g_i²)^{1/2}.   (1)

For the choice of a suitable scaling vector see Sect. 5.
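As a quick illustration, the two norms of (1) can be computed directly from their definitions; the following minimal sketch uses plain Python lists in place of the paper's vectors:

```python
import math

def scaled_norm(p, s):
    # ||p|| = sqrt(sum_i p_i^2 / s_i^2): the scaled 2-norm from (1)
    return math.sqrt(sum((pi / si) ** 2 for pi, si in zip(p, s)))

def dual_norm(g, s):
    # ||g||_* = sqrt(sum_i s_i^2 g_i^2): the dual norm of the scaled 2-norm
    return math.sqrt(sum((si * gi) ** 2 for gi, si in zip(g, s)))
```

By construction, the Cauchy–Schwarz inequality |g^T p| ≤ ‖g‖_* ‖p‖ holds for this pair of norms, which is the duality used throughout the analysis.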

Assumptions
For the mathematical analysis of our algorithm we assume, as is customary in the literature on complexity results, that
(A1) the function f is continuously differentiable on R^n, and the gradient g(x) of f at x ∈ R^n is Lipschitz continuous with Lipschitz constant L;
(A2) the level set L(x_0) := {x ∈ R^n | f(x) ≤ f(x_0)} is bounded.
Under these conditions, a global minimizer x̂ of f exists and f̂ := f(x̂) is finite. A point x is called an ε-approximate stationary point if ‖g(x)‖_* ≤ ε holds. For fixed ε > 0, ε-approximate stationary points may also exist in regions where the graph of f is sufficiently flat, although no stationary point is nearby. If such a point is encountered, the convergence of optimization methods may be slowed down extremely, so that a spurious apparent local minimizer is found.
If σ > 0 and the strong convexity condition holds, then f is called σ-strongly convex. In this case, the global minimizer x̂ is unique, and (3) guarantees that iterates with small gradient norm are near x̂.
Proof For fixed x, the right-hand side of (4) is a convex quadratic function of x̃, minimal when its gradient vanishes. By (1), this is the case iff x̃_i takes the value x_i − (s_i²/σ) g_i(x) for i = 1, ..., n, so that f̂ ≥ f(x) − (1/(2σ)) ‖g(x)‖²_* for all x ∈ R^n. Therefore we conclude the claim from (3) and (4).

Table 1 Complexity results for randomized BBO in expectation (Bergou et al. [9] for all cases)

In exact precision arithmetic, the exact gradient vanishes at a stationary point. But in finite precision arithmetic, optimization methods may get stuck in nearly flat regions, so that a spurious apparent local minimizer may be found. For finite termination, a theoretical criterion should be used to get an ε-approximate stationary point. For a given threshold ε > 0, a complexity bound for an unconstrained BBO method tells how many function evaluations N(ε) are needed to find, with a given probability (or a related goal), a point x_best whose function value f(x_best) is below the initial function value f(x_0) and whose unknown gradient norm ‖g(x_best)‖_* at this point is below ε, i.e.,

f(x_best) ≤ sup{ f(x) | x ∈ R^n, ‖g(x)‖_* ≤ ε, f(x) ≤ f(x_0) }.   (5)
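The minimization step in this proof can be written out explicitly. Assuming the σ-strong convexity inequality is stated in the scaled norm (1) (our reading of (4)), substituting the minimizer componentwise gives:

```latex
\hat f \ \ge\ \min_{\tilde x\in\mathbb{R}^n}
  \Big[f(x) + g(x)^T(\tilde x - x) + \tfrac{\sigma}{2}\,\|\tilde x - x\|^2\Big],
\qquad \tilde x_i = x_i - \tfrac{s_i^2}{\sigma}\, g_i(x);\\[4pt]
g(x)^T(\tilde x - x) = -\tfrac{1}{\sigma}\,\|g(x)\|_*^2,\qquad
\tfrac{\sigma}{2}\,\|\tilde x - x\|^2 = \tfrac{1}{2\sigma}\,\|g(x)\|_*^2,\\[4pt]
\hat f \ \ge\ f(x) - \tfrac{1}{2\sigma}\,\|g(x)\|_*^2 .
```

Both intermediate identities follow directly from the definitions of ‖·‖ and ‖·‖_* in (1).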
(5) says that, in terms of function values, x_best is at least as good as the worst ε-approximate stationary point whose function value is at most f(x_0). Since gradients and Lipschitz constants are unknown to us, we cannot say which point satisfies (5). But the result implies that the final best point has a function value equal to or better than that of some point whose gradient was small. If gradients are small only near a local optimizer, the method produces a point close to this optimizer. If some iterate passes close to a non-global local optimizer or a saddle point, it is possible that the algorithm escapes its neighborhood. In this case, only a variant with restarts would produce convergence to a point with a small gradient. Under the assumptions (A1)-(A2), the appropriate asymptotic form of N(ε), found by Vicente [60], Dodangeh & Vicente [17], Dodangeh, Vicente & Zhang [18], Gratton et al. [24], Bergou, Gorbunov & Richtárik [9], and Nesterov & Spokoiny [44,45], depends on the properties (smooth, smooth convex, or smooth strongly convex) of f; cf. Sect. 2.1 below.
Bergou et al. [9] and Nesterov & Spokoiny [45] generalized this result, giving algorithms with complexity results for the nonconvex, convex, and strongly convex cases shown in Table 1. In each case, the bounds are better by a factor of n than the best known complexity results for deterministic algorithms (by Dodangeh & Vicente [17], Vicente [60], and Konecný & Richtárik [37]) given in Table 2. Of course, for a randomized algorithm, the performance guarantee obtained by Bergou et al. is slightly weaker, valid only in expectation. Moreover, they generated step sizes without testing whether the function value is improved or not. This is the reason why the algorithms proposed by Bergou et al. [9] perform poorly numerically; see Fig. 3 in Sect. 7.

Table 2 Complexity results for deterministic BBO (Vicente [60] for the nonconvex case, Dodangeh & Vicente [17] for the convex and the strongly convex cases, Konecný & Richtárik [37] for all cases)

The best complexity bound for a direct search with probabilistic (rather than expectation) guarantees has been found by Gratton et al. [24], for the nonconvex case only. They used Chernoff bounds to prove that a complexity bound O(nRε^{-2}) holds, where R is the number of random directions (independently uniformly distributed on the unit sphere) used in each iteration, satisfying a condition in which 0 < γ_1 < 1 is a factor for reducing step sizes and γ_2 > 1 is a factor for expanding step sizes. If γ_1 = 0.5 and γ_2 = 2, then R = 2.

Our contribution
We describe and test a new, practically very efficient randomized method, called VRBBO (short for Vienna randomized black box optimization), for which good local complexity results can be proved, and which is competitive in comparison with the state-of-the-art local and global BBO solvers.An algorithm loosely related to VRBBO (but without complexity guarantees) is the Hit-and-Run algorithm of Bélisle [8].
A basic version of VRBBO. In Sect. 2, an extrapolation step, called extrapolationStep, is discussed, and then a multi-line search with random directions, called MLS-basic, is constructed that repeatedly tries extrapolationStep. Sect. 3 first introduces a basic version of our fixed decrease search algorithm, called FDS-basic, whose goal is to achieve a decrease in the function value; it makes repeated calls to MLS-basic until the function value is decreased. Then a basic version of VRBBO, called VRBBO-basic, is introduced, which makes repeated calls to FDS-basic until an ε-approximate stationary point is found; see Flowcharts (a)-(c) of Fig. 1.
Complexity results for VRBBO-basic. Sect. 4 discusses our complexity bound for the nonconvex case, of the same order and with the same factor as the one found by Gratton et al. [24] using Chernoff bounds, but with the difference that the constant factor of our bound is obtained from a result of Pinelis [48]. Both complexity results are better by a factor of R/n than those given in Table 1 and more reasonable than those given in Table 2. Our complexity bounds for the convex and strongly convex cases are proved with probability arbitrarily close to 1; these are new results and are more reasonable than the bounds in Table 2, which are valid only in expectation. Table 3 summarizes our complexity results for all cases, matching Gratton et al. [24] for the nonconvex case with R = O(log η^{-1}) random directions. But log η^{-1} cannot be large for reasonable values of η; in practice, VRBBO-basic works best with a much larger value R = O(n).
Heuristic techniques. We add many new useful heuristic techniques, discussed in Sect. 5, to VRBBO-basic that make it very competitive, in particular:
• Several kinds of search directions ensure good practical performance.
• Adaptive heuristic estimations for the Lipschitz constant are used.
• A sensible scaling vector is estimated.
• The gradient vector is estimated by a randomized finite difference approach.
These heuristic techniques improve the performance in practice, leading to the implementations FDS and VRBBO documented in Sect. 6.

A new line search technique
In this section, we describe a method that tries to achieve a decrease in the function value using line searches along specially chosen random directions. Random directions are used in our algorithm because it is known that randomized black box optimization methods have a worst-case complexity lower by a factor of n than that of deterministic algorithms (see, e.g., [7]). A line search then polls, starting at the current best point, one or more points along the line in each chosen direction. Several such line searches are packaged together into a basic multi-line search, for which strong probabilistic results can be proved.
The details are chosen in such a way that failure to achieve the desired descent implies that, with probability arbitrarily close to one, a bound on the unknown gradient vector is obtained.

Probing a direction
Let Δ ≥ 0 be the threshold for improvements of the function value, and let f(x) − f(x ± p), for x, p ∈ R^n, be the gain along ±p. First we give a theoretical test that either results in a gain of Δ or more in the function value, or gives a small upper bound on the norm of at least one of the unknown gradients encountered, though our algorithm never calculates any. Assumption (A1) implies that (10) holds for every x, p ∈ R^n, where γ depends on x and p and satisfies one of three bounds; here σ comes from (4). In all three cases, continuity and condition (A2) imply that a minimizer x̂ exists. (It is enough that this holds with x_0 replaced by some point found during the iteration, which is then taken as x_0.)
Proposition 2 Let x, p ∈ R^n and Δ ≥ 0. Then (A1) implies the lower bound

L ≥ |f(x + p) + f(x − p) − 2 f(x)| / ‖p‖²,   (12)

and at least one of the statements (i)-(iii) holds.

Proof Taking the sum of (10) and the formula obtained from it by replacing p with −p gives (12). Assume that (iii) is violated, so that |g(x)^T p| > Δ + ½ L ‖p‖². If g(x)^T p ≤ 0, then (13) follows by (10); if g(x)^T p ≥ 0, then similarly (14) follows. If (13) holds we conclude that (i) holds. If (14) holds we get the second half of (ii), and the first half also follows.

Proposition 2 will play a key role in the construction of our basic multi-line search MLS-basic detailed in Sect. 2.3:
• It establishes the well-known (Evtushenko [22]) lower bound (12) for the Lipschitz constant L, which can be used to find reasonable approximations for L.
• If (i) holds, then the step p gives a gain of at least Δ, called a sufficient gain.
• If (ii) holds, then the step − p gives a sufficient gain.
• If neither (i) nor (ii) holds (no sufficient gain is found along ± p) then (iii) holds, giving a useful upper bound for the directional derivative.
In particular, this allows us to prove statements about the unknown gradient even though our algorithm never calculates one.
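The case analysis of Proposition 2 can be sketched as a probing routine. The following minimal sketch reflects our reading of the text: it reports a sufficient gain along +p or −p, and otherwise returns the curvature lower bound (12); for illustration, the plain (unscaled) 2-norm is used in place of (1):

```python
def probe_direction(f, x, p, Delta, L_est):
    """Probe +/- p from x: return ('gain', better point) on a sufficient
    gain of at least Delta, else ('bound', updated Lipschitz lower bound),
    using |f(x+p) + f(x-p) - 2 f(x)| / ||p||^2  <= L  as in (12)."""
    xp = [xi + pi for xi, pi in zip(x, p)]
    xm = [xi - pi for xi, pi in zip(x, p)]
    fx, fp, fm = f(x), f(xp), f(xm)
    if fx - fp >= Delta:              # case (i): sufficient gain along +p
        return 'gain', xp
    if fx - fm >= Delta:              # case (ii): sufficient gain along -p
        return 'gain', xm
    norm2 = sum(pi * pi for pi in p)  # unscaled 2-norm squared (simplification)
    L_lower = abs(fp + fm - 2.0 * fx) / norm2
    return 'bound', max(L_est, L_lower)
```

For f(x) = ‖x‖² the returned curvature bound is exactly the true Lipschitz constant L = 2, since the second difference of a quadratic is exact.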

Random search directions
For our complexity results, we need that sufficiently many search directions p satisfy an angle condition of the form (15). Here g is the gradient at the current best point and Δ_a > 0 is a tuning parameter for the angle condition. Random directions are independent and identically distributed (i.i.d.), generated by (16), where rand(n, 1) generates a random vector uniformly distributed in [0, 1]^n.
The following variant of the angle condition (15) plays a key role in obtaining our complexity bounds.
Proposition 3 A random search direction p generated by (16) and scaled by p := (δ/‖p‖) p (17) satisfies ‖p‖ = δ and, with probability ≥ 1/2, the angle condition (18), with a positive constant c ≤ 12.5.

Proof As defined in Sect. 1.2, s ∈ R^n is a scaling vector. Denote by s_i the ith component of s and define p̃_i := p_i/s_i and g̃_i := s_i g_i. Then by (1), g^T p = g̃^T p̃, ‖g‖_* = ‖g̃‖_2 and ‖p‖ = ‖p̃‖_2; so the results of Sect. 9.1 apply after scaling and give c = c_0/4 ≤ 12.5. A simulation result from Sect. 9.1 suggests that in practice c ≈ 4/7.
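The generation and scaling of a random direction per (16)-(17) can be sketched as follows. Since the display (16) is not reproduced above, the componentwise recipe p = 2·rand(n, 1) − 1 (uniform in [−1, 1]^n) and the plain 2-norm are our illustrative assumptions:

```python
import random, math

def random_direction(n, delta):
    """Draw a direction uniformly from [-1, 1]^n (assumed form of (16)),
    then rescale it to length delta as in (17), so ||p|| = delta holds
    as required by Proposition 3."""
    p = [2.0 * random.random() - 1.0 for _ in range(n)]
    norm = math.sqrt(sum(pi * pi for pi in p))
    return [delta * pi / norm for pi in p]
```

The rescaling makes the step norm δ a free parameter, which Sect. 2.3 later ties to the threshold Δ and the Lipschitz estimate.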

A multi-line search
In this section, we construct a multi-line search algorithm, called MLS-basic. It polls, in a line search fashion, a few objective function values along each of several random directions [generated by (16), scaled by (17), and satisfying (18) with probability ≥ 1/2], in the hope of finding sufficient gains of more than a multiple of Δ.

An extrapolation step
As discussed in Sect. 1.3, the main ingredient of VRBBO-basic is FDS-basic, which makes repeated calls to MLS-basic until at least one sufficient gain is found. The accelerating ingredient of MLS-basic is extrapolation, whose goal is to speed up reaching a minimizer by expanding step sizes and computing the corresponding trial points and their function values as long as sufficient gains are found. We discuss how to construct the extrapolation step, called extrapolationStep, which tries to find sufficient gains. extrapolationStep may perform extrapolation along either the search direction p or its opposite direction −p.
Let {x_k}_{k≥0} be the sequence generated by VRBBO-basic. In the kth iteration of this algorithm, FDS-basic takes as input the (k−1)th point xm = x_{k−1} and its function value fm = f_{k−1} generated by VRBBO-basic, and returns the kth point x_k = xm and its function value f_k = fm as output if at least one sufficient gain is found by MLS-basic; otherwise x_k = x_{k−1} and f_k = f_{k−1}. In fact, after the kth iteration of VRBBO-basic, xm is the current trial point evaluated by extrapolationStep, obtained from a sufficient gain, accepted as a new point, and called the best point. Hence all points x_k, for k = 1, 2, ..., are the best points found by extrapolationStep. The last point generated by VRBBO-basic is called the overall best point.
Care must be taken to ensure that the book-keeping needed for the evaluation of the lower bound for the Lipschitz constant comes out correctly. To ensure this during an extrapolation step, we always use xm for the best point found by extrapolationStep, so that the next evaluation is always at xm + p and a former third evaluation point is at xm − p. The function values immediately after the next evaluation are then given in (19). At this stage, we can compute by (20) the lower bound for the Lipschitz constant L, valid by (12). Note that the initial λ_old is the tuning parameter λ_max; it is updated by extrapolationStep and may be estimated by a heuristic formula. As defined in Sect. 2.1, df := fm − fr is the gain. Given the tuning parameter 0 < γ_min < 1, if the sufficient gain condition holds, a sufficient gain is found and the corresponding point is updated by overwriting xm + p over xm, with the consequence that (22) holds in this case. R denotes the number of random search directions used in MLS-basic and a denotes the list of R extrapolation step sizes. All components of the initial list a are one, i.e., a(t) = 1 for t = 1, ..., R. These components are expanded or reduced according to whether sufficient gains are found or not. Let n_sg be the number of sufficient gains found by extrapolationStep. If the counter n_sg remains zero, extrapolationStep could not find a sufficient gain. t is a counter over the R directions, taking values 0, ..., R; it does not change inside extrapolationStep, but is updated later outside extrapolationStep (inside MLS-basic).
We must be careful to make sure that the estimate of the Lipschitz constant is correct, especially when an extrapolation step improving the function value is tried. This estimate is computed:
(i) after an opposite direction is tried: since there is no sufficient gain along the direction p, its opposite direction is tried; then λ is estimated by (20) according to (19);
(ii) after the first sufficient gain is found: for this estimate, fl, fm, fr are needed; since a sufficient gain is found, according to (22), the Lipschitz constant λ is estimated by (20).
In summary, extrapolationStep first takes the initial step size α_e = 1, which is necessary for approximating a lower bound for the unknown Lipschitz constant L. It then chooses later step sizes from a(t), expanding them as long as sufficient gains are found; after the last sufficient gain is found, α_e is saved as the new a(t). One of the following cases occurs:
(i) a sufficient gain is found along the direction p;
(ii) a sufficient gain is found along the direction −p;
(iii) no sufficient gain is found along ±p.
If either (i) or (ii) holds, extrapolationStep is successful with at least one sufficient gain. If (iii) holds, extrapolationStep is unsuccessful, without a sufficient gain.
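Since Algorithm 1's listing is not reproduced above, here is a minimal sketch of the extrapolation idea as we read it: keep enlarging the step while each larger step still yields a sufficient gain. The fixed base point and the expansion factor gamma_e are illustrative simplifications, not the paper's exact book-keeping:

```python
def extrapolation_step(f, x, fx, p, Delta, gamma_e=2.0, nfmax=100):
    """Expand the step size along p while sufficient gains >= Delta occur.
    Returns the best point found, its value, and the count n_sg of gains."""
    best_x, best_f, n_sg = x, fx, 0
    alpha, nf = 1.0, 0
    while nf < nfmax:
        trial = [xi + alpha * pi for xi, pi in zip(x, p)]
        ft = f(trial); nf += 1
        if best_f - ft >= Delta:   # still a sufficient gain at this step size
            best_x, best_f = trial, ft
            n_sg += 1
            alpha *= gamma_e       # expand the step size and try again
        else:
            break                  # no further gain: stop extrapolating
    return best_x, best_f, n_sg
```

If n_sg remains zero, the sketch mirrors case (iii): the caller would then try the opposite direction −p.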

Algorithm 1 extrapolationStep, an extrapolation step
Input. The old best point xm and its function value fm, the search direction p, the threshold Δ > 0 for a good improvement, the old approximation λ_old ≥ 0 for the Lipschitz constant L, the tth extrapolation step size a(t), the norm δ of the trial step, and the maximum number nfmax of function evaluations.
Output. The new best point xm and its function value fm, a new approximation λ for the Lipschitz constant L, the number n_sg of sufficient gains, and the tth extrapolation step size a(t).

A basic version of the MLS algorithm
For each random direction generated, our basic multi-line search (MLS-basic) using extrapolationStep proceeds as follows:
• A step in the current direction is tried.
• If a sufficient gain is found, a sequence of extrapolations is tried.
• If a sufficient negative gain is found, a step in the opposite direction is tried.
• If a sufficient gain is found in the opposite direction, a sequence of extrapolations is tried.
• If no sufficient gain along ±p is found, the step size is reduced.
In (24), the extrapolation step sizes must be reduced whenever no sufficient gain is found. Hence, they need to be controlled by the tuning parameter α_min. (23) is motivated at the end of this section, since it is based on the details of Theorem 1, which asserts that one obtains either a sufficient gain of a multiple of Δ or, with probability arbitrarily close to 1, an upper bound on ‖g‖_* for at least one of the unknown gradients encountered, though our algorithm never calculates any.
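The loop just outlined can be sketched compactly. This is a minimal sketch: the [−1, 1]^n sampling, the plain 2-norm, and the doubling of accepted steps are our illustrative assumptions, not the paper's exact recipe:

```python
import random, math

def mls_basic(f, x, fx, R, Delta, delta):
    """For each of R random directions (scaled to length delta), probe +p
    and then -p, extrapolating with doubled steps while sufficient gains
    of at least Delta are found."""
    n = len(x)
    for _ in range(R):
        p = [2.0 * random.random() - 1.0 for _ in range(n)]
        norm = math.sqrt(sum(pi * pi for pi in p))
        p = [delta * pi / norm for pi in p]       # scale so ||p|| = delta, cf. (17)
        for q in (p, [-pi for pi in p]):          # try p, then the opposite direction
            trial = [xi + qi for xi, qi in zip(x, q)]
            ft = f(trial)
            gained = False
            while fx - ft >= Delta:               # sufficient gain: accept, double step
                x, fx, gained = trial, ft, True
                q = [2.0 * qi for qi in q]
                trial = [xi + qi for xi, qi in zip(x, q)]
                ft = f(trial)
            if gained:
                break                             # go to the next random direction
    return x, fx
```

By construction the best function value never increases, which is the monotonicity used in Theorem 1(i).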
Theorem 1 Assume that (A1) holds, let nf be the counter for the number of function evaluations, let R be the number of random search directions, and let Δ̄ := γ_min Δ be the improvement threshold on the function value in MLS-basic, with 0 < γ_min < 1. Here nfmax is assumed to be sufficiently large.

(i) f decreases by at least
(Note that Δ f may be zero, catering for the case of no strict decrease.)
(ii) Suppose that 0 < η ≤ 1/2 and R := ⌈log₂ η^{-1}⌉. If f does not decrease by more than a multiple of Δ then, with probability ≥ 1 − η, the original point or one of the points evaluated with better function value has a gradient g satisfying (26), where c is the constant in Proposition 3 and Γ(δ) is defined by (27).

Proof (i) Clearly, the function value of the best point does not increase. Thus (i) holds if nf − 2R ≤ 0. If this is not the case, then nf ≥ 2R + 1. In the for loop of MLS-basic, R directions p are generated and at most two function values are computed for each, unless an extrapolation step is performed. In the latter case, at least nf − 2R additional function values are computed during the extrapolation stage, each time with a sufficient gain of at least Δ̄. Thus the total sufficient gain is at least that given in (25).
(ii) Assume that f does not decrease by more than Δ.For t = 1, . . ., R, let p t be the tth random direction generated by (17), and let x t be the best point obtained before

Algorithm 2 MLS-basic, a basic multi-line search
Input.The old best point xm and its function value fm, the threshold Δ > 0 for good improvement, the old approximation λ ≥ 0 for the Lipschitz constant L, maximal number nfmax of function evaluations, and the list a of extrapolation step sizes.

Output
searching in direction p_t. Then, from Proposition 2, we get a bound on the directional derivative g(x_t)^T p_t. Since the random direction is generated by (16), Proposition 3 implies that (18) holds with probability 1/2 or more. Therefore g(x_t)‖_* ≤ √(cn) Γ(δ) fails with probability less than 1/2 for each t = 1, ..., R, so the probability that (26) holds for at least one of the gradients g = g(x_t) (t ∈ {1, ..., R}) is at least 1 − 2^{-R} ≥ 1 − η. Note that (26) is guaranteed to hold although gradients are never computed. Since gradients and Lipschitz constants are unknown to us, we cannot say which point satisfies (26). But the result implies that the final best point has a function value equal to or better than that of some point whose gradient was small. If gradients are small only near a local optimizer, the method produces a point close to this optimizer. If some iterate passes close to a non-global local optimizer or a saddle point, it is possible that the algorithm escapes its neighborhood. In this case, only a variant with restarts would produce convergence to a point with a small gradient. As discussed in Sect. 1.3, VRBBO-basic tries to find a point satisfying (5). The goal of the scaling (17) of the search directions (16) is that the bound √(cn) Γ(δ) in (26) becomes smaller than a given threshold ε > 0. This is done by minimizing this bound. For fixed Δ, the scale-dependent factor (27) is smallest for the choice δ := √(2Δ/L). Accordingly, (23) is used to scale the random directions (16), safeguarded by sensible positive lower and upper bounds 0 < δ_min < δ_max < +∞. As can be seen from Sect. 2.3.1, α_e is used for estimating L and here for adjusting δ. In our experience, using the non-fixed α_e in (23) is useful. In fact, δ is a special case of the term √(α_e γ_δ Δ/λ) in (23) with α_e = 1 and γ_δ = 2.
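The step-norm choice just described can be sketched directly. Under our reading of (23), δ = √(α_e γ_δ Δ/λ) with the Lipschitz estimate λ in place of the unknown L, safeguarded by the bounds δ_min and δ_max:

```python
import math

def choose_delta(Delta, lam, delta_min, delta_max, alpha_e=1.0, gamma_delta=2.0):
    """Step-norm choice per (23): sqrt(alpha_e * gamma_delta * Delta / lam),
    clipped to [delta_min, delta_max]. With alpha_e = 1 and gamma_delta = 2
    this reduces to sqrt(2*Delta/L) with lambda estimating L."""
    delta = math.sqrt(alpha_e * gamma_delta * Delta / lam)
    return min(delta_max, max(delta_min, delta))
```

The clipping realizes the safeguards 0 < δ_min < δ_max < +∞ mentioned above, so a wildly wrong λ cannot produce an absurd step norm.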

A randomized descent algorithm for BBO
In this section, we first consider a fixed decrease search, for which either an upper bound on the unknown gradient norm at one of the points generated by extrapolationStep or an upper bound on the total number of function evaluations is found. Then a primary version of our algorithm is given.

Probing for fixed decrease
Based on the preceding results, we introduce the basic version of a fixed decrease search algorithm (FDS-basic), whose goal is to call the basic multi-line search MLS-basic repeatedly, in the hope of finding sufficient gains of a multiple of Δ. If either there is no sufficient gain (n_sg = 0) in MLS-basic or nfmax is reached, FDS-basic ends. The main ingredient of VRBBO-basic is FDS-basic, which for large Δ takes many large steps and hence has a global character.
In the next algorithm, it is assumed that FDS-basic is tried in the kth iteration of VRBBO-basic.
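The control flow just described (FDS-basic repeatedly calling MLS-basic until no sufficient gain is found or the budget is exhausted) can be sketched abstractly; the callback signature below is a hypothetical interface, not the paper's:

```python
def fds_basic(search_pass, x, fx, nfmax):
    """Repeatedly call the multi-line search, modeled as a callback that
    returns (x, fx, n_sg, nf_used), until it reports no sufficient gain
    (n_sg == 0) or the evaluation budget nfmax is exhausted."""
    nf = 0
    while True:
        x, fx, n_sg, used = search_pass(x, fx)
        nf += used
        if n_sg == 0 or nf >= nfmax:
            return x, fx, nf
```

This mirrors Lines 2-7 of Algorithm 3: the loop body is one MLS-basic call, and the stopping test checks n_sg and nfmax.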

Algorithm 3 FDS-basic, a basic fixed decrease search
Input. The old best point xm and its function value fm, the threshold Δ_{k−1} > 0 for good improvement, the approximation λ_{k−1} ≥ 0 for the Lipschitz constant L, the maximum number nfmax of function evaluations, and the list a_{k−1} of extrapolation step sizes at the (k−1)th iteration of VRBBO.
Output. A new best point xm and its function value fm, a new approximation λ_k ≥ 0 for the Lipschitz constant L, and an updated list a_k of extrapolation step sizes at the kth iteration of VRBBO.

2: while true do
3:   Perform MLS-basic. (To hopefully get sufficient gains.)
4:   if nfmax is reached or n_sg is zero then (Stopping test.)
5:     FDS-basic ends.
6:   end if
7: end while

Theorem 2 Assume that (A1) and (A2) hold and that nfmax is sufficiently large; denote by f_0 the initial value of f and let Δ̄ = γ_min Δ with the tuning parameter 0 < γ_min < 1. Then:
(i) The number of function evaluations used by FDS-basic is bounded in terms of f_0 − f̂, where f̂ is the global minimum value.
(ii) Denote by K_f the number of calls to MLS-basic by FDS-basic and assume that the stated condition on R holds. Then FDS-basic finds a point x with the probability given in (30). Here c is the constant from Proposition 3, and λ_0 denotes the value of λ before the first execution of FDS-basic.

Proof By (A2), f̂ is finite. Denote by f_{k+1} the result of the (k+1)th execution of FDS-basic. In the worst case, in each iteration ℓ ∈ {1, ..., k} of FDS-basic a sufficient gain is found, but in the (k+1)th iteration FDS-basic cannot find any sufficient gain and ends. Since a sufficient gain is found in each iteration ℓ = 1, ..., k, 2R + 1 function evaluations are used there, while in the (k+1)th iteration only 2R function evaluations are used, since there is no sufficient gain. Hence (i) follows.
(ii) K_f is finite due to (i), and we have 2^{-R} ≤ η; hence Theorem 1 applies with probability ≥ 1 − η. By the definition of δ in (23), one of three cases occurs, among them Case 2: δ = δ_min ≥ γ_δ Δ/λ; in each case the required bound holds. As discussed earlier, L is unknown and we replace it by the approximation λ. Proposition 2 implies the bound (32), where λ_0 is the initial value of λ. Now (30) follows from (31) and (32).

A basic version of the VRBBO algorithm
We now have all the ingredients to formulate VRBBO-basic. In each iteration it uses the fixed decrease search algorithm to update the best point. If no sufficient gain is found in the corresponding FDS-basic call and nfmax is not reached, Δ is reduced by a factor of Q. Once Δ falls below a minimum threshold, VRBBO-basic stops.
As discussed above, Δ_max and λ_max are initially tuning parameters, but in an improved version of VRBBO-basic they will be estimated by the heuristic techniques of Sect. 5. From Lines 1 and 10, the kth call to FDS-basic uses the parameter values given in (33). This relation will be used in the next section to prove all theorems.
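The outer loop just described (reduce Δ by the factor Q whenever FDS-basic returns without a sufficient gain, until Δ falls below a minimum threshold) can be sketched as follows; the callback signature and parameter names are illustrative assumptions:

```python
def vrbbo_basic(fds, x, fx, Delta_max, Delta_min, Q=2.0, nfmax=10000):
    """Call the fixed decrease search, modeled as a callback returning
    (x, fx, n_sg, nf_used), with threshold Delta; tighten Delta by the
    factor Q on failure, stopping once Delta < Delta_min or the budget
    nfmax is exhausted."""
    Delta, nf = Delta_max, 0
    while Delta >= Delta_min and nf < nfmax:
        x, fx, n_sg, used = fds(x, fx, Delta)
        nf += used
        if n_sg == 0:
            Delta /= Q        # no sufficient gain at this threshold: tighten it
    return x, fx
```

The geometric reduction of Δ is what drives the complexity analysis of the next section: only O(log(Δ_max/Δ_min)) distinct thresholds are ever used.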

Algorithm 4 VRBBO-basic, Vienna basic randomized BBO
Input. The initial point x_0 and the maximum number nfmax of function evaluations.
Output. The overall best point xm and its function value fm.

Complexity analysis of VRBBO-basic
We now prove the complexity results for the nonconvex, convex, and strongly convex objective functions.We denote by N k the total number of function evaluations used by VRBBO-basic up to iteration k.

The general (nonconvex) case
Theorem 3 Let {x_k} (k = 0, 1, 2, ...) be the sequence generated by VRBBO-basic. Assume that (A1) and (A2) hold, that nfmax is sufficiently large, and that the parameters are given by (33)-(35). If the parameters are chosen such that (36) and (37) hold, then VRBBO-basic finds, after at most O(nε^{-2}) function evaluations, with probability ≥ 1 − η, a point x satisfying (38).

Proof We conclude from (33)-(35) that (39) holds. Hence at most K steps of FDS-basic are performed. By (36), Theorem 2(ii), (34), and (37), with probability ≥ 1 − η_1 ≥ 1 − η, (38) holds for at least one of the function values encountered. From (33) and Theorem 2(i), the Kth call to FDS-basic uses at most a number of function evaluations bounded in terms of f_0 − f̂, where f̂ is the global minimum value. Choosing Δ_min = O(ε²/n) consistent with (34) is possible, and K, R, δ_min can clearly be chosen to satisfy (35)-(37); this yields K = O(log(nε^{-2})) and R = O(log η^{-1}). Finally, we conclude that at most O(nε^{-2}) function evaluations are used.
The convex case

Proof For ℓ ≥ K, (39) holds by (33)-(35). Hence at most K steps of FDS-basic are performed. In fact, in each step a point without sufficient gain is found, satisfying (28) by Theorem 2(ii). As discussed after Theorem 1, these points are unknown to us, since the gradients and Lipschitz constants are unknown. The index set of these points is denoted by U; its size is K. The convex case is characterized by (8). From (11) we have, with probability ≥ 1 − η, the bound (41). We consider the following three cases.
Case 1. The second term √(LΔ) in (41) dominates the others. Then (42) holds for ℓ ∈ U. Put U_1 := {ℓ ∈ U | (42) holds}.
Case 2. The first term δ_min in (41) dominates the others. Then (43) holds for ℓ ∈ U. Put U_2 := {ℓ ∈ U | (43) holds}.
Case 3. The third term 2Δ/δ_max in (41) dominates the others. Then (44) holds for ℓ ∈ U. Put U_3 := {ℓ ∈ U | (44) holds}.
Then we conclude the claimed bound from (33) and (42)-(44); hence (34) and (35) yield the result in the same way as in the proof of Theorem 2. According to Theorem 3, (38) holds for at least one of the evaluated points. As a result, (40) holds for at least one of the evaluated points with probability ≥ 1 − η, by applying (11) and (38)-(41).

The strongly convex case
Theorem 5 Let {x_k} (k = 0, 1, 2, ...) be the sequence generated by VRBBO-basic, and let f be σ-strongly convex on L(x_0). Moreover, assume that (A1) and (A2) hold and that nfmax is sufficiently large. Under the assumptions of Theorem 3, VRBBO-basic finds, after at most O(n log(nε^{-1})) function evaluations and with probability ≥ 1 − η, a point x satisfying (38), (45), and (46).
Proof For ℓ ≥ K, (39) follows from (33)-(35); hence at most K steps of FDS-basic are performed. In fact, in each step a point without sufficient gain is found, satisfying (28) by Theorem 2(ii). As discussed after Theorem 1, these points are unknown to us since the gradients and Lipschitz constants are unknown. The index set of these points is denoted by U; its size is K. The strongly convex case is characterized by (9), so that f has a global minimizer x̂ and the strong convexity inequality holds for any x and y in L(x_0). For fixed x, the right-hand side of this inequality is a convex quadratic function of y, minimal when its gradient vanishes. By (1), this is the case iff y_i takes the value x_i − (s_i/σ) g_i(x) for i = 1, ..., n, and we conclude that (47) holds. Replacing x by x_{ℓ−1} in (47) and using (38) gives, with probability ≥ 1 − η, the bound (48). We consider the following three cases:
Case 1. The second term √(LΔ) in (48) dominates the others. Then (49) holds for ℓ ∈ U. Put U_1 := {ℓ ∈ U | (49) holds}.
Case 2. The first term δ_min in (48) dominates the others. Then (50) holds for ℓ ∈ U. Put U_2 := {ℓ ∈ U | (50) holds}.
Case 3. The third term 2Δ/δ_max in (48) dominates the others. Then (51) holds for ℓ ∈ U. Put U_3 := {ℓ ∈ U | (51) holds}.
From (33) and (49)-(51) we then conclude the stated bound; hence (34) and (35) result in the displayed estimate, and we then obtain, in the same way as in the proof of Theorem 2, the claimed complexity. According to Theorem 3, (38) holds for at least one of the evaluated points. As a result, (45) holds for at least one of the evaluated points with probability ≥ 1 − η, by applying (38) to (45). Finally, (46) is obtained by applying (4) and (45), using g(x̂) = 0, i.e., that x̂ is the global minimizer.

Some new heuristic techniques
In this section, we describe several heuristic techniques that improve the basic version of Algorithm 4, yielding VRBBO. While only convergence to a local minimizer is guaranteed, FDS together with our heuristic techniques turns VRBBO into an efficient global solver.
Initially, while Δ is large, FDS takes many large steps; hence it has a global character.
More specifically, we discuss the occasional use of alternative search directions (two cumulative directions and a random subspace direction) and heuristics for estimating key parameters unspecified by the general theory: the initial desired gain, the Lipschitz constant, and the scaling vector. Moreover, we discuss how to approximate the gradient by finite differences with step sizes extracted from the extrapolation steps. In Sect. 6, we combine Algorithm 4 with these heuristic techniques, resulting in the global solver VRBBO.

Cumulative directions
We consider two possibilities for accumulating past directional information into a cumulative search direction:
(i) With xm and fm defined in Sect. 2.3.1, the first cumulative direction is model independent, computed from the difference xm − x_init, where x_init is the initial point of the current improved version of MLS-basic. The idea is that many small improvement steps accumulate to a direction pointing from the starting point into a valley, so that more progress can be expected by going further along this cumulative direction.
(ii) The second cumulative direction assumes a separable quadratic model of the form (53), with quadratic univariate functions Ψ_i(α) vanishing at α = 0. Here I is the set of directions polled at least twice, and p_i is the corresponding direction as rescaled by an improved version of MLS-basic.
By construction, we have for any i ∈ I three function values at equispaced arguments, and we write the corresponding quadratic interpolant in terms of a slope d and a curvature h. Recall the function values fl, fm, and fr satisfying either (19) or (22). If fr < fm, the last evaluated point was the best one, so fr ≤ min(fl, fm).
In this case, (22) holds and d and h are computed accordingly. Otherwise, the last evaluated point was not the best one, so fm ≤ min(fl, fr); in this case, (19) holds and we compute d by the analogous formula and h again by (55).
Given the tuning parameter a > 0, the minimizer of the quadratic interpolant restricted to the interval [−a, a] lies on the boundary of the interval in case h ≤ 0; otherwise, it is given by the usual interior formula. Assuming the validity of the quadratic model (53), we find the model optimizer by additively accumulating the estimated steps and gains into a cumulative step q with anticipated gain r.
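Since the explicit formulas for d, h, and the restricted minimizer are elided above, the following Python sketch reconstructs the standard second-difference reading of the scheme; the placement of the three equispaced points at α = −1, 0, 1 and the exact forms of d and h are our assumptions, not the paper's formulas.

```python
import numpy as np

def quad_min_step(fl, fm, fr, a):
    """Minimize the quadratic interpolant of three function values
    fl, fm, fr at equispaced arguments (taken here as alpha = -1, 0, 1)
    over the interval [-a, a]. Returns the step alpha and the predicted
    gain fm - q(alpha). The slope/curvature formulas are a hypothetical
    reconstruction of the elided formulas in the text."""
    d = 0.5 * (fr - fl)      # central difference: estimated slope at alpha = 0
    h = fr - 2.0 * fm + fl   # second difference: estimated curvature
    if h <= 0:
        # nonconvex (or flat) model: the minimizer lies on the boundary
        alpha = -a if d > 0 else a
    else:
        # interior minimizer -d/h, clipped to [-a, a]
        alpha = float(np.clip(-d / h, -a, a))
    gain = fm - (fm + d * alpha + 0.5 * h * alpha**2)
    return alpha, gain
```

With fl = 2, fm = 0, fr = 1 and a = 2, this gives the interior step α = 1/6 with predicted gain 1/24.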

Random direction
When sufficient gains are found, the trial points are accepted as the new best points and saved in X, and their function values are saved in F. Denote by m_s the maximum number of points saved in X and by b the index of the newest best point. Throughout the paper, A_{:k} denotes the kth column of a matrix A. Random subspace directions point into the low-dimensional affine subspace spanned by a number of good points kept from previous iterations; they are computed from random combinations of the displacements of the stored points from the best point.
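The explicit formula for the random subspace direction is elided above; the following sketch only illustrates the idea, under the assumption that the direction is a uniform [−1, 1] random combination of the displacements of the stored good points from the best point.

```python
import numpy as np

def random_subspace_direction(X, b, rng):
    """Random direction in the low-dimensional affine subspace spanned by
    the saved good points (columns of X), relative to the best point
    X[:, b]. The uniform [-1, 1] weighting is an assumption; the exact
    formula in the paper may differ."""
    diffs = X - X[:, [b]]                      # displacements from the best point
    w = rng.uniform(-1.0, 1.0, X.shape[1])     # random weights, one per saved point
    return diffs @ w                           # random combination of displacements
```

If all saved points coincide, the displacements vanish and the direction is zero, so a fallback direction would be needed in that degenerate case.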

Choosing the initial Δ
First of all, we compute dF by (60). If dF is nonzero, we approximate the initial desired gain by
Δ := γ_max min(dF, 1),  (61)
where γ_max > 0 is a tuning parameter. Otherwise, Δ := Δ_max, where Δ_max > 0 is the initial gain.
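A minimal sketch of this rule; since the formula (60) for dF is elided in the text, its computation below (the spread of the stored best function values) is only a plausible stand-in.

```python
import numpy as np

def initial_delta(F, gamma_max, Delta_max):
    """Heuristic initial desired gain. dF is taken here as the spread of
    the stored function values F; the actual formula (60) is elided in
    the text, so this spread is only an assumed stand-in."""
    dF = float(np.max(F) - np.min(F))
    if dF != 0.0:
        return gamma_max * min(dF, 1.0)   # Delta := gamma_max * min(dF, 1)
    return Delta_max                       # fall back to the initial gain
```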

Choosing the initial λ
The initial value for λ is the tuning parameter λ_max; it is updated by (20) whenever the best point is updated by extrapolationStep. Our approach is to approximate λ by a heuristic formula based on the previous best function values stored in F. Let λ_old be the old estimate of the Lipschitz constant and γ_λ > 0 a factor for adjusting λ. We then compute λ from λ_old, γ_λ, and dF, where dF is computed by (60).

Choosing the scaling vector
The idea is to estimate a sensible scaling vector s with the goal of adjusting the search direction scaled by (17). We compute dX_{:i} := X_{:i} − X_{:b} for all i = 1, ..., m_s and estimate the scaling vector from these differences. Finally, formula (17) is rewritten with a componentwise rescaling, where ◦ denotes componentwise multiplication and δ is computed by (23).

Estimating the gradient
With xm and fm defined in Sect. 2.3.1, finite difference quasi-Newton methods approximate the gradient with components g_i ≈ (f(xm + α e_i) − fm)/α, where e_i is the ith coordinate vector. The most popular choice for α is a constant involving the machine precision ε_m; here another choice for α is made. After generating each coordinate search direction, we approximate each component of the gradient in a way slightly different from the forward finite difference approach: the step size generated by extrapolationStep is used instead of the general choice (65). The reason for this change is that we avoid the additional cost of approximating the gradient by a separate algorithm. Let us describe how the gradient is computed. If extrapolationStep cannot find a sufficient gain in the tth iteration (n_sg = 0), fr is computed and a(t) is unchanged; given the old best point fm_old, the tth component of the gradient is then computed by a forward difference quotient; otherwise, it is computed analogously with the step size from extrapolationStep. MLS uses T kinds of directions satisfying 1 ≤ T ≤ C + S + R + 2; C, R, and S are tuning parameters.
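The following sketch illustrates a forward finite-difference gradient with per-coordinate step sizes, as would be supplied by the list a of extrapolation step sizes; the reuse of already computed values fr and the fm_old bookkeeping of the actual scheme are omitted.

```python
import numpy as np

def fd_gradient(f, x, steps):
    """Forward finite-difference gradient where the per-coordinate step
    sizes come from a supplied list (in the paper, extrapolation step
    sizes) rather than the usual constant sqrt(eps_m)-based choice.
    This is a simplified sketch: each component costs one extra
    function evaluation here."""
    x = np.asarray(x, dtype=float)
    fm = f(x)                                  # function value at the best point
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = 1.0                             # ith coordinate vector
        g[i] = (f(x + steps[i] * e) - fm) / steps[i]
    return g
```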
Denote by FDS the improved version of FDS-basic. Both setScales and FDS work by making repeated calls to MLS. MLS polls, in a line search fashion along several suitably chosen directions (implemented by direction), a few objective function values each, in the hope of reducing the objective by more than a multiple of Δ. Schematically, it works as follows:
• At first, at most C iterations with coordinate directions are used; these also serve to approximate the gradient.
• Then, the L-BFGS direction is used only once, since the gradient has been estimated by the finite difference technique using the coordinate directions.
• Next, except in the final iteration, at most S iterations with subspace directions are used. According to our numerical experiments, these directions are very useful, especially after the coordinate directions and L-BFGS.
• After generating T − 1 directions without finding a sufficient gain, a cumulative direction is used as the final, Tth direction, in the hope of finding a model-based gain.
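The schematic ordering above can be summarized as a direction scheduler; the exact switching indices below are our reading of the description, not the paper's implementation.

```python
def direction_type(t, C, S, T):
    """Kind of direction MLS generates in its t-th trial (1-based), as
    read from the schematic description: up to C coordinate directions,
    then one L-BFGS direction, then up to S subspace directions, then
    random directions, with a cumulative direction used last."""
    if t <= C:
        return 'coordinate'
    if t == C + 1:
        return 'lbfgs'
    if t <= C + 1 + S:
        return 'subspace'
    if t < T:
        return 'random'
    return 'cumulative'
```

For example, with C = 2, S = 2, and T = 7 the schedule is two coordinate directions, one L-BFGS direction, two subspace directions, one random direction, and a final cumulative direction.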
MLS-basic calls an improved version of extrapolationStep, which is the same as extrapolationStep, except that it updates the cumulative step q and the cumulative gain r by updateCum whenever the second cumulative direction is used.
VRBBO initially calls the algorithm setScales to estimate a good scaling of norms, step lengths, and related control parameters. Then, in each iteration, it uses FDS, which aims to repeatedly reduce the function value by at least a multiple of Δ in order to update the best point. If no sufficient gain is found in a call to FDS, Δ is reduced by a factor of Q. Once Δ falls below a minimum threshold or nfmax is reached, VRBBO terminates.
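A minimal sketch of this outer loop, assuming fds(x, Δ) is a stand-in for a call to FDS that returns the (possibly improved) point and a flag for whether a sufficient gain was found, and that "reduced by a factor of Q" means division by Q > 1.

```python
def vrbbo_outer(x, Delta, fds, Q, Delta_min, nfmax, counter):
    """Schematic outer loop of VRBBO: repeatedly call FDS; if no
    sufficient gain was found, shrink Delta by the factor Q; stop once
    Delta falls below Delta_min or the evaluation budget nfmax is
    reached. counter['nf'] tracks the function evaluations spent by
    fds, which is a hypothetical stand-in for Algorithm FDS."""
    while Delta >= Delta_min and counter['nf'] < nfmax:
        x, gained = fds(x, Delta)
        if not gained:
            Delta /= Q        # no sufficient gain: reduce the desired gain
    return x
```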
An important question is the ordering of the search directions. In Sect. 7 it will be shown that using the subspace directions (limited memory quasi-Newton and random subspace directions) after the coordinate directions is clearly preferable; changing the ordering of the other directions has little effect on the efficiency of our algorithm.
Statement (i) of Theorem 1 remains valid when R is replaced by T, and statement (ii) remains valid with a correspondingly adjusted probability. Let T_0 be the maximal number of multi-line searches in setScales, a tuning parameter. Then setScales uses (2T + 1)T_0 function evaluations, which does not affect the order of the complexity bounds. Theorems 3-5 are valid with probability ≥ 1 − 2^{2+C+S−T} = 1 − 2^{−R} when all five kinds of directions are used. Given the tuning parameter alg ∈ {0, 1, 2, 3, 4, 5} (algorithm type), we now discuss the factor in the bounds depending on the number of search directions used in MLS-basic. We distinguish the following cases:
• In the first case (alg = 0), T = R < n random directions are used. Then the complexity results of Table 3 are valid. This variant of VRBBO is denoted by VRBBO-basic1.
• In the second case (alg = 1), T = R ≥ n random directions are used. Then the complexity results of Table 3 are valid, but with a factor of n². This variant of VRBBO is denoted by VRBBO-basic2.
• In the third case (alg = 2), random, random subspace, and cumulative directions are used, with total number T = S + R + 1 < n. The complexity results of Table 3 are valid. This variant of VRBBO is denoted by VRBBO-C-Q; it ignores the coordinate and limited memory quasi-Newton directions.
• In the fourth case (alg = 3), coordinate, random, random subspace, and cumulative directions are used, with total number T = C + S + R + 1. The complexity results of Table 3 are valid, but with a factor of n². This variant of VRBBO is denoted by VRBBO-Q; it ignores the limited memory quasi-Newton directions.
• In the fifth case (alg = 4), only the subspace directions are ignored. The total number of directions used is T = C + R + 2 > n; hence the complexity results are valid, but with a factor of n². This variant of VRBBO is denoted by VRBBO-S.
• In the sixth case (alg = 5), coordinate, L-BFGS, random, random subspace, and cumulative directions are used successively, with total number T = C + S + R + 2. The complexity results of Table 3 are valid, but with a factor of n². This variant of VRBBO is the default version.
This defines six versions of VRBBO: the full algorithm and five simplified variants. In Sect. 7 we compare them and show that each simplification degrades the algorithm; thus all heuristic components of VRBBO are necessary for the best performance.

Numerical results
In this section we compare our new solver with other state-of-the-art solvers on a large public benchmark.

Default parameters for VRBBO
For our tests we used the following parameter choices. Although the best theoretical complexity is obtained for much smaller R, the best numerical results are obtained for R ∼ n. Δ_min = 0 implies that the algorithm stops only due to nfmax or secmax; here secmax is the maximal time in seconds.
In recent years, there has been increasing interest in finding the best tuning-parameter configuration for derivative-free solvers with respect to a benchmark problem set; see, e.g., [5,50,51]. Table 4 lists 7 integer, 2 binary, 2 ternary, and 14 continuous tuning parameters, giving a total of 25 parameters for tuning our algorithm. A small amount of tuning was done by hand; automatic tuning of VRBBO will be considered elsewhere.

Test problems used
We compare 30 competitive solvers from the literature (discussed in Sect. 7.3) on all 549 unconstrained problems from the CUTEst [23] collection of test problems for optimization and on the test problems of Jamil & Yang [33] for global optimization, with 2-5000 variables; in the case of variable-dimension problems, for all allowed dimensions in this range. For problems of dimension n ≥ 21, only the most robust and fastest solvers were compared. To avoid guessing the solution of toy problems with a simple solution (such as all zeros or all ones), we shifted the arguments. As discussed earlier, nfmax denotes the maximal number of function evaluations, secmax the maximal time in seconds, and nf the number of function evaluations. We limited the budget available for each solver, where f_init is the function value of the starting point (common to all solvers), f_opt is the best point known to us, and f_s is the best point found by solver s. A problem counts as solved when the target accuracy is reached; otherwise, it is called unsolved, since either nfmax or secmax has been reached. Note that this amounts to testing for finding the global minimizer to some reasonable accuracy. We did not check which of the test problems are multimodal, so descent algorithms might end up in a local minimum only.
The best point known to us was obtained through numerous attempts to find the best local or global minimizer of each test problem by calling several gradient-based solvers, such as LMBFG-DDOGL, LMBFG-EIG-MS, and LMBFG-EIG-curve-inf presented by Burdakov et al. [11], ASACG presented by Hager & Zhang [26], and LMBOPT implemented by Kimiaei et al. [36]. The condition ‖g_k‖_∞ ≤ 10^{−5} was satisfied for all test problems except those listed in Sect. 9.2.
For more refined statistics, we use our test environment (Kimiaei & Neumaier [47]) for comparing optimization routines on the CUTEst test problem collection of Gould et al. [23]. A solver is said to be efficient when it has the lowest relative cost in function evaluations, and robust when it solves the highest number of problems compared to the state-of-the-art BBO solvers. Performance profiles of Dolan & Moré [19] and data profiles of Moré & Wild [42] for the cost measure nf (number of function evaluations needed to reach the target) are displayed to identify which solvers are competitive (efficient and robust) from small to high dimensions. We denote by S the list of compared solvers, by P the list of problems, by n_p the dimension of problem p ∈ P, and by c_{p,s} the cost measure of solver s on problem p. The performance profile of solver s is the fraction of problems for which the performance ratio pr_{p,s} := c_{p,s} / min{c_{p,s'} | s' ∈ S} is at most τ, while the data profile of solver s is the fraction of problems solved by solver s within κ groups of n_p + 1 function evaluations, i.e., for which the cost ratio cr_{p,s} := c_{p,s} / (n_p + 1) is at most κ.
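A compact sketch of both profiles for a cost matrix with one row per problem and one column per solver, following the ratios pr and cr defined above; unsolved problems can be marked by an infinite cost.

```python
import numpy as np

def profiles(C, dims, tau, kappa):
    """Performance profile rho(tau) and data profile delta(kappa).
    C[p, s] is the cost (nf) of solver s on problem p (np.inf if
    unsolved); dims[p] is the dimension n_p of problem p."""
    C = np.asarray(C, dtype=float)
    best = np.min(C, axis=1, keepdims=True)       # min_s c_{p,s}
    pr = C / best                                 # performance ratios
    rho = np.mean(pr <= tau, axis=0)              # fraction with pr <= tau
    cr = C / (np.asarray(dims)[:, None] + 1)      # cost ratios c/(n_p + 1)
    delta = np.mean(cr <= kappa, axis=0)          # fraction with cr <= kappa
    return rho, delta
```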

Codes compared
We compare VRBBO with the following solvers for unconstrained black box optimization. For some of the solvers we had to choose options different from the default to make them competitive; if nothing is said, the default options were used.
• DSPFD, available at pages.discovery.wisc.edu/%7Ecroyer/codes/dspfd_sources.zip, is a direct search Matlab code for derivative-free optimization by Gratton et al. [24]. The default parameters are used.
VRBBO and the other stochastic algorithms use random numbers and hence give slightly different results when run repeatedly. Due to run time constraints, each solver was run only once per problem; however, we checked in preliminary tests that the summarized results were quite similar in a second run. Some of the other solvers have additional capabilities that were not used in our tests (e.g., allowing for bound constraints, integer constraints, or noisy function values). Hence our conclusions are silent about the performance of these solvers outside the task of global unconstrained black box optimization with noiseless function values (apart from rounding errors).

Results for small dimensions (n ≤ 20)
Figure 2 shows self-testing and tuning of our solver with respect to the tuning parameter alg. As can be seen from the data and performance profiles, the two competitive versions of our solver are VRBBO and VRBBO-Q: VRBBO is somewhat more robust than VRBBO-Q, while VRBBO-Q is somewhat more efficient than VRBBO. As a result, VRBBO is recommended.
Results on CUTEst. The subfigures of Fig. 3 display three comparisons among all solvers at low to high budgets (nfmax ∈ {100n, 500n, 1000n}) on CUTEst. The names of the solvers are given on the horizontal axis of these subfigures, sorted by the number of solved problems in descending order. As can be seen, UOBYQA is more robust and efficient than the others at low to high budgets. With increasing budget, the efficiency and robustness of VRBBO increase, making it the fourth most robust solver with low function evaluation cost compared to the other line search, direct search, and Nelder-Mead solvers, though not compared to some model-based solvers.
To make more detailed comparisons among the six most robust or efficient solvers, data and performance profiles are shown in Fig. 4; its subfigures confirm that UOBYQA is more robust and efficient than the others at low to high budgets.
In summary, for small scale problems from the CUTEst collection, the model-based solvers are recommended; VRBBO is recommended for high budgets, since it is slightly more robust than some well-known model-based solvers (NEWUOA, BCDFO, and BOBYQA), although these solvers are slightly more efficient than VRBBO.
Results on GlobalTest. The subfigures of Fig. 3 display three comparisons among all solvers at low to high budgets (nfmax ∈ {100n, 500n, 1000n}) on GlobalTest. The names of the solvers are given on the horizontal axis of these subfigures, sorted by the number of solved problems in descending order. As can be seen from these subfigures:
• At low budget, BOBYQA and VRBBO are the first and second most robust solvers, while BOBYQA and NEWUOA are the first and second most efficient solvers.
• For medium and large budgets, MCS and GLODS are the first and second most robust solvers. In fact, with increasing budget, the global solvers behave better. In this case, VRBBO and VRBBO-Q are comparable to the global solvers.
To make more accurate comparisons among the four most robust or efficient solvers, data and performance profiles are shown in Fig. 4; its subfigures confirm that MCS and GLODS are more robust and efficient than the others at medium and large budgets, respectively. In summary, the global solvers are more robust than the local solvers on the GlobalTest collection, while our findings show that VRBBO is comparable to the global solvers in terms of efficiency and robustness and is recommended for finding the global minimum.

Results for medium dimensions (21 ≤ n ≤ 100)
For medium to very large scale problems, we excluded the model-based solvers from our comparison because they need n(n + 3)/2 sample points to construct fully quadratic models. Some solvers were too slow and could not solve most problems even with an enlarged secmax. Therefore, we only compared the fast, efficient, or robust solvers and plotted data and performance profiles for the most robust solvers on CUTEst and GlobalTest for medium to very large problems.
Results on CUTEst. We conclude from the performance and data profiles shown in Fig. 5 that VRBBO is much more efficient and robust than the other solvers.
Results on GlobalTest. We conclude from the performance and data profiles shown in Fig. 5 that:
• At low budget, ADSMAX and VRBBO are the first and second most robust solvers, while VRBBO is much more efficient than ADSMAX.
• At medium budget, ADSMAX, SDBOX, and VRBBO, in that order, are more robust than the others.
• At large budget, VRBBO is much more efficient and robust than the others.
In summary, although VRBBO is only comparable to the global solvers on small scale problems from GlobalTest, it is much more efficient and robust than them on medium scale problems from GlobalTest.

Results for large dimensions (101 ≤ n ≤ 1000)
Results on CUTEst. As discussed in Sect. 7.5, VRBBO, FMINUNC, and SDBOX were the three most robust and efficient solvers on CUTEst. Figure 6 displays the performance and data profiles comparing these solvers and shows that VRBBO is much more robust than the others; it is recommended for large scale problems on CUTEst.
Results on GlobalTest. As discussed in Sect. 7.5, VRBBO, ADSMAX, and SDBOX were the three most robust and efficient solvers on GlobalTest. We conclude from the performance and data profiles shown in Fig. 7 that VRBBO is the most efficient and second most robust solver for large scale problems on GlobalTest.

Results for very large dimensions (1001 ≤ n ≤ 5000)
Results on CUTEst. We conclude from the performance and data profiles shown in Fig. 8 that VRBBO is much more efficient and robust than the others for very large scale problems on CUTEst.
Results on GlobalTest. We conclude from the performance and data profiles shown in Fig. 9 that VRBBO is the most efficient and second most robust solver for very large scale problems on GlobalTest. An improved version of our algorithm has additional heuristic techniques that do not affect the order of the complexity results and that turn VRBBO into an efficient global solver, although our theory only guarantees local minimizers. This version even found, in most cases, either a global minimizer or, where this could not be verified, at least a point of similar quality to the best competitive global solvers.
Two competitive versions of our algorithm were VRBBO and VRBBO-Q at low to high budgets on CUTEst and GlobalTest, due to the use of all the various directions and the additional heuristic techniques. As a consequence of our extensive numerical results, UOBYQA is our recommendation for small scale problems at low to high budgets on CUTEst, and MCS and GLODS on GlobalTest, while VRBBO is our recommendation for medium to very large scale problems at low to high budgets on CUTEst and GlobalTest. The following theorem was recently proved by Pinelis [48].
Theorem 6 There is a universal constant c_0 ≤ 50 such that, for any fixed nonzero real vector q of any dimension n and any random vector p of the same dimension n with independent components uniformly distributed in [−1, 1], we have
(p^T p)(q^T q) ≤ c_0 n (p^T q)^2   (70)
with probability ≥ 1/2.
Pinelis also proved the lower bound 0.73 < c_0 for the best possible value of the constant c_0. Numerical simulation suggests that the true optimal value is approximately 16/7. To estimate c_0, we executed the Matlab commands shown in Fig. 10 three times with N = 10000.
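As a language-neutral stand-in for the Matlab experiment of Fig. 10 (whose code is not reproduced here), the following Python sketch estimates the median of the ratio appearing in (70); the choice q = (1, ..., 1) is arbitrary, since the theorem allows any fixed nonzero q.

```python
import numpy as np

def estimate_c0(n, N, rng):
    """Monte Carlo estimate of the median of
    ||p||^2 ||q||^2 / (n (p^T q)^2) over N samples of p with independent
    uniform [-1, 1] components. Since (70) must hold with probability
    >= 1/2, this median is a lower estimate for the constant c_0 needed
    for this particular q (here the all-ones vector)."""
    q = np.ones(n)
    P = rng.uniform(-1.0, 1.0, (N, n))         # N sample vectors p
    num = np.sum(P**2, axis=1) * np.dot(q, q)  # ||p||^2 ||q||^2
    den = n * (P @ q) ** 2                     # n (p^T q)^2
    return float(np.median(num / den))
```

For moderate n and large N, the resulting estimates cluster near 16/7, consistent with the conjectured optimal value.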

Data availability statement
The manuscript has no associated data as supplementary material.

Declarations
Conflict of interest The authors declare that they have no conflict of interest.

Fig. 1
Fig. 1 Flowchart of (a) VRBBO-basic, (b) FDS-basic, and (c) MLS-basic. Here R is the number of random directions used by MLS-basic.

Fig. 2, Fig. 3, Fig. 4
Fig. 2 The first and third rows are performance profiles ρ(τ) in dependence of a bound τ on the performance ratio, see (68), while the second and fourth rows are data profiles δ(κ) in dependence of a bound κ on the cost ratio, see (69). Problems solved by no solver are ignored.

Table 2
Complexity results for deterministic BBO.
Outputs: a newest best point xm and its function value fm, a new approximation λ ≥ 0 for the Lipschitz constant L, the number n_sg of sufficient gains, and an updated list a of extrapolation step sizes.
For a problem with n variables, we ran all solvers, monitoring in the function evaluation routine the number of function values and the time used until the bound on this number was met or an error occurred. We saved the time and the number of function values at each improved function value and evaluated afterwards when the target accuracy was reached. To obtain the above choices for nfmax and secmax, we made preliminary runs to ensure that the best solvers can solve most of the test problems. Both nfmax and secmax are input parameters for all solvers. A problem with dimension n is considered solved by the solver s if the target accuracy is satisfied.
• The solver of Brachetti et al. [10], obtained from http://www.iasi.cnr.it/~liuzzi/DFL/index.php/list3, is a global optimization algorithm.
• SDBOX, obtained from http://www.iasi.cnr.it/~liuzzi/DFL/index.php/list3, is a derivative-free algorithm for bound constrained optimization problems by Lucidi & Sciandrone [41].

9.2 A list of test problems with f_opt
Here we list the CUTEst test problems for which our best point did not satisfy the condition ‖g_k‖_∞ ≤ 10^{−5}.
Funding Open access funding provided by the University of Vienna. The first author acknowledges the financial support of the Doctoral Program Vienna Graduate School on Computational Optimization (VGSCO), funded by the Austrian Science Fund (FWF) under Project No. W1260-N35.