This section presents a systematic comparison of thirteen available implementations of derivative-free optimization algorithms on bound-constrained mixed-integer problems. The testbed includes: (i) pure-integer and mixed-integer problems, and (ii) small, medium, and large problems covering a wide range of characteristics found in applications. We evaluate the solvers according to their ability to find a near-optimal solution, find the best solution among currently available solvers, and improve a given starting point.
Experimental setup
Since most derivative-free optimization solvers are designed for low-dimensional unconstrained problems, we consider problems with at most 500 variables and bound constraints only. The thirteen derivative-free optimization solvers presented in Sect. 3 were tested on 267 problems from the MINLPLib 2 library. Most of the original problems from the MINLPLib 2 library have constraints. Since in this paper we are interested in solving bound-constrained mixed-integer problems, we omitted the constraints and eliminated any variables that became redundant once the constraints were removed. We also used 79 continuous problems from the MINLPLib 2 library and imposed integrality constraints on their variables in order to obtain a representative sample of non-binary discrete problems. Table S1 in the Online Supplement provides a complete listing of the test problems and model statistics. We used the general-purpose global optimization solver BARON [70] to obtain the global solution of each problem.
The computational experiments were performed on an Intel Xeon CPU W-2123 with 32 GB of main memory and a clock speed of 3.6 GHz, running under CentOS 7 64-bit. All solvers were tested with a limit of 2500 function evaluations per run. All solvers require variable bounds except for NOMAD. For problems with missing bounds in the problem formulation, we restricted all variables to the interval \(\left[ -10,000, 10,000 \right] \). Whenever starting points were required, they were drawn from a uniform distribution over the box-bounded region. We generated five random starting points for each problem. Solvers that use the provided starting point (BFO, DAKOTA/MADS, DFLBOX, DFLGEN, MIDACO, NOMAD, SNOBFIT, TOMLAB/MSNLP) were run once from each of the five starting points. The same randomly generated starting points were used for all solvers. MISO supplements the provided starting point with a set of points sampled via its default sampling strategy, i.e., symmetric Latin hypercube sampling. DAKOTA/SOGA does not use the provided starting points but instead uses randomly chosen starting points, so it was also run five times. All other solvers, which do not use the provided starting point and are deterministic (TOMLAB/GLCDIRECT, TOMLAB/GLCFAST, TOMLAB/GLCSOLVE), were run once.
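For concreteness, the sketch below illustrates how such starting points can be generated (a minimal sketch assuming NumPy; the function name, the fixed seed, and the handling of missing bounds via the default interval are illustrative, and any rounding of integer variables is left to the individual solvers):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so every solver receives the same points

def starting_points(lb, ub, n_points=5, default_bound=10_000.0):
    """Draw uniform random starting points inside the box-bounded region.
    Variables with missing bounds are restricted to [-10000, 10000]."""
    lb = np.where(np.isfinite(lb), lb, -default_bound)
    ub = np.where(np.isfinite(ub), ub, default_bound)
    return rng.uniform(lb, ub, size=(n_points, len(lb)))
```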
In order to assess the quality of the solutions obtained by different solvers, we compared the solution obtained by the derivative-free optimization solvers against the globally optimal solution for each problem. A solver was considered to have successfully solved a problem if it returned a solution with an objective function value within 1% or 0.01 of the global optimum, whichever was larger. Since we performed five runs for each solver that utilizes the provided starting point, starting each time from a different starting point, we compared the average- and best-case behavior of each solver. Finally, we used the default algorithmic parameters for each solver, i.e., we did not tune solvers in any way to the problems at hand. Table 3 lists the specific versions of solvers used in this computational study.
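As an illustration, the success test can be written as follows (a minimal sketch assuming minimization, with the 1% tolerance taken relative to the magnitude of the global optimum):

```python
def solved(f_solver, f_global, rel_tol=0.01, abs_tol=0.01):
    """A problem counts as solved if the returned objective is within 1% or
    0.01 of the global optimum, whichever is larger (minimization assumed)."""
    threshold = max(rel_tol * abs(f_global), abs_tol)
    return f_solver - f_global <= threshold
```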
Table 3 Derivative-free optimization solvers used in this computational study

Computational results
Tables S2–S14 in the Online Supplement provide, for each solver, the median results over the five optimization runs. Tables S15–S26 present the best-case performance over all five runs. For each solver, we report the execution time, the number of iterations (function evaluations), the solution, and the optimality gap (percentage difference between the solution returned by the solver and the global solution). A dash (“–”) is used when the optimality gap is larger than or equal to 100%. In order to compare the quality of the solutions returned, we compared the average- and best-case behavior of each solver. For the average-case behavior, we compared solvers using the median objective function value over the five different runs. For the best-case comparison, we compared the best solution found by each solver over all five runs. Best-case behavior is presented in the figures and analyzed below unless explicitly stated otherwise. The figures in this subsection are performance profiles [50] and present the fraction of problems solved by each solver within an optimality tolerance of 1%. The figures in Section B of the Online Supplement present the fraction of problems for which each solver achieved a solution as good as the best solution among all solvers, without regard to the global solution of the problems. When multiple solvers achieved the same solution, they were all credited as having the best solution among the solvers.
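The crediting rule used in those supplementary figures can be sketched as follows (the tie tolerance below is an illustrative assumption; in our comparison, solvers returning the same solution were all credited):

```python
def best_solver_credit(results, tol=1e-6):
    """results: dict mapping solver name -> objective value on one problem.
    Credit every solver whose solution ties the best value found by any
    solver, within a small tolerance for ties."""
    best = min(results.values())
    return {name for name, f in results.items()
            if f <= best + tol * max(1.0, abs(best))}
```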
Figure 3 presents the fraction of problems solved by each solver. The horizontal axis shows the progress of a solver as the number of function evaluations approaches 2500. If we only consider the solutions obtained by the solvers at the 2500 function evaluation limit, the best solvers, NOMAD and MISO, solved 77% and 76% of the problems, respectively. SNOBFIT solved 69% of the problems, while DAKOTA/MADS solved 56% of the problems. Most of the remaining solvers were able to solve more than 32% of the problems. TOMLAB/MSNLP had the worst performance, solving only 12% of the problems. Only four solvers found an optimal solution for more than half of the problems; most solvers solve only a small fraction of this test set. Next, we investigate the performance of solvers on different subsets of problems, dividing the problems based on the number and type of variables involved (mixed-integer or pure-integer).
The test set includes 176 pure-integer problems and 91 mixed-integer problems. Figure 4 presents the fraction of pure-integer problems solved by each solver within the optimality tolerance. MISO found the optimal solution on 85% of the problems, while NOMAD found the optimal solution on 73% of the problems. SNOBFIT and DAKOTA/MADS found the optimal solution on 66% and 59% of the problems, respectively. DAKOTA/SOGA, MIDACO, and the TOMLAB solvers found the optimal solution on \(40{-}42\)% of the problems. BFO found the optimal solution on 33% of the problems, while DFLBOX, DFLGEN, and TOMLAB/MSNLP had the worst performance, each solving fewer than 24% of the pure-integer problems. MISO is clearly the best solver for these pure-integer problems, whereas DFLBOX, DFLGEN, and TOMLAB/MSNLP are not good options for this collection.
We also study the performance of algorithms on binary and non-binary discrete problems. The test set includes 78 binary and 74 non-binary discrete problems. Figures 5 and 6 present the fraction of binary and non-binary discrete problems, respectively, solved by each solver within the optimality tolerance. MISO outperforms all other solvers on binary problems. More specifically, MISO can solve 81% of the binary problems, almost twice as many as the second best performers, DAKOTA/SOGA and NOMAD, can solve. All other solvers can solve less than 39% of binary problems. DAKOTA/MADS, MISO, NOMAD, and SNOBFIT are the best performers on non-binary discrete problems, solving \(80{-}92\)% of the problems. MIDACO can solve 51% of these problems, while all other solvers can solve less than 38% of the problems.
Figure 7 presents the fraction of mixed-integer problems solved by each solver. NOMAD leads over the entire range of function evaluations, finding an optimal solution on 84% of the problems. DFLBOX, DFLGEN, MISO, and SNOBFIT also perform well in this category, solving \(60{-}74\)% of the problems. Contrary to their poor performance on pure-integer problems, DFLBOX and DFLGEN are able to solve a significant number of mixed-integer problems. On the other hand, DAKOTA/SOGA performed much better on pure-integer problems (solved 31% of the problems) than on mixed-integer problems (solved 13% of the problems). DAKOTA/SOGA had the worst performance, solving only 16% of the problems. NOMAD is the best solver for this collection of mixed-integer problems, followed by DFLBOX, DFLGEN, MISO, and SNOBFIT.
The computational results show that only four solvers were able to solve more than half of the problems. One factor that may significantly impact solver performance is problem size. To investigate the effect of problem size on solver performance, we divided the problem set into three categories: (i) small problems with one to ten variables, (ii) medium problems with 11 to 50 variables, and (iii) large problems with 51 to 500 variables. The problem set includes 52 small problems, 102 medium problems, and 113 large problems.
Figure 8 presents the fraction of small problems solved by each solver within the optimality tolerance. Eight solvers were able to solve more than 84% of the problems with one to ten variables. More specifically, DAKOTA/MADS and NOMAD found an optimal solution on all of the problems. Additionally, DAKOTA/MADS solved all small problems in fewer than 178 function evaluations on average. SNOBFIT found an optimal solution on 97% of the problems, while MIDACO and MISO found an optimal solution on 96% of the problems. TOMLAB/GLCDIRECT, TOMLAB/GLCFAST, and TOMLAB/GLCSOLVE found an optimal solution on 84% of the problems. TOMLAB/MSNLP had the worst performance, solving only 40% of the problems.
Figure 9 presents the fraction of medium problems solved by each solver within the optimality tolerance. As for the small problems, NOMAD was the best solver, solving 93% of the problems with 11 to 50 variables. DAKOTA/MADS, MISO, and SNOBFIT also perform well, solving \(82{-}85\)% of the problems. On the other hand, TOMLAB/MSNLP had the worst performance on this collection, solving only 11% of the problems.
Figure 10 presents the fraction of large problems solved by each solver. All solvers had lower success rates on these problems than on the smaller ones. MISO was able to solve 59% of the problems, followed by NOMAD and SNOBFIT, which solved 51% and 43% of the problems, respectively. The remaining solvers solved fewer than 28% of the problems. TOMLAB/MSNLP did not solve any of these problems.
Improvement from starting point
Moré and Wild [50] proposed a benchmarking procedure for derivative-free optimization solvers that measures each solver’s ability to improve a starting point. For a given \(0 \le \tau \le 1\) and starting point \(x_0\), a solver is considered to have successfully improved the starting point if
$$\begin{aligned} f(x_0) - f_{solver} \ge (1 - \tau )(f(x_0) - f_L) \end{aligned}$$
where \(f(x_0)\) is the objective value at the starting point, \(f_{solver}\) is the solution reported by the solver, and \(f_L\) is the global solution. We used this measure to evaluate the best-case performance of each solver. In other words, a problem was considered solved by a solver if the best solution from the five runs improved the associated starting point by at least a fraction \(1 - \tau \) of the largest possible reduction. The starting points were drawn from a uniform distribution over the box-bounded region.
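A minimal sketch of this convergence test, using the notation above (minimization assumed):

```python
def improved(f_x0, f_solver, f_global, tau):
    """More-Wild test: the solver must achieve at least a fraction (1 - tau)
    of the best possible reduction from the starting objective value."""
    return f_x0 - f_solver >= (1.0 - tau) * (f_x0 - f_global)
```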
Figure 11 presents the fraction of problems for which the starting point was improved. NOMAD improved the starting points for 85% of the problems for \(\tau = 1\mathrm {e}{-1}\), and its ability to improve the starting points is only slightly reduced for smaller values of \(\tau \). MISO improved the starting points for 96% of the problems for \(\tau = 1\mathrm {e}{-1}\), but its ability to improve the starting points dropped considerably for smaller values of \(\tau \). SNOBFIT improved the starting points for 80% of the problems for \(\tau = 1\mathrm {e}{-1}\), and its ability to improve the starting points is only slightly reduced for smaller values of \(\tau \). DAKOTA/SOGA and MIDACO also perform well for \(\tau = 1\mathrm {e}{-1}\), but they are not very effective for smaller values of \(\tau \).
In Section C of the Online Supplement, we present the fraction of problems for which starting points were improved for each type of problem, i.e., (i) pure-integer and mixed-integer problems, and (ii) small, medium and large problems. The results are very similar to those in the figures of this section and demonstrate higher success rates for smaller problems. More specifically, NOMAD leads over most values of \(\tau \) in all categories. DAKOTA/SOGA improved the starting points for larger values of \(\tau \), but its performance dropped considerably for smaller values of \(\tau \).
Minimal sufficient set of solvers
The computational results revealed that MISO and NOMAD are clearly superior to the other MIDFO solvers for the problems considered in this study. However, neither solver was able to find an optimal solution for all of our test problems. For instance, NOMAD solved only 51% of the large problems, while MISO solved only 60% of the mixed-integer problems. Therefore, it is worthwhile to find a minimal-cardinality subset of the solvers capable of collectively solving as many problems in our test set as possible. Toward this end, we considered the best solution obtained by each solver over the five runs. For each problem category, we first identify all problems that can be solved by at least one solver. Then, we determine the smallest number of solvers that collectively solve all of these problems. A solver is not included in a minimally sufficient set of solvers if the problems solved by this solver form a strict subset of the problems solved by another solver. Figure 12 presents the minimum number of solvers required to solve the problems in our collection broken down by problem type. BFO, DAKOTA/SOGA, DFLBOX, MISO, NOMAD, and SNOBFIT collectively solve 93% of all problems and 92% of the pure-integer problems. Finally, DFLBOX, NOMAD, and SNOBFIT collectively solve 95% of the mixed-integer problems. Interestingly, even though MISO is a very good solver for all problem classes, it is not in the minimal set of solvers for the mixed-integer problems because it is completely dominated by NOMAD on these problems. On the other hand, MISO contributes the most to the solution of the pure-integer problems.
Figure 13 presents the minimum sufficient number of solvers as a function of problem size. NOMAD solved all small problems and 93% of the medium problems, while MISO solved 54% of the large problems. Finally, BFO, DFLBOX, MISO, NOMAD, and SNOBFIT collectively solved 85% of the large problems.
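One way to compute such a minimal sufficient set is sketched below, assuming the set of problems solved by each solver is available; with only thirteen solvers, exhaustive enumeration of subsets by increasing cardinality is practical (function and variable names are illustrative):

```python
from itertools import combinations

def minimal_sufficient_set(solved_by):
    """solved_by: dict mapping solver name -> set of problems it solved.
    Return a smallest subset of solvers that collectively solves every
    problem solved by at least one solver."""
    solvers = list(solved_by)
    solvable = set().union(*solved_by.values())
    for k in range(1, len(solvers) + 1):
        for subset in combinations(solvers, k):
            covered = set().union(*(solved_by[s] for s in subset))
            if covered >= solvable:
                return subset
    return tuple(solvers)
```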
Variance of the results
The previous figures were presented in terms of the best results among the five runs for each solver, with a limit of 2500 function evaluations. In this subsection, we discuss the variance of the results, as many solvers have varying performance as a function of the starting point given as input and the random seeds used in the computations. Although DAKOTA/SOGA does not utilize the provided starting points, it was executed five times since it is a stochastic solver. TOMLAB/GLCDIRECT, TOMLAB/GLCFAST, and TOMLAB/GLCSOLVE were run only once since they do not utilize the provided starting points and are deterministic solvers.
The difference in scales of the global solutions and the range of values of the objective function of the test problems make a direct comparison difficult. Therefore, we scale the objective function values as follows:
$$\begin{aligned} f_{scaled} = 1 - \frac{|f_L - f_{solver}|}{(1\mathrm {e}{-10} + |f_{L}|)} \end{aligned}$$
where \(f_{solver}\) is a solution obtained by the solver and \(f_L\) is the global solution. If \(f_{scaled} < 0\) (i.e., the optimality gap is larger than 100%), we set the scaled objective function value equal to 0. Hence, the resulting scaled objective function value is in the interval \(\left[ 0, 1 \right] \). A value of 1 corresponds to the global solution, while a value of 0 corresponds to a solution with an optimality gap of larger than or equal to 100%.
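A minimal sketch of this scaling, following the formula above:

```python
def scaled_objective(f_solver, f_global):
    """Scale a solver's objective to [0, 1]: a value of 1 corresponds to the
    global solution and 0 to an optimality gap of 100% or more."""
    f_scaled = 1.0 - abs(f_global - f_solver) / (1e-10 + abs(f_global))
    return max(f_scaled, 0.0)
```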
Figure 14 presents the average scaled best, mean, median, and worst results among the five optimization runs for all test problems. BFO, DAKOTA/MADS, DFLBOX, DFLGEN, MIDACO, MISO, NOMAD, SNOBFIT, and TOMLAB/MSNLP use the provided starting point, while the remaining solvers do not. As expected, most local solvers and global stochastic and hybrid solvers produce varying results across runs because of the starting point given as input and the random seeds used in the computations. However, solution variability is small for all solvers, indicating that starting points and random seeds do not significantly affect their performance. The solver with the largest variability is DAKOTA/MADS.
Computational effort
In this section, two metrics are used to compare the computational effort of different solvers: (i) the number of function evaluations required by each solver, and (ii) each solver's execution time (CPU time). Combining the earlier analysis of solution quality with these measures of computational effort, we then examine the efficiency of each solver on the testbed. Depending on whether or not an evaluation of the objective function is time-consuming, different metrics are more important for obtaining a solution quickly. Since the test problems are algebraically and computationally simple and small, the total time required for function evaluations over all runs was negligible, and most of the solvers' execution time was spent on processing function values and determining the sequence of iterates. In cases where the evaluation of the objective function is inexpensive, the execution time of the solvers is more important. However, in applications where an evaluation of the objective function requires a significant amount of time, the number of function evaluations that the solvers perform is the factor that determines computational efficiency. Global optimization methods will perform more iterations and will likely require more execution time than local methods.
Tables 4 and 5 present the computational effort of the solvers in terms of function evaluations and execution time. Table 4 shows that the number of function evaluations varies greatly across solvers. MIDACO, MISO, SNOBFIT, and all TOMLAB solvers require more than 2000 function evaluations. In some cases, this is due to the solver performing a large number of samples at early function evaluations; in other cases, solvers employ restart strategies to minimize the likelihood of getting trapped in local solutions. As mentioned before, these methods are global optimization methods and thus require more function evaluations. BFO, DAKOTA/MADS, DAKOTA/SOGA, and NOMAD require 1400 to 2000 function evaluations, while DFLBOX and DFLGEN require fewer than 1138 function evaluations. On the other hand, most solvers find their best solution within relatively few function evaluations. DFLBOX, DFLGEN, and NOMAD find their best solution in fewer than 650 function evaluations on average. In terms of execution time, all solvers except the DAKOTA solvers, MISO, NOMAD, and SNOBFIT need less than 43 seconds on average to solve the instances, regardless of problem size. The DAKOTA solvers, MISO, NOMAD, and SNOBFIT require considerably more time to solve the problems. DAKOTA, MISO, and SNOBFIT require CPU times that increase with problem size. Interestingly, NOMAD requires fewer iterations for large problems than for medium ones and solves large problems faster. Even though MIDACO and the TOMLAB solvers implement global optimization methods, they do not require much time compared to other global methods such as MISO, NOMAD, and SNOBFIT.
Table 4 Computational effort of solvers in terms of function evaluations (tot: average total number of function evaluations, opt: average function evaluations by which the best solution was found)

Table 5 Computational effort of solvers in terms of average execution time (s)

Figure 15 presents the efficiency of each solver in terms of the number of function evaluations performed. The horizontal axis shows the average number of function evaluations, representing the computational effort of each solver, while the vertical axis indicates solution quality in terms of the percentage of problems solved. Solvers located in the upper left corner of the figure are efficient, whereas solvers in the lower right area are inefficient. BFO, DAKOTA/MADS, DAKOTA/SOGA, DFLGEN, MIDACO, MISO, NOMAD, SNOBFIT, TOMLAB/GLCDIRECT, TOMLAB/GLCFAST, and TOMLAB/GLCSOLVE required more than 1135 function evaluations and solved more than 31% of the problems. The least efficient solver is TOMLAB/MSNLP, which required 2474 function evaluations on average and solved only 12% of the problems. DFLBOX required only 750 function evaluations on average and solved 38% of the problems. Figure 15 can also be interpreted as a Pareto front, in which case DAKOTA/MADS, DFLBOX, and NOMAD dominate all others.
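The Pareto interpretation can be made precise with the following sketch (the input format and the tie handling are our assumptions): a solver is on the front if no other solver uses no more evaluations while solving at least as large a fraction of problems, with at least one strict improvement.

```python
def pareto_front(solvers):
    """solvers: dict mapping name -> (avg_function_evals, fraction_solved).
    Return the solvers not dominated in the (effort, quality) trade-off."""
    front = []
    for name, (evals, frac) in solvers.items():
        dominated = any(
            e <= evals and f >= frac and (e < evals or f > frac)
            for other, (e, f) in solvers.items() if other != name
        )
        if not dominated:
            front.append(name)
    return front
```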
Figure 16 presents the efficiency of each solver in terms of execution time. Even though MISO solved 76% of the problems, it required 4385 seconds on average. Similarly, NOMAD solved 77% of the problems but required 2229 seconds on average. DAKOTA/MADS solved 56% of the problems and required 1266 seconds on average, while SNOBFIT solved 69% of the problems and required 208 seconds. The best solvers in terms of time efficiency were MIDACO, TOMLAB/GLCDIRECT, TOMLAB/GLCFAST, and TOMLAB/GLCSOLVE, which solved \(37{-}43\)% of the problems while requiring less than 23 seconds. BFO, DFLBOX, and DFLGEN were somewhat less efficient, solving more than 31% of the problems and requiring less than 43 seconds. The remaining solvers were not very efficient because of either large execution times (DAKOTA/SOGA) or the small fraction of problems that they solved (TOMLAB/MSNLP).