Abstract
This paper describes a simple but effective sampling method for optimizing and learning a discrete approximation (or surrogate) of a multi-dimensional function along a one-dimensional line segment of interest. The method does not rely on derivative information, and the function to be learned can be a computationally expensive “black box” function that must be queried via simulation or other means. It is assumed that the underlying function is noise-free and smooth, although the algorithm can still be effective when the underlying function is piecewise smooth. The method constructs a smooth surrogate on a set of equally spaced grid points by evaluating the true function at a sparse set of judiciously chosen grid points. At each iteration, the surrogate’s non-tabu local minima and maxima are identified as candidates for sampling. Tabu search constructs are also used to promote diversification. If no non-tabu extrema are identified, a simple exploration step is taken by sampling the midpoint of the largest unexplored interval. The algorithm continues until a user-defined function evaluation limit is reached. Numerous examples illustrate the algorithm’s efficacy and its superiority relative to state-of-the-art methods, including Bayesian optimization and NOMAD, on primarily nonconvex test functions.
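To make the sampling loop concrete, the following minimal Python sketch mirrors the steps described above. It is our illustration, not the authors' implementation: local quadratic models through consecutive sampled triples stand in for the paper's smooth surrogate, and the `tenure` parameter is an assumed tabu setting.

```python
def quad_vertex(x0, y0, x1, y1, x2, y2):
    # Vertex of the parabola through three points (None if degenerate/linear).
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    if denom == 0:
        return None
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denom
    b = (x2 ** 2 * (y0 - y1) + x1 ** 2 * (y2 - y0) + x0 ** 2 * (y1 - y2)) / denom
    if a == 0:
        return None
    return -b / (2 * a)

def line_walker_sketch(f, grid, budget, tenure=3):
    # Seed with the segment endpoints so local models can be built.
    samples = {0: f(grid[0]), len(grid) - 1: f(grid[-1])}
    tabu = {}
    while len(samples) < budget:
        idx = sorted(samples)
        # Candidate step: extrema of local quadratic models fit through
        # consecutive sampled triples (a stand-in for the smooth surrogate).
        cand = None
        for i0, i1, i2 in zip(idx, idx[1:], idx[2:]):
            v = quad_vertex(grid[i0], samples[i0], grid[i1], samples[i1],
                            grid[i2], samples[i2])
            if v is None or not grid[i0] < v < grid[i2]:
                continue
            j = min(range(len(grid)), key=lambda k: abs(grid[k] - v))
            if j not in samples and j not in tabu:
                cand = j
                break
        if cand is None:
            # Exploration step: midpoint of the largest unexplored interval.
            lo, hi = max(zip(idx, idx[1:]), key=lambda p: p[1] - p[0])
            cand = (lo + hi) // 2
        samples[cand] = f(grid[cand])
        # Tabu bookkeeping promotes diversification.
        tabu = {k: t - 1 for k, t in tabu.items() if t > 1}
        tabu[cand] = tenure
    return samples
```

Run on a simple quadratic over a 101-point grid, the sketch locates the minimizer within a handful of evaluations: the first fallback samples the segment midpoint, after which the quadratic model's vertex lands on the true minimizer.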
References
Bazaraa MS, Sherali HD, Shetty CM (2006) Nonlinear programming: theory and algorithms, 3rd edn. Wiley-Interscience, New Jersey
Bergou EH, Diouane Y, Gratton S (2018) A line-search algorithm inspired by the adaptive cubic regularization framework and complexity analysis. J Optim Theory Appl 178(3):885–913
Bergou EH, Diouane Y, Kunc V, Kungurtsev V, Royer CW (2022) A subsampling line-search method with second-order results. INFORMS J Optim 4(4):403–425
Bertsekas DP (1999) Nonlinear programming, 2nd edn. Athena Scientific, USA
Bhosekar A, Ierapetritou M (2018) Advances in surrogate based modeling, feasibility analysis, and optimization: a review. Comput Chem Eng 108:250–267
Brochu E, Cora VM, De Freitas N (2010) A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599
Chae Y, Wilke DN (2019) Empirical study towards understanding line search approximations for training neural networks. arXiv preprint arXiv:1909.06893
Conn AR, Scheinberg K, Vicente LN (2009) Introduction to derivative-free optimization. SIAM
Constable SC, Parker RL, Constable CG (1987) Occam’s inversion: a practical algorithm for generating smooth models from electromagnetic sounding data. Geophysics 52(3):289–300
Costa A, Nannicini G (2018) RBFOpt: an open-source library for black-box optimization with costly function evaluations. Math Program Comput 10(4):597–629
Cozad A, Sahinidis NV, Miller DC (2014) Learning surrogate models for simulation-based optimization. AIChE J 60(6):2211–2227
del Rio Chanona EA, Petsagkourakis P, Bradford E, Graciano JEA, Chachuat B (2021) Real-time optimization meets bayesian optimization and derivative-free optimization: a tale of modifier adaptation. Comput Chem Eng 147:107249
Glover F, Laguna M (1998) Tabu search. In: Handbook of combinatorial optimization. Springer, Berlin, pp 2093–2229
Gutmann H-M (2001) A radial basis function method for global optimization. J Global Optim 19(3):201–227
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. Springer, Berlin
Huyer W, Neumaier A (1999) Global optimization by multilevel coordinate search. J Global Optim 14:331–355
Huyer W, Neumaier A (2008) SNOBFIT: stable noisy optimization by branch and fit. ACM Trans Math Softw (TOMS) 35(2):1–25
Hyndman RJ, Koehler AB (2006) Another look at measures of forecast accuracy. Int J Forecast 22(4):679–688
Lagarias JC, Reeds JA, Wright MH, Wright PE (1998) Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM J Optim 9(1):112–147
Lagarias JC, Poonen B, Wright MH (2012) Convergence of the restricted Nelder-Mead algorithm in two dimensions. SIAM J Optim 22(2):501–532
Larson J, Menickelly M, Wild SM (2019) Derivative-free optimization methods. Acta Numer 28:287–404
Le Digabel S (2011) Algorithm 909: NOMAD: nonlinear optimization with the MADS algorithm. ACM Trans Math Softw 37(4):1–15
Mahsereci M, Hennig P (2015) Probabilistic line searches for stochastic optimization. Adv Neural Inf Process Syst 28
Müller J (2016) MISO: mixed-integer surrogate optimization framework. Optim Eng 17(1):177–203
Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313
Neumaier A, Azmi B (2019) Line search and convergence in bound-constrained optimization. Technical report, University of Vienna
Nocedal J, Wright SJ (2006) Numerical optimization. Springer, Berlin
Ong YS, Nair PB, Keane AJ, Wong KW (2005) Surrogate-assisted evolutionary optimization frameworks for high-fidelity engineering design problems. In: Knowledge incorporation in evolutionary computation, pp 307–331
Paquette C, Scheinberg K (2020) A stochastic line search method with expected complexity analysis. SIAM J Optim 30(1):349–376
Ploskas N, Sahinidis NV (2021) Review and comparison of algorithms and software for mixed-integer derivative-free optimization. J Global Optim 82:1–30
Powell MJD (2009) The BOBYQA algorithm for bound constrained optimization without derivatives. Cambridge NA Report NA2009/06, University of Cambridge, Cambridge
Rios LM, Sahinidis NV (2013) Derivative-free optimization: a review of algorithms and comparison of software implementations. J Global Optim 56(3):1247–1293
Ruder S (2016) An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747
Sen MK, Stoffa PL (2013) Global optimization methods in geophysical inversion. Cambridge University Press, Cambridge
Shahriari B, Swersky K, Wang Z, Adams RP, De Freitas N (2015) Taking the human out of the loop: a review of Bayesian optimization. Proc IEEE 104(1):148–175
Snyman JA, Wilke DN (2018) Line search descent methods for unconstrained minimization. In: Practical Mathematical Optimization, pp 41–69. Springer
Surjanovic S, Bingham D (2013) Virtual library of simulation experiments: test functions and datasets. Simon Fraser University, Burnaby, BC, Canada. URL www.sfu.ca/ssurjano/optimization.html
Acknowledgements
We thank a long list of colleagues for their engaging discussions including, but not limited to: Dr. David Schmidt for providing an initial prototype MATLAB implementation upon which our implementation was further enhanced; Dr. Alison Cozad and Dr. Benjamin Sauk for informative discussions concerning the ALAMO software and methodology; Dr. Nikos Ploskas for guidance on DFO conventions and references; Dr. Ashutosh Tewari for informative discussions on Gaussian Process Regression; Dr. Brent Wheelock for stimulating case study discussions. Finally, we wish to thank two anonymous referees whose suggestions helped us improve the quality of this paper.
Appendix
In this section, we offer detailed supplementary material to give a more holistic picture of our approach. Section 6.1 furnishes detailed information about our benchmark suite of functions. Section 6.2 shows the CPU time per iteration for LineWalker-full as a function of the number N of grid indices. Section 6.3 explains our experiments with ALAMO. Section 6.4 showcases a detailed visual comparison between bayesopt and LineWalker-full.
1.1 Benchmark suite of functions
Table 2 categorizes the benchmark test functions based on their shape and number of extrema. Tables 3 and 4 provide the precise analytical form of each test function, along with the set of global minimizers and the minimum objective function values.
1.2 CPU time per iteration
While we assume that computation time will be dominated by calls to an expensive simulator or oracle, for the sake of completeness, we provide some empirical evidence of LineWalker-full’s computation time. Specifically, using the Shekel function, we varied the grid size N from 1k to 14k points. For each value of N, we took 50 samples, of which the first 11 are taken before the main loop. Figure 8 shows that an iteration takes LineWalker-full roughly 0.5 s for \(N=5\)k and nearly 3 s for \(N=10\)k. The equation in the figure indicates that the CPU time increases at a cubic rate in N, which is not surprising since LineWalker-full must solve a linear system whose computational complexity is \(O(N^3)\).
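The cubic trend can be illustrated without timing hardware. The sketch below (our illustration of a generic dense solver, not LineWalker’s actual linear algebra) counts the multiply/divide operations of naive Gaussian elimination with back substitution, which total roughly \(N^3/3\), so doubling N multiplies the work by about eight.

```python
def gaussian_elimination_flops(n):
    """Count the multiply/divide operations of naive Gaussian elimination
    plus back substitution on a dense n-by-n linear system."""
    flops = 0
    for k in range(n - 1):              # forward elimination
        for i in range(k + 1, n):
            flops += 1                  # multiplier m = A[i][k] / A[k][k]
            flops += n - k              # update row i and the right-hand side
    for i in range(n):                  # back substitution
        flops += (n - 1 - i) + 1        # dot product plus one division
    return flops
```

Counting operations rather than wall-clock time makes the scaling visible without measurement noise: the ratio `gaussian_elimination_flops(2 * n) / gaussian_elimination_flops(n)` approaches 8 as n grows, consistent with the cubic fit in Fig. 8.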
1.3 ALAMO
In our experiments with ALAMO version 2021.12.28, we permitted ALAMO to include the following functions in its surrogate model construction: constant; linear; logarithmic; exponential; sine; cosine; monomials with powers 0.5, 2, 3, 4, and 5; pairwise combinations of basis functions (see line “multi2power 2 3 4”); and Gaussian radial basis functions. In fairness, we did not include any custom basis functions, as we attempted to mimic the assumptions that an agnostic user facing a truly unknown black-box function would likely make with ALAMO. We provided ALAMO with 11 initial samples (the same points given to all methods) from which to begin the surrogate construction. We minimized mean squared error (MSE) so that ALAMO could perform well in the RMSE metric, even though this might lead to surrogates with a larger number of basis functions. Figure 9 shows an example ALAMO input file for grlee12Step.
Critical to ALAMO’s exploration is the choice of adaptive sampling technique. We selected the popular DFO method SNOBFIT rather than a random sampler to avoid stochasticity and having to average results. While this pairing of ALAMO and SNOBFIT worked reasonably well for eight of the first twelve functions, four functions (ackley, dejong5, langer, and schwefel) caused issues that we could not diagnose or resolve. Specifically, we could not force SNOBFIT to take additional samples for these four functions. With no additional samples, the surrogate quality stagnated, as no improvements were made based on new information. Since we could not resolve this issue (even after trying different objective function criteria, e.g., BIC and AICc), we chose not to compare against ALAMO on these functions. Finally, again for reasons we could not determine, we could not force SNOBFIT to evaluate only one new sample point in each iteration, even after setting the ALAMO parameter maxpoints to 1. As a consequence, we permitted ALAMO to use more samples than the other methods to construct its surrogates. This explains why, in Fig. 6, ALAMO performed 22, 34, and 43 function evaluations when the other methods were given strict limits of 20, 30, and 40 function evaluations, respectively.
1.4 LineWalker vs. bayesopt: visual comparison of each benchmark function approximation
Figures 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 and 29 provide a detailed visual comparison between LineWalker-full and bayesopt for all 20 functions given a budget \(E^{\max ,\text {total}}\) of 20, 30, 40, and 50 function evaluations. No legend is given for the LineWalker-full figures; see the Fig. 1 legend for details. The bayesopt legend requires some explanation. Because the objective function is noise-free (i.e., deterministic), the “Model mean” and “Noise error bars” coincide and represent the mean of the GPR posterior distribution. Meanwhile, the “Model error bars” show the 95% confidence bounds for the posterior mean.
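For readers unfamiliar with how such bounds arise, the pure-Python sketch below (our illustration, not bayesopt’s implementation; the squared-exponential kernel, unit length scale, and jitter term are assumptions) computes a noise-free GPR posterior mean and its 95% band as mean ± 1.96σ. At sampled points the posterior standard deviation of a noise-free GP vanishes, so the band collapses onto the mean.

```python
import math

def rbf(a, b, ell=1.0):
    # Squared-exponential kernel with length scale ell.
    return math.exp(-0.5 * ((a - b) / ell) ** 2)

def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def gp_posterior(xs, ys, xq, jitter=1e-10):
    # Posterior mean and 95% confidence bounds of a noise-free GP at xq.
    K = [[rbf(xi, xj) + (jitter if i == j else 0.0)
          for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    k = [rbf(x, xq) for x in xs]
    mean = sum(ki * ai for ki, ai in zip(k, solve(K, ys)))
    var = max(rbf(xq, xq) - sum(ki * vi for ki, vi in zip(k, solve(K, k))), 0.0)
    half = 1.96 * math.sqrt(var)   # band collapses at sampled points
    return mean, mean - half, mean + half
```

Querying the posterior at a training point returns the training value with a near-zero band, while querying between training points returns a visibly wider band, which is the behavior seen in the bayesopt figures for a deterministic objective.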
In addition to our motivating example shown in Fig. 6, LineWalker-full produces a much better surrogate than bayesopt on several functions:
-
Eason-Schaffer2A (Fig. 13): Along the long plateau in the interval [0, 24], bayesopt produces a surrogate that looks far more like a “sagging electric cable transmission line” than a straight line fit.
-
Gramacy & Lee (Fig. 15): bayesopt never resolves the local maximum at \(x \approx 0.7\): even with 50 function evaluations, the mean predicted objective value at this point under the posterior distribution is 7, not 6. LineWalker-full resolves this local maximum.
-
Holder (Fig. 16): LineWalker-full is far more successful at finding many of the function’s peaks than bayesopt.
-
Langer (Fig. 17): LineWalker-full produces a more accurate approximation than bayesopt at the local extrema near \(x=0.5\) (grid index 300) and \(x=3.75\) (grid index 1750), and in the interval [6.5, 7.5].
-
Levy13 (Fig. 20): When the budget \(E^{\max ,\text {total}} \le 40\), bayesopt underestimates the local maximum around \(x=-2.25\) (grid index 700) and, with \(E^{\max ,\text {total}} \le 20\), struggles in the interval \([-1,2]\).
-
SawtoothD (Fig. 24): bayesopt produces a surrogate that looks more like a “sagging electric cable transmission line” relative to the true function and what LineWalker-full generates.
-
Shekel (Fig. 27): Although bayesopt finds a near-global minimum within 20 samples, its surrogate is inferior to that of LineWalker-full when \(E^{\max ,\text {total}} \in \{30,40\}\).
Cite this article
Papageorgiou, D.J., Kronqvist, J. & Kumaran, K. Linewalker: line search for black box derivative-free optimization and surrogate model construction. Optim Eng (2024). https://doi.org/10.1007/s11081-023-09879-9