Abstract
This paper describes a simple but effective sampling method for optimizing and learning a discrete approximation (or surrogate) of a multi-dimensional function along a one-dimensional line segment of interest. The method does not rely on derivative information, and the function to be learned can be a computationally expensive “black box” function that must be queried via simulation or other means. It is assumed that the underlying function is noise-free and smooth, although the algorithm can still be effective when the underlying function is piecewise smooth. The method constructs a smooth surrogate on a set of equally spaced grid points by evaluating the true function at a sparse set of judiciously chosen grid points. At each iteration, the surrogate’s non-tabu local minima and maxima are identified as candidates for sampling. Tabu search constructs are also used to promote diversification. If no non-tabu extrema are identified, a simple exploration step is taken by sampling the midpoint of the largest unexplored interval. The algorithm continues until a user-defined function evaluation limit is reached. Numerous examples illustrate the algorithm’s efficacy and its superiority relative to state-of-the-art methods, including Bayesian optimization and NOMAD, on primarily nonconvex test functions.
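To make the sampling loop concrete, the following minimal Python sketch mirrors the steps described above. It is our illustration, not the authors' implementation: local quadratic models through consecutive sampled triples stand in for the paper's smooth surrogate, and the `tenure` parameter is an assumed tabu setting.

```python
def quad_vertex(x0, y0, x1, y1, x2, y2):
    # Vertex of the parabola through three points (None if degenerate/linear).
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    if denom == 0:
        return None
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denom
    b = (x2 ** 2 * (y0 - y1) + x1 ** 2 * (y2 - y0) + x0 ** 2 * (y1 - y2)) / denom
    if a == 0:
        return None
    return -b / (2 * a)

def line_walker_sketch(f, grid, budget, tenure=3):
    # Seed with the segment endpoints so local models can be built.
    samples = {0: f(grid[0]), len(grid) - 1: f(grid[-1])}
    tabu = {}
    while len(samples) < budget:
        idx = sorted(samples)
        # Candidate step: extrema of local quadratic models fit through
        # consecutive sampled triples (a stand-in for the smooth surrogate).
        cand = None
        for i0, i1, i2 in zip(idx, idx[1:], idx[2:]):
            v = quad_vertex(grid[i0], samples[i0], grid[i1], samples[i1],
                            grid[i2], samples[i2])
            if v is None or not grid[i0] < v < grid[i2]:
                continue
            j = min(range(len(grid)), key=lambda k: abs(grid[k] - v))
            if j not in samples and j not in tabu:
                cand = j
                break
        if cand is None:
            # Exploration step: midpoint of the largest unexplored interval.
            lo, hi = max(zip(idx, idx[1:]), key=lambda p: p[1] - p[0])
            cand = (lo + hi) // 2
        samples[cand] = f(grid[cand])
        # Tabu bookkeeping promotes diversification.
        tabu = {k: t - 1 for k, t in tabu.items() if t > 1}
        tabu[cand] = tenure
    return samples
```

Run on a simple quadratic over a 101-point grid, the sketch locates the minimizer within a handful of evaluations: the first fallback samples the segment midpoint, after which the quadratic model's vertex lands on the true minimizer.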
References
Bazaraa MS, Sherali HD, Shetty CM (2006) Nonlinear programming: theory and algorithms, 3rd edn. Wiley-Interscience, New Jersey
Bergou EH, Diouane Y, Gratton S (2018) A line-search algorithm inspired by the adaptive cubic regularization framework and complexity analysis. J Optim Theory Appl 178(3):885–913
Bergou EH, Diouane Y, Kunc V, Kungurtsev V, Royer CW (2022) A subsampling line-search method with second-order results. INFORMS J Optim 4(4):403–425
Bertsekas DP (1999) Nonlinear programming, 2nd edn. Athena Scientific, USA
Bhosekar A, Ierapetritou M (2018) Advances in surrogate based modeling, feasibility analysis, and optimization: a review. Comput Chem Eng 108:250–267
Brochu E, Cora VM, De Freitas N (2010) A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599
Chae Y, Wilke DN (2019) Empirical study towards understanding line search approximations for training neural networks. arXiv preprint arXiv:1909.06893
Conn AR, Scheinberg K, Vicente LN (2009) Introduction to derivative-free optimization. SIAM
Constable SC, Parker RL, Constable CG (1987) Occam’s inversion: a practical algorithm for generating smooth models from electromagnetic sounding data. Geophysics 52(3):289–300
Costa A, Nannicini G (2018) RBFOpt: an open-source library for black-box optimization with costly function evaluations. Math Program Comput 10(4):597–629
Cozad A, Sahinidis NV, Miller DC (2014) Learning surrogate models for simulation-based optimization. AIChE J 60(6):2211–2227
del Rio Chanona EA, Petsagkourakis P, Bradford E, Graciano JEA, Chachuat B (2021) Real-time optimization meets bayesian optimization and derivative-free optimization: a tale of modifier adaptation. Comput Chem Eng 147:107249
Glover F, Laguna M (1998) Tabu search. In: Handbook of combinatorial optimization. Springer, Berlin, pp 2093–2229
Gutmann H-M (2001) A radial basis function method for global optimization. J Global Optim 19(3):201–227
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. Springer, Berlin
Huyer W, Neumaier A (1999) Global optimization by multilevel coordinate search. J Global Optim 14:331–355
Huyer W, Neumaier A (2008) SNOBFIT: stable noisy optimization by branch and fit. ACM Trans Math Softw (TOMS) 35(2):1–25
Hyndman RJ, Koehler AB (2006) Another look at measures of forecast accuracy. Int J Forecast 22(4):679–688
Lagarias JC, Reeds JA, Wright MH, Wright PE (1998) Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM J Optim 9(1):112–147
Lagarias JC, Poonen B, Wright MH (2012) Convergence of the restricted Nelder-Mead algorithm in two dimensions. SIAM J Optim 22(2):501–532
Larson J, Menickelly M, Wild SM (2019) Derivative-free optimization methods. Acta Numer 28:287–404
Le Digabel S (2011) Algorithm 909: NOMAD: nonlinear optimization with the MADS algorithm. ACM Trans Math Softw 37(4):1–15
Mahsereci M, Hennig P (2015) Probabilistic line searches for stochastic optimization. Adv Neural Inf Process Syst 28
Müller J (2016) MISO: mixed-integer surrogate optimization framework. Optim Eng 17(1):177–203
Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313
Neumaier A, Azmi B (2019) Line search and convergence in bound-constrained optimization. Technical report, University of Vienna
Nocedal J, Wright SJ (2006) Numerical optimization. Springer, Berlin
Ong YS, Nair PB, Keane AJ, Wong KW (2005) Surrogate-assisted evolutionary optimization frameworks for high-fidelity engineering design problems. In: Knowledge incorporation in evolutionary computation, pp 307–331
Paquette C, Scheinberg K (2020) A stochastic line search method with expected complexity analysis. SIAM J Optim 30(1):349–376
Ploskas N, Sahinidis NV (2021) Review and comparison of algorithms and software for mixed-integer derivative-free optimization. J Global Optim 82:1–30
Powell MJD (2009) The BOBYQA algorithm for bound constrained optimization without derivatives. Cambridge NA Report NA2009/06, University of Cambridge, Cambridge
Rios LM, Sahinidis NV (2013) Derivative-free optimization: a review of algorithms and comparison of software implementations. J Global Optim 56(3):1247–1293
Ruder S (2016) An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747
Sen MK, Stoffa PL (2013) Global optimization methods in geophysical inversion. Cambridge University Press, Cambridge
Shahriari B, Swersky K, Wang Z, Adams RP, De Freitas N (2015) Taking the human out of the loop: a review of Bayesian optimization. Proc IEEE 104(1):148–175
Snyman JA, Wilke DN (2018) Line search descent methods for unconstrained minimization. In: Practical Mathematical Optimization, pp 41–69. Springer
Surjanovic S, Bingham D (2013) Virtual library of simulation experiments: test functions and datasets. Simon Fraser University, Burnaby, BC, Canada. URL www.sfu.ca/ssurjano/optimization.html
Acknowledgements
We thank a long list of colleagues for their engaging discussions including, but not limited to: Dr. David Schmidt for providing an initial prototype MATLAB implementation upon which our implementation was further enhanced; Dr. Alison Cozad and Dr. Benjamin Sauk for informative discussions concerning the ALAMO software and methodology; Dr. Nikos Ploskas for guidance on DFO conventions and references; Dr. Ashutosh Tewari for informative discussions on Gaussian Process Regression; Dr. Brent Wheelock for stimulating case study discussions. Finally, we wish to thank two anonymous referees whose suggestions helped us improve the quality of this paper.
Appendix
In this section, we offer detailed supplementary material to give a more holistic picture of our approach. Section 6.1 furnishes detailed information about our benchmark suite of functions. Section 6.2 shows the CPU time per iteration for LineWalker-full as a function of the number N of grid indices. Section 6.3 explains our experiments with ALAMO. Section 6.4 showcases a detailed visual comparison between bayesopt and LineWalker-full.
1.1 Benchmark suite of functions
Table 2 categorizes the benchmark test functions based on their shape and number of extrema. Tables 3 and 4 provide the precise analytical form of each test function, along with the set of global minimizers and the minimum objective function values.
1.2 CPU time per iteration
While we assume that computation time will be dominated by calls to an expensive simulator or oracle, for the sake of completeness, we provide some empirical evidence of LineWalker-full’s computation time. Specifically, using the Shekel function, we varied the grid size N from 1k to 14k points. For each value of N, we took 50 samples, of which the first 11 are taken before the main loop. Figure 8 shows that an iteration takes LineWalker-full roughly 0.5 s for \(N=5\)k and nearly 3 s for \(N=10\)k. The equation in the figure indicates that the CPU time increases at a cubic rate in N, which is not surprising since LineWalker-full must solve a linear system whose computational complexity is \(O(N^3)\).
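The cubic trend can be illustrated without timing hardware. The sketch below (our illustration of a generic dense solver, not LineWalker’s actual linear algebra) counts the multiply/divide operations of naive Gaussian elimination with back substitution, which total roughly \(N^3/3\), so doubling N multiplies the work by about eight.

```python
def gaussian_elimination_flops(n):
    """Count the multiply/divide operations of naive Gaussian elimination
    plus back substitution on a dense n-by-n linear system."""
    flops = 0
    for k in range(n - 1):              # forward elimination
        for i in range(k + 1, n):
            flops += 1                  # multiplier m = A[i][k] / A[k][k]
            flops += n - k              # update row i and the right-hand side
    for i in range(n):                  # back substitution
        flops += (n - 1 - i) + 1        # dot product plus one division
    return flops
```

Counting operations rather than wall-clock time makes the scaling visible without measurement noise: the ratio `gaussian_elimination_flops(2 * n) / gaussian_elimination_flops(n)` approaches 8 as n grows, consistent with the cubic fit in Fig. 8.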
1.3 ALAMO
In our experiments with ALAMO version 2021.12.28, we permitted ALAMO to include the following functions in its surrogate model construction: constant; linear; logarithmic; exponential; sine; cosine; monomials with powers 0.5, 2, 3, 4, and 5; pairwise combinations of basis functions (see line “multi2power 2 3 4”); and Gaussian radial basis functions. In fairness, we did not include any custom basis functions, as we attempted to mimic the assumptions that an agnostic user facing a truly unknown black-box function would likely make with ALAMO. We provided ALAMO with 11 initial samples (the same points given to all methods) from which to begin the surrogate construction. We minimized mean squared error (MSE) so that ALAMO could perform well in the RMSE metric, even though this might lead to surrogates with a larger number of basis functions. Figure 9 shows an example ALAMO input file for grlee12Step.
Critical to ALAMO’s exploration is the choice of adaptive sampling technique. We selected the popular DFO method SNOBFIT rather than a random sampler to avoid stochasticity and having to average results. While this pairing of ALAMO and SNOBFIT worked reasonably well for eight of the first twelve functions, four functions (ackley, dejong5, langer, and schwefel) caused issues that we could not diagnose or resolve. Specifically, we could not force SNOBFIT to take additional samples for these four functions. With no additional samples, the surrogate quality stagnated, as no improvements were made based on new information. Since we could not resolve this issue (even after trying different objective function criteria, e.g., BIC and AICc), we chose not to compare against ALAMO on these functions. Finally, again for reasons we could not determine, we could not force SNOBFIT to evaluate only one new sample point in each iteration, even after setting the ALAMO parameter maxpoints to 1. As a consequence, we permitted ALAMO to use more samples than the other methods to construct its surrogates. This explains why, in Fig. 6, ALAMO performed 22, 34, and 43 function evaluations when the other methods were given strict limits of 20, 30, and 40 function evaluations, respectively.
1.4 LineWalker vs. bayesopt: visual comparison of each benchmark function approximation
Figures 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 and 29 provide a detailed visual comparison between LineWalker-full and bayesopt for all 20 functions given a budget \(E^{\max ,\text {total}}\) of 20, 30, 40, and 50 function evaluations. No legend is given for the LineWalker-full figures; see the Fig. 1 legend for details. The bayesopt legend requires some explanation. Because the objective function is noise-free (i.e., deterministic), the “Model mean” and “Noise error bars” coincide and represent the mean of the GPR posterior distribution. Meanwhile, the “Model error bars” show the 95% confidence bounds for the posterior mean.
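For readers unfamiliar with how such bounds arise, the pure-Python sketch below (our illustration, not bayesopt’s implementation; the squared-exponential kernel, unit length scale, and jitter term are assumptions) computes a noise-free GPR posterior mean and its 95% band as mean ± 1.96σ. At sampled points the posterior standard deviation of a noise-free GP vanishes, so the band collapses onto the mean.

```python
import math

def rbf(a, b, ell=1.0):
    # Squared-exponential kernel with length scale ell.
    return math.exp(-0.5 * ((a - b) / ell) ** 2)

def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def gp_posterior(xs, ys, xq, jitter=1e-10):
    # Posterior mean and 95% confidence bounds of a noise-free GP at xq.
    K = [[rbf(xi, xj) + (jitter if i == j else 0.0)
          for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    k = [rbf(x, xq) for x in xs]
    mean = sum(ki * ai for ki, ai in zip(k, solve(K, ys)))
    var = max(rbf(xq, xq) - sum(ki * vi for ki, vi in zip(k, solve(K, k))), 0.0)
    half = 1.96 * math.sqrt(var)   # band collapses at sampled points
    return mean, mean - half, mean + half
```

Querying the posterior at a training point returns the training value with a near-zero band, while querying between training points returns a visibly wider band, which is the behavior seen in the bayesopt figures for a deterministic objective.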
In addition to our motivating example shown in Fig. 6, LineWalker-full produces a much better surrogate than bayesopt on several functions:
-
Eason-Schaffer2A (Fig. 13): Along the long plateau in the interval [0, 24], bayesopt produces a surrogate that looks far more like a “sagging electric cable transmission line” than a straight line fit.
-
Gramacy & Lee (Fig. 15): bayesopt never resolves the local maximum at \(x \approx 0.7\): even with 50 function evaluations, the mean predicted objective value at this point under the posterior distribution is 7, not 6. LineWalker-full resolves this local maximum.
-
Holder (Fig. 16): LineWalker-full is far more successful at finding many of the function’s peaks than bayesopt.
-
Langer (Fig. 17): LineWalker-full produces a more accurate approximation than bayesopt at the local extrema near \(x=0.5\) (grid index 300) and \(x=3.75\) (grid index 1750), and in the interval [6.5, 7.5].
-
Levy13 (Fig. 20): When the budget \(E^{\max ,\text {total}} \le 40\), bayesopt underestimates the local maximum around \(x=-2.25\) (grid index 700) and, with \(E^{\max ,\text {total}} \le 20\), struggles in the interval \([-1,2]\).
-
SawtoothD (Fig. 24): bayesopt produces a surrogate that looks more like a “sagging electric cable transmission line” relative to the true function and what LineWalker-full generates.
-
Shekel (Fig. 27): Although bayesopt finds a near-global minimum within 20 samples, its surrogate is inferior to that of LineWalker-full when \(E^{\max ,\text {total}} \in \{30,40\}\).
Cite this article
Papageorgiou, D.J., Kronqvist, J. & Kumaran, K. Linewalker: line search for black box derivative-free optimization and surrogate model construction. Optim Eng (2024). https://doi.org/10.1007/s11081-023-09879-9