Linewalker: line search for black box derivative-free optimization and surrogate model construction

Research Article · Published in Optimization and Engineering

Abstract

This paper describes a simple but effective sampling method for optimizing and learning a discrete approximation (or surrogate) of a multi-dimensional function along a one-dimensional line segment of interest. The method does not rely on derivative information, and the function to be learned can be a computationally expensive “black box” function that must be queried via simulation or other means. It is assumed that the underlying function is noise-free and smooth, although the algorithm can still be effective when the underlying function is piecewise smooth. The method constructs a smooth surrogate on a set of equally spaced grid points by evaluating the true function at a sparse set of judiciously chosen grid points. At each iteration, the surrogate’s non-tabu local minima and maxima are identified as candidates for sampling. Tabu search constructs are also used to promote diversification. If no non-tabu extrema are identified, a simple exploration step is taken by sampling the midpoint of the largest unexplored interval. The algorithm continues until a user-defined function evaluation limit is reached. Numerous examples illustrate the algorithm’s efficacy and its superiority relative to state-of-the-art methods, including Bayesian optimization and NOMAD, on primarily nonconvex test functions.
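The loop described above is easy to prototype. The sketch below is our own minimal Python illustration of the control flow, not the paper's implementation: it substitutes a cubic-spline surrogate for the paper's smooth surrogate, uses a naive distance-based tabu rule, and simply takes the first candidate extremum, whereas the paper's selection and tabu logic are more elaborate.

import numpy as np
from scipy.interpolate import CubicSpline

def linewalker_sketch(f, lo, hi, budget=20, n_grid=1001, tabu_frac=0.02):
    """Illustrative control flow only; not the paper's LineWalker."""
    grid = np.linspace(lo, hi, n_grid)
    xs = list(np.linspace(lo, hi, 5))   # initial samples, endpoints included
    ys = [f(x) for x in xs]
    while len(xs) < budget:
        order = np.argsort(xs)
        sx, sy = np.array(xs)[order], np.array(ys)[order]
        s = CubicSpline(sx, sy)(grid)   # smooth surrogate on the equally spaced grid
        # Interior local minima/maxima of the surrogate are sampling candidates.
        ext = [grid[i] for i in range(1, n_grid - 1)
               if (s[i] - s[i - 1]) * (s[i + 1] - s[i]) < 0]
        # Naive tabu rule: discard candidates too close to existing samples.
        cand = [x for x in ext
                if min(abs(x - xi) for xi in xs) > tabu_frac * (hi - lo)]
        if cand:
            x_new = cand[0]             # the paper's selection rule is more elaborate
        else:
            # Exploration step: midpoint of the largest unexplored interval.
            i = int(np.argmax(np.diff(sx)))
            x_new = 0.5 * (sx[i] + sx[i + 1])
        xs.append(x_new)
        ys.append(f(x_new))
    return np.array(xs), np.array(ys)

xs, ys = linewalker_sketch(lambda x: np.sin(3 * x) + 0.1 * x, 0.0, 10.0)
print(f"best sample: f({xs[np.argmin(ys)]:.3f}) = {ys.min():.3f}")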


References

  • Bazaraa MS, Sherali HD, Shetty CM (2006) Nonlinear programming: theory and algorithms, 3rd edn. Wiley-Interscience, New Jersey
  • Bergou EH, Diouane Y, Gratton S (2018) A line-search algorithm inspired by the adaptive cubic regularization framework and complexity analysis. J Optim Theory Appl 178(3):885–913
  • Bergou EH, Diouane Y, Kunc V, Kungurtsev V, Royer CW (2022) A subsampling line-search method with second-order results. INFORMS J Optim 4(4):403–425
  • Bertsekas DP (1999) Nonlinear programming, 2nd edn. Athena Scientific, USA
  • Bhosekar A, Ierapetritou M (2018) Advances in surrogate based modeling, feasibility analysis, and optimization: a review. Comput Chem Eng 108:250–267
  • Brochu E, Cora VM, De Freitas N (2010) A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599
  • Chae Y, Wilke DN (2019) Empirical study towards understanding line search approximations for training neural networks. arXiv preprint arXiv:1909.06893
  • Conn AR, Scheinberg K, Vicente LN (2009) Introduction to derivative-free optimization. SIAM, Philadelphia
  • Constable SC, Parker RL, Constable CG (1987) Occam’s inversion: a practical algorithm for generating smooth models from electromagnetic sounding data. Geophysics 52(3):289–300
  • Costa A, Nannicini G (2018) RBFOpt: an open-source library for black-box optimization with costly function evaluations. Math Program Comput 10(4):597–629
  • Cozad A, Sahinidis NV, Miller DC (2014) Learning surrogate models for simulation-based optimization. AIChE J 60(6):2211–2227
  • del Rio Chanona EA, Petsagkourakis P, Bradford E, Graciano JEA, Chachuat B (2021) Real-time optimization meets Bayesian optimization and derivative-free optimization: a tale of modifier adaptation. Comput Chem Eng 147:107249
  • Glover F, Laguna M (1998) Tabu search. In: Handbook of combinatorial optimization. Springer, Berlin, pp 2093–2229
  • Gutmann H-M (2001) A radial basis function method for global optimization. J Global Optim 19(3):201–227
  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. Springer, Berlin
  • Huyer W, Neumaier A (1999) Global optimization by multilevel coordinate search. J Global Optim 14:331–355
  • Huyer W, Neumaier A (2008) SNOBFIT - stable noisy optimization by branch and fit. ACM Trans Math Softw 35(2):1–25
  • Hyndman RJ, Koehler AB (2006) Another look at measures of forecast accuracy. Int J Forecast 22(4):679–688
  • Lagarias JC, Reeds JA, Wright MH, Wright PE (1998) Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM J Optim 9(1):112–147
  • Lagarias JC, Poonen B, Wright MH (2012) Convergence of the restricted Nelder-Mead algorithm in two dimensions. SIAM J Optim 22(2):501–532
  • Larson J, Menickelly M, Wild SM (2019) Derivative-free optimization methods. Acta Numer 28:287–404
  • Le Digabel S (2011) Algorithm 909: NOMAD: nonlinear optimization with the MADS algorithm. ACM Trans Math Softw 37(4):1–15
  • Mahsereci M, Hennig P (2015) Probabilistic line searches for stochastic optimization. Adv Neural Inf Process Syst 28
  • Müller J (2016) MISO: mixed-integer surrogate optimization framework. Optim Eng 17(1):177–203
  • Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313
  • Neumaier A, Azmi B (2019) Line search and convergence in bound-constrained optimization. Technical report, University of Vienna
  • Nocedal J, Wright S (2006) Numerical optimization. Springer, Berlin
  • Ong YS, Nair PB, Keane AJ, Wong KW (2005) Surrogate-assisted evolutionary optimization frameworks for high-fidelity engineering design problems. In: Knowledge incorporation in evolutionary computation, pp 307–331
  • Paquette C, Scheinberg K (2020) A stochastic line search method with expected complexity analysis. SIAM J Optim 30(1):349–376
  • Ploskas N, Sahinidis NV (2021) Review and comparison of algorithms and software for mixed-integer derivative-free optimization. J Global Optim 82:1–30
  • Powell MJD (2009) The BOBYQA algorithm for bound constrained optimization without derivatives. Cambridge NA Report NA2009/06, University of Cambridge, Cambridge
  • Rios LM, Sahinidis NV (2013) Derivative-free optimization: a review of algorithms and comparison of software implementations. J Global Optim 56(3):1247–1293
  • Ruder S (2016) An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747
  • Sen MK, Stoffa PL (2013) Global optimization methods in geophysical inversion. Cambridge University Press, Cambridge
  • Shahriari B, Swersky K, Wang Z, Adams RP, De Freitas N (2015) Taking the human out of the loop: a review of Bayesian optimization. Proc IEEE 104(1):148–175
  • Snyman JA, Wilke DN (2018) Line search descent methods for unconstrained minimization. In: Practical mathematical optimization. Springer, pp 41–69
  • Surjanovic S, Bingham D (2013) Virtual library of simulation experiments: test functions and datasets. Simon Fraser University, Burnaby, BC, Canada. URL: www.sfu.ca/ssurjano/optimization.html


Acknowledgements

We thank a long list of colleagues for their engaging discussions including, but not limited to: Dr. David Schmidt for providing an initial prototype MATLAB implementation upon which our implementation was further enhanced; Dr. Alison Cozad and Dr. Benjamin Sauk for informative discussions concerning the ALAMO software and methodology; Dr. Nikos Ploskas for guidance on DFO conventions and references; Dr. Ashutosh Tewari for informative discussions on Gaussian Process Regression; Dr. Brent Wheelock for stimulating case study discussions. Finally, we wish to thank two anonymous referees whose suggestions helped us improve the quality of this paper.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Dimitri J. Papageorgiou.


Appendix

In this section, we offer detailed supplementary material to give a more holistic picture of our approach. Section 1.1 furnishes detailed information about our benchmark suite of functions. Section 1.2 shows the CPU time per iteration for LineWalker-full as a function of the number N of grid indices. Section 1.3 explains our experiments with ALAMO. Section 1.4 showcases a detailed visual comparison between bayesopt and LineWalker-full.

1.1 Benchmark suite of functions

Table 2 categorizes the 20 test functions in our benchmark suite based on their shape and number of extrema. Tables 3 and 4 provide the precise analytical form of each test function, along with the set of global minimizers and the minimum objective function values.

Table 2 Gallery of test functions. Category: This classification is taken from Surjanovic and Bingham (2013) with some amendments and additions where appropriate. Periodic: There is regularity/frequency to the spacing between the extrema over the entire domain. Nonsmooth: Does not possess continuous derivatives over the domain. All functions are lower semi-continuous, except plateau. The number of local minima (# Min) and maxima (# Max) excludes endpoints
Table 3 Functional form of 10 test functions. dejong5 is a 2-dimensional function, which we evaluate on the line segment \(\{(x_1,x_2): x_1=x_2, x_1 \in [-65.536,65.536]\}\). plateau is a 2-dimensional function \(f(\textbf{x})=\sum _{i=1}^2 | \lfloor x_i \rfloor |\), which we evaluate on the 2D line segment with endpoints \(\textbf{x}_1=(-2,-7)\) and \(\textbf{x}_2=(4,5)\) or equivalently on the 2D domain \(\{(x_1,x_2): x_2=2x_1 - 3, x_1 \in [-2,4]\}\)
Table 4 Functional form of remaining 10 test functions
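To make the line-segment restriction in Table 3 concrete, the following minimal Python sketch (our illustration, not the paper's code) evaluates the 2D plateau function along the segment with endpoints \(\textbf{x}_1=(-2,-7)\) and \(\textbf{x}_2=(4,5)\):

import numpy as np

def plateau(x):
    """2D plateau test function from Table 3: f(x) = sum_i |floor(x_i)|."""
    return float(np.sum(np.abs(np.floor(x))))

# Parameterize the segment x(t) = x1 + t * (x2 - x1), t in [0, 1],
# turning the 2D function into a 1D function of t.
x1 = np.array([-2.0, -7.0])
x2 = np.array([4.0, 5.0])

def plateau_on_segment(t):
    return plateau(x1 + t * (x2 - x1))

# Evaluate on an equally spaced grid, as LineWalker's discrete surrogate does.
for t in np.linspace(0.0, 1.0, 5):
    print(f"t = {t:.2f}, x = {x1 + t * (x2 - x1)}, f = {plateau_on_segment(t)}")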

1.2 CPU time per iteration

While we assume that computation time will be dominated by calls to an expensive simulator or oracle, for the sake of completeness, we provide some empirical evidence of LineWalker-full’s computation time. Specifically, using the Shekel function, we varied the grid size N from 1k to 14k points. For each value of N, we took 50 samples, of which the first 11 are taken before the main loop. Figure 8 shows that one iteration takes LineWalker-full roughly 0.5 s for \(N=\)5k and nearly 3 s for \(N=\)10k. The equation in the figure indicates that the CPU time increases at a cubic rate in N, which is not surprising since LineWalker-full must solve a linear system whose computational complexity is \(O(N^3)\).
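As a rough way to reproduce this qualitative scaling (not LineWalker itself), one can time a dense N-by-N solve, the \(O(N^3)\) kernel referenced above. The grid sizes and solver in this sketch are our choices, not the paper's:

import time
import numpy as np

rng = np.random.default_rng(0)

# Time a dense N x N linear solve, the O(N^3) kernel referenced above.
for n in (1000, 2000, 4000, 8000):
    a = rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    t0 = time.perf_counter()
    np.linalg.solve(a, b)
    print(f"N = {n:5d}: {time.perf_counter() - t0:.3f} s")

# Doubling N should increase the time by roughly 8x (cubic growth),
# modulo BLAS multithreading and cache effects.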

Fig. 8: CPU time [s] per iteration for LineWalker-full as a function of the number N of grid indices. The polynomial fit equation suggests a cubic relationship between the CPU time per iteration (y) and the number of grid indices (x)

Fig. 9: ALAMO input file for the test function grlee12Step

1.3 ALAMO

In our experiments with ALAMO version 2021.12.28, we permitted ALAMO to include the following functions in its surrogate model construction: constant; linear; logarithmic; exponential; sine; cosine; monomials with powers 0.5, 2, 3, 4, and 5; pairwise combinations of basis functions (see line “multi2power 2 3 4”); and Gaussian radial basis functions. In fairness, we did not include any custom basis functions, mimicking the assumptions an agnostic user would make when facing a truly unknown black-box function. We provided ALAMO with 11 initial samples (the same points given to all methods) from which to begin the surrogate construction. We minimized mean square error (MSE) so that ALAMO could perform well in the RMSE metric, even though this might lead to surrogates with a larger number of basis functions. Figure 9 shows an example ALAMO input file for grlee12Step.
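Since Figure 9 reproduces the exact input file only as an image, the following is an illustrative reconstruction of the option block just described, using option names from the ALAMO documentation; the variable bounds and data block below are placeholders rather than the exact values from our experiments:

ninputs 1
noutputs 1
xmin 0.5
xmax 2.5
constant 1
linfcns 1
logfcns 1
expfcns 1
sinfcns 1
cosfcns 1
monomialpower 0.5 2 3 4 5
multi2power 2 3 4
grbfcns 1
maxpoints 1
ndata 11
BEGIN_DATA
(11 initial samples, the same points given to all methods)
END_DATA

We omit the options selecting the MSE fitness metric and the SNOBFIT adaptive sampler rather than guess their exact enumeration values; Fig. 9 shows the settings actually used.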

Critical to ALAMO’s exploration is the choice of adaptive sampling technique. We selected the popular DFO method SNOBFIT rather than a random sampler to avoid stochasticity and the need to average results. While this pairing of ALAMO and SNOBFIT worked reasonably well for eight of the first twelve functions, four functions (ackley, dejong5, langer, and schwefel) caused issues that we could not diagnose or resolve. Specifically, we could not force SNOBFIT to take additional samples for these four functions. With no additional samples, the surrogate quality stagnated, as no improvements were made based on additional information. Since we could not resolve this issue (even after trying different objective function criteria, e.g., BIC and AICc), we chose not to compare against ALAMO for these functions. Finally, for reasons that we do not understand, we could not force SNOBFIT to evaluate only one new sample point in each iteration, even after setting the ALAMO parameter maxpoints to 1. As a consequence, we permitted ALAMO to use more samples than the other methods to construct its surrogates. This explains why, in Fig. 6, ALAMO performed 22, 34, and 43 function evaluations when the other methods were given a strict limit of 20, 30, and 40 function evaluations, respectively.

1.4 LineWalker vs. bayesopt: visual comparison of each benchmark function approximation

Figures 10–29 provide a detailed visual comparison between LineWalker-full and bayesopt for all 20 functions given a budget \(E^{\max ,\text {total}}\) of 20, 30, 40, and 50 function evaluations. No legend is given for the LineWalker-full figures; see the Fig. 1 legend for details. The bayesopt legend requires some explanation. Because the objective function is noise-free (i.e., deterministic), the “Model mean” and “Noise error bars” coincide and represent the mean of the GPR posterior distribution. Meanwhile, the “Model error bars” show the 95% confidence bounds for the posterior mean.
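As background for reading these plots, the short sketch below (our illustration using scikit-learn's GaussianProcessRegressor, not MATLAB's bayesopt) shows how a GPR posterior mean and 95% confidence bounds of this kind are computed:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Fit a GPR to a handful of noise-free samples of a 1D function.
x_train = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y_train = np.sin(6.0 * x_train).ravel()
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(x_train, y_train)

# Posterior mean and standard deviation on a dense grid.
x_grid = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
mean, std = gpr.predict(x_grid, return_std=True)

# 95% confidence bounds around the posterior mean (Gaussian quantile). With
# noise-free data the band collapses onto the mean at the sample points.
lower, upper = mean - 1.96 * std, mean + 1.96 * std
print(f"max half-width of the 95% band: {1.96 * std.max():.4f}")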

In addition to our motivating example shown in Fig. 6, LineWalker-full produces a much better surrogate than bayesopt on several functions:

  • Easom-Schaffer2A (Fig. 13): Along the long plateau in the interval [0, 24], bayesopt produces a surrogate that looks far more like a “sagging electric cable transmission line” than a straight line fit.

  • Gramacy & Lee (Fig. 15): bayesopt never resolves the local maximum at \(x \approx 0.7\): even with 50 function evaluations, the mean predicted objective value at this point under the posterior distribution is 7, not 6. LineWalker-full resolves this local maximum.

  • Holder (Fig. 16): LineWalker-full is far more successful at finding many of the function’s peaks than bayesopt.

  • Langer (Fig. 17): LineWalker-full produces a more accurate approximation than bayesopt at the local extrema near \(x=0.5\) (grid index 300) and \(x=3.75\) (grid index 1750), and in the interval [6.5, 7.5].

  • Levy13 (Fig. 20): When the budget \(E^{\max ,\text {total}} \le 40\), bayesopt underestimates the local maximum around \(x=-2.25\) (grid index 700) and, with \(E^{\max ,\text {total}} \le 20\), struggles in the interval \([-1,2]\).

  • SawtoothD (Fig. 24): bayesopt produces a surrogate that looks more like a “sagging electric cable transmission line” relative to the true function and what LineWalker-full generates.

  • Shekel (Fig. 27): Although bayesopt finds a near-global minimum in 20 samples while LineWalker-full does not, its surrogate is inferior to that of LineWalker-full when \(E^{\max ,\text {total}} \in \{30,40\}\).

Figs. 10–29 show, in order: ackley (Fig. 10), damped harmonic oscillator (Fig. 11), dejong5 (Fig. 12), easom schaffer2A (Fig. 13), egg2 (Fig. 14), grlee12Step (Fig. 15), holder (Fig. 16), langer (Fig. 17), langer2 (Fig. 18), levy (Fig. 19), levy13 (Fig. 20), michal (Fig. 21), plateau (Fig. 22), rastr (Fig. 23), sawtoothD (Fig. 24), schaffer2A (Fig. 25), schwef (Fig. 26), shekel (Fig. 27), stybtang (Fig. 28), and zakharov (Fig. 29). In each figure, the left column shows bayesopt and the right column shows LineWalker-full.


Cite this article

Papageorgiou, D.J., Kronqvist, J. & Kumaran, K. Linewalker: line search for black box derivative-free optimization and surrogate model construction. Optim Eng (2024). https://doi.org/10.1007/s11081-023-09879-9
