Abstract
Adaptive random search approaches have been shown to be effective for global optimization problems, where under certain conditions the expected performance time increases only linearly with dimension. However, previous analyses assume that the objective function can be observed directly. We consider the case where the objective function must be estimated, often with noise, as in simulation. We present a finite-time analysis of algorithm performance that combines estimation with a sampling distribution. We introduce a framework called Hesitant Adaptive Search with Estimation and derive an upper bound on function evaluations that is cubic in dimension, under certain conditions. We extend the framework to Quantile Adaptive Search with Estimation, which focuses sampling on a series of nested quantile level sets. The analyses suggest that computational effort is better spent on sampling improving points than on refining estimates of objective function values during the progress of an adaptive search algorithm.
Data Availability
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
References
Andradóttir, S., Prudius, A.A.: Adaptive random search for continuous simulation optimization. Naval Res. Logist. 57, 583–604 (2010)
Baritompa, William P., Bulger, David W., Wood, Graham R.: Generating functions and the performance of backtracking adaptive search. J. Glob. Optim. 37(2), 159–175 (2007)
Boender, C.G.E., Romeijn, H.E.: Stochastic methods. In: Handbook of Global Optimization, pp. 829–869. Springer, Berlin (1995)
Bulger, David W., Wood, Graham R.: Hesitant adaptive search for global optimisation. Math. Prog. 81(1), 89–102 (1998)
Fu, Michael C. (ed.): Handbook of Simulation Optimization, vol. 216. Springer, New York, NY (2015)
Ho, Y.C., Cassandras, C.G., Chen, C.H., Dai, L.: Ordinal optimisation and simulation. J. Oper. Res. Soc. 51, 490–500 (2000)
Ho, Y.C., Zhao, Q.C., Jia, Q.S.: Ordinal optimization: soft optimization for hard problems. Springer, Berlin, Germany (2007)
Hu, Jiaqiao, Wang, Yongqiang, Zhou, Enlu, Fu, Michael C., Marcus, Steven I.: A survey of some model-based methods for global optimization. In: Optimization, Control, and Applications of Stochastic Systems, pp. 157–179. Birkhäuser, Boston (2012)
Jiang, J., Hu, J., Peng, Y.: Quantile-based policy optimization for reinforcement learning (2022). arXiv:2201.11463
Kendall, Maurice G.: A Course in the Geometry of n Dimensions. Courier Corporation, (2004)
Kiatsupaibul, S., Smith, R.L., Zabinsky, Z.B.: Single observation adaptive search for continuous simulation. Oper. Res. 66, 1713–1727 (2018)
Kiatsupaibul, S., Smith, R.L., Zabinsky, Z.B.: Single observation adaptive search for discrete and continuous simulation. Oper. Res. Lett. 48, 666–673 (2020)
Kleywegt, A.J., Shapiro, A., Homem-de Mello, T.: The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. 12, 479–502 (2002)
Linz, David D.: Optimizing population healthcare resource allocation under uncertainty using global optimization methods. University of Washington Dissertation, (2018)
Linz, D.D., Zabinsky, Z.B., Kiatsupaibul, S., Smith, R.L.: A computational comparison of simulation optimization methods using single observations within a shrinking ball on noisy black-box functions with mixed integer and continuous domains. In: Chan, W.K.V., D'Ambrogio, A., Zacharewicz, G., Mustafee, N., Wainer, G., Page, E. (eds.) Proceedings of the 2017 Winter Simulation Conference, pp. 2045–2056, Washington, DC (2017)
Locatelli, M., Schoen, F.: Global Optimization: Theory, Algorithms, and Applications, vol. 15. SIAM, Philadelphia (2013)
Locatelli, Marco, Schoen, Fabio: (Global) optimization: historical notes and recent developments. EURO J. Comput. Optim. 9, 100012 (2021)
Pardalos, Panos M., Romeijn, H. Edwin, Tuy, Hoang: Recent developments and trends in global optimization. J. Comput. Appl. Math. 124(1–2), 209–228 (2000)
Raphael, Benny, Smith, Ian F. C.: A direct stochastic algorithm for global search. Appl. Math. Comput. 146(2–3), 729–758 (2003)
Raphael, Benny, Smith, Ian F. C.: Global search through sampling using a PDF. In: Stochastic Algorithms: Foundations and Applications, vol. 2827, pp. 71–82. Springer (2003)
Romeijn, H. Edwin, Smith, Robert L.: Simulated annealing and adaptive search in global optimization. Prob. Eng. Inf. Sci. 8(4), 571–590 (1994)
Rubinstein, Reuven Y., Kroese, Dirk P.: The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning. Springer, New York (2004)
Shen, Yanfang.: Annealing Adaptive Search With Hit-and-Run Sampling Methods for Global Optimization. University of Washington Dissertation (2005)
Shen, Yanfang, Kiatsupaibul, Seksan, Zabinsky, Zelda B., Smith, Robert L.: An analytically derived cooling schedule for simulated annealing. J. Glob. Optim. 38(3), 333–365 (2007)
Wood, Graham R., Bulger, David W., Baritompa, William P., Alexander, D.: Backtracking adaptive search: distribution of number of iterations to convergence. J. Optim. Theory Appl. 128(3), 547–562 (2006)
Wood, Graham R., Zabinsky, Zelda B., Kristinsdottir, Birna P.: Hesitant adaptive search: the distribution of the number of iterations to convergence. Math. Progr. 89(3), 479–486 (2001)
Zabinsky, Zelda B.: Stochastic Adaptive Search for Global Optimization. Kluwer Academic Publishers (2003)
Zabinsky, Zelda B., Bulger, David, Khompatraporn, Charoenchai: Stopping and restarting strategy for stochastic sequential search in global optimization. J. Glob. Optim. 46, 273–286 (2010)
Zabinsky, Zelda B., Huang, Hao: A partition-based optimization approach for level set approximation: probabilistic branch and bound. In: Smith, A.E. (ed.) Women in Industrial and Systems Engineering: Key Advances and Perspectives on Emerging Topics. Springer, Berlin (2020)
Zabinsky, Zelda B., Smith, Robert L.: Pure adaptive search in global optimization. Math. Progr. 53(1–3), 323–338 (1992)
Zabinsky, Zelda B., Wood, Graham R., Steel, Mike A., Baritompa, William P.: Pure adaptive search for finite global optimization. Math. Progr. 69(1–3), 443–448 (1995)
Acknowledgements
This research has been supported in part by the National Science Foundation, Grant CMMI-1935403.
Appendices
Appendix A Proofs of Theorems for HAS-E Analysis
A.1 Proof of Theorem 1
Proof of Theorem 1
For any value \(y_k\) such that \(y_* + \epsilon < y_k \le y^*\), we start by defining an n-ball \({\mathcal {B}}_{y_k} \) as the largest n-ball centered at \(x_*\) such that \({\mathcal {B}}_{y_k} \subseteq S_{y_k}\) and let \(r_{y_k}\) be its radius. We note that \(0 < \nu ({\mathcal {B}}_{y_k}) \le \nu (S_{y_k}) \). For any value \({\hat{y}}_{k}^{high}\), we define \({\mathcal {B}}_{{\hat{y}}_{k}^{high}}\) as the smallest n-ball centered at \(x_*\) such that \(S_{{\hat{y}}_{k}^{high} } \subseteq {\mathcal {B}}_{{\hat{y}}_{k}^{high} }\) and let \(r_{{\hat{y}}_{k}^{high}}\) be the radius of \({\mathcal {B}}_{{\hat{y}}_{k}^{high} }\).
We examine two cases. First, if \({\hat{y}}_k^{high} - y_{k} \le \kappa _{q}\) then \(\frac{\nu (S_{y_{k}})}{\nu ( S_{{\hat{y}}_k^{high}})} > q \) by definition in (11), and the theorem is proved.
Second, consider \({\hat{y}}_k^{high} - y_{k} > \kappa _{q}\). We define \({\mathcal {K}}_{cone} = \frac{{\hat{y}}_{k}^{high} - y_k}{ r_{{\hat{y}}_{k}^{high}} - r_{y_k} }\), which can be interpreted as the slope that connects the two balls, see Fig. 5. We also write
Since \({\hat{y}}_k^{high} - y_{k} > \kappa _{q}\), the numerator of \({\mathcal {K}}_{cone} \) is greater than the numerator of \({\mathcal {K}}_{q}\) as in (11), and, since \(d >r_{{\hat{y}}_{k}^{high}} - r_{y_k} \) by definition of the diameter, we have \({\mathcal {K}}_{cone} > {\mathcal {K}}_{q}\). Note that \({\mathcal {K}}_{q}\) is independent of the value \(y_k\).
We define \({\mathcal {B}}^{large} \) as an n-ball centered at \(x_*\) with radius \(r_{large} \), where \( r_{large} = r_{y_k} + ({\hat{y}}_{k}^{high} - y_k) / {\mathcal {K}}_{q} \). Here we see that \(r_{{\hat{y}}_{k}^{high}} \le r_{large} \) since \({\mathcal {K}}_{q} \le {\mathcal {K}}_{cone}\). Therefore \(S_{{\hat{y}}_{k}^{high} } \subset {\mathcal {B}}_{{\hat{y}}_{k}^{high} } \subset {\mathcal {B}}^{large} \), as illustrated in Fig. 5.
A lower bound on the ratios of volumes is constructed in terms of the dimension n, using multi-dimensional geometry theorems [10],
Since \( {\hat{y}}_k^{high} - y_k \le \frac{ 2 \cdot \sigma \cdot z_{\alpha /2}}{\sqrt{R}}\), as given in the theorem statement, we have the following lower bound,
We want to determine R, such that
Through several algebraic manipulations to isolate R, we determine that (13) holds if \(R \ge \left( \frac{ \root n \of {q} \cdot 2 \cdot \sigma \cdot z_{\alpha /2} }{(1 - \root n \of {q}) \cdot r_{y_k} \cdot {\mathcal {K}}_{q} } \right) ^2\).
Finally, since \(y_k \ge y_*+\epsilon \), then \(r_{y_k} \ge r_{y_*+\epsilon }\), and therefore (13) holds if
which proves Theorem 1. \(\blacksquare \)
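For concreteness, the replication bound just derived can be evaluated numerically. The following is an illustrative sketch, not part of the paper: the arguments mirror the symbols in the theorem (q, n, \(\sigma\), \(\alpha\), \(r_{y_*+\epsilon}\), and \({\mathcal {K}}_{q}\)), the function name is ours, and \({\mathcal {K}}_{q}\) must be supplied from its definition in (11), which is not reproduced here.

```python
import math
from statistics import NormalDist

def replication_lower_bound(q, n, sigma, alpha, r_eps, K_q):
    """Evaluate the bound from the proof of Theorem 1:
    R >= (q^{1/n} * 2*sigma*z_{alpha/2} / ((1 - q^{1/n}) * r_eps * K_q))^2.

    q     : target volume ratio, 0 < q < 1
    n     : dimension
    sigma : standard deviation of the estimation noise
    alpha : confidence parameter; z_{alpha/2} is the normal quantile
    r_eps : radius r_{y_*+eps} of the ball inscribed in S_{y_*+eps}
    K_q   : the slope constant defined in (11) (supplied by the user)
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    root_q = q ** (1.0 / n)                      # n-th root of q
    return (root_q * 2.0 * sigma * z / ((1.0 - root_q) * r_eps * K_q)) ** 2
```

Because \(q^{1/n}/(1-q^{1/n})\) grows roughly linearly in n (see Lemma 1 below), this lower bound on R grows roughly quadratically with dimension, consistent with the discussion around (14).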
A.2 Lemmas
The bound on replications, which is quadratic in dimension given in (14), uses a bound stated in Lemma 1. Theorems 2 and 4 make use of Lemma 30 from [23], which is repeated here for convenience as Lemma 2.
Lemma 1
For a given constant a with \(0< a < 1\) and a variable \(n \ge 1\), the function \(f(n)= \frac{a^{1/n}}{1-a^{1/n}}\) is bounded by a linear function of n, that is,
A proof of Lemma 1 is omitted. Other bounds are possible that are still linear in n.
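One such linear bound is easy to verify: writing \(c = \ln (1/a)\), we have \(f(n) = 1/(e^{c/n} - 1) \le n/c\) because \(e^x - 1 \ge x\). The snippet below spot-checks this particular bound numerically; it is one possible linear bound, not necessarily the one used in the paper.

```python
import math

def f(a, n):
    """f(n) = a^(1/n) / (1 - a^(1/n)), as in Lemma 1."""
    r = a ** (1.0 / n)
    return r / (1.0 - r)

def linear_bound(a, n):
    """One linear-in-n bound: n / ln(1/a), from e^x - 1 >= x."""
    return n / math.log(1.0 / a)

# spot-check the bound over a grid of (a, n) values
for a in (0.1, 0.5, 0.9, 0.99):
    for n in range(1, 101):
        assert f(a, n) <= linear_bound(a, n)
```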
The following lemma is used in the proof of Theorem 3.
Lemma 2
(cf. [23]) Let \(\{{\bar{Y}}_k^A, k = 0, 1, 2, \ldots \}\) and \(\{{\bar{Y}}_k^B, k = 0, 1, 2,\ldots \} \) be two sequences of objective function values generated by algorithms A and B respectively for solving a minimization problem, such that \({\bar{Y}}_{k+1}^A \le {\bar{Y}}_k^A\) and \({\bar{Y}}_{k+1}^B \le {\bar{Y}}_k^B\) for \(k = 0, 1, \ldots \). For \(y_* < y,z \le y^*\) and \(k = 0,1, \ldots \), if

1. \(P({\bar{Y}}_{k+1}^A \le y | {\bar{Y}}_k^A =z ) \ge P({\bar{Y}}_{k+1}^B\le y | {\bar{Y}}_k^B =z ) \),
2. \(P({\bar{Y}}_{k+1}^A \le y | {\bar{Y}}_k^A =z )\) is non-increasing in z, and
3. \( P({\bar{Y}}_{0}^A \le y ) \ge P({\bar{Y}}_{0}^B \le y )\),

then \(P({\bar{Y}}_{k}^A \le y ) \ge P({\bar{Y}}_{k}^B \le y)\) for \(k=0,1, \ldots \) and \(y_* < y \le y^*\).
A proof of Lemma 2 can be found in [23].
A.3 Proof of Theorem 2
Proof of Theorem 2:
Using the notation in HAS-E on the kth iteration, and based on Lemma 2, as in [23], if the following conditions hold for \(y_* < y, {\bar{y}}_k \le y^*\) and \(k = 0,1, \ldots \),
(I) \(P({\bar{Y}}_{k+1}^{HASE} \le y | {\bar{Y}}_k^{HASE} ={\bar{y}}_k ) \ge P({\bar{Y}}_{k+1}^{HAS1} \le y | {\bar{Y}}_k^{HAS1} ={\bar{y}}_k ) \),
(II) \(P({\bar{Y}}_{k+1}^{HASE} \le y | {\bar{Y}}_k^{HASE} ={\bar{y}}_k ) \) is non-increasing in \({\bar{y}}_k \), and
(III) \( P({\bar{Y}}_{0}^{HASE} \le y) \ge P({\bar{Y}}_{0}^{HAS1} \le y )\),
then \(P({\bar{Y}}_k^{HASE} \le y ) \ge P({\bar{Y}}_k^{HAS1} \le y ) \text { for } k = 0,1, \ldots \) and for \(y_*< y \le y^*\).
The first step is to prove (I). When \(y \ge {\bar{y}}_k\), (I) is true trivially (since the conditional probability equals one on both sides). Now, when \(y < {\bar{y}}_k\), we bound the left-hand side of the expression in (I), as,
where we condition on the event that HASE “betters”, that is, that HASE samples from the normalized restriction of \(\zeta \) on \(S_{{\bar{y}}_k^{high}}\), and consequently \({\bar{Y}}_{k+1}^{HASE} \le {\bar{y}}_k^{high}\), which occurs with probability at least \(\gamma \) by the bound on the bettering probability in Assumption 1 (ii).
We next consider the event \(\{ {\bar{y}}_k \le {\bar{y}}_k^{high} \}\), which occurs with probability at least \(1-\alpha \) by (8). We rewrite (30) as,
dropping the condition \({\bar{Y}}_{k}^{HASE} = {\bar{y}}_k\) because it is captured in the other conditions.
To further develop the bound in terms of HAS1, we note that \(P({\bar{Y}}_{k+1}^{HASE} \le y) \ge P({\bar{Y}}_{0}^{HASE} \le y) \ge P({\bar{Y}}_{0}^{HAS1} \le y)\) by Assumption 1 (i), and since \(P({\bar{Y}}_{k+1}^{HASE} \le y) = P({\bar{Y}}_{k+1}^{HASE} \le y \bigcap {\bar{Y}}_{k+1}^{HASE} \le {\bar{y}}_k^{high})\) and \(P({\bar{Y}}_{0}^{HAS1} \le y) = P({\bar{Y}}_{0}^{HAS1} \le y \bigcap {\bar{Y}}_{0}^{HAS1} \le {\bar{y}}_k^{high})\) we can write
implying
and
We now rewrite (31) as
Therefore, we can create a lower bound for (32):
The last inequality makes use of the lower bound developed in Theorem 1, \( \frac{\nu (S_{{\bar{y}}_k})}{ \nu (S_{{\bar{y}}_k^{high}} ) } \ge q \).
We similarly expand the expression for HAS1 in the right-hand side of (I), noting that HAS1 either improves or stays where it is, yielding,
where \({X}_{k+1}^{HAS1}\) is sampled according to the normalized restriction of the uniform distribution on the improving level set. Combining this with the bettering probability of HAS1, \(b(y)=\gamma \cdot (1-\alpha ) \cdot q\), and when HAS1 “betters”, we have,
Combining (33) and (34) proves condition (I).
We go on to prove (II), that \( P({\bar{Y}}_{k+1}^{HASE} \le y | {\bar{Y}}_k^{HASE} = {\bar{y}}_k ) \) is non-increasing in \({\bar{y}}_k\). Suppose that \({\bar{y}}_k\) and \({\bar{y}}_k'\) are such that \({\bar{y}}_k < {\bar{y}}_k'\). To show (II) we want to show that:
The approach is to condition on the value of \({\bar{y}}_k^{high}\), and since HAS-E samples on \(S_{{\bar{y}}_k^{high}}\) in Step 2 of the algorithm, we know that \(P({\bar{Y}}_{k+1}^{HASE} \le y | {\bar{Y}}_k^{HASE} = {\bar{y}}_k, {\bar{y}}_k^{high} = u )\) is non-increasing, therefore, we have,
and because \(\int _{-\infty }^{z} dP\left( {\bar{Y}}_{k+1}^{HASE} \le y | {\bar{Y}}_{k}^{HASE} = {\bar{y}}_k, {\bar{y}}_k^{high} = u \right) =\) \( P\left( {\bar{Y}}_{k+1}^{HASE} \le y | {\bar{Y}}_{k}^{HASE} = {\bar{y}}_k, {\bar{y}}_k^{high} = z \right) - P\left( {\bar{Y}}_{k+1}^{HASE} \le y | {\bar{Y}}_{k}^{HASE} = {\bar{y}}_k, {\bar{y}}_k^{high} = - \infty \right) \), and since \(P\left( {\bar{Y}}_{k+1}^{HASE} \le y | {\bar{Y}}_{k}^{HASE} = {\bar{y}}_k, {\bar{y}}_k^{high} = -\infty \right) = 1\) (trivially), we substitute \( P \left( {\bar{Y}}_{k+1}^{HASE} \le y | {\bar{Y}}_{k}^{HASE} = {\bar{y}}_k, {\bar{y}}_k^{high} = z \right) \) as follows,
and reversing the order of integration, we get
We now develop a lower bound in terms of \({\bar{y}}_k'\). Since \(dP\left( {\bar{Y}}_{k+1}^{HASE} \le y | {\bar{Y}}_{k}^{HASE} = {\bar{y}}_k, {\bar{y}}_k^{high} = u \right) \le 0\), since \(P\left( {\bar{Y}}_{k+1}^{HASE} \le y | {\bar{Y}}_{k}^{HASE} = {\bar{y}}_k, {\bar{y}}_k^{high} = u \right) \) is non-increasing in \({\bar{y}}_k^{high}\), and since \(P\left( {\bar{y}}_k^{high} \le u | {\bar{Y}}_{k}^{HASE} = {\bar{y}}_k \right) \ge P\left( {\bar{y}}_k^{high} \le u | {\bar{Y}}_{k}^{HASE} = {\bar{y}}_k' \right) \), that is, the probability that \({\bar{y}}_k^{high} \) lies below u is always greater for \({\bar{y}}_k < {\bar{y}}_k'\), we have
which is equivalent to
and reversing the order of integration:
therefore, since \( P({\bar{Y}}_{k+1}^{HASE} \le y | {\bar{Y}}_{k}^{HASE} = {\bar{y}}_k, {\bar{y}}_k^{high} = z ) = P({\bar{Y}}_{k+1}^{HASE} \le y | {\bar{Y}}_{k}^{HASE} = {\bar{y}}_k', {\bar{y}}_k^{high} = z ) \), we write:
which proves (II).
Lastly, condition (III) from Lemma 2 is true by Assumption 1 (i) that \(P({\bar{Y}}_0^{HASE}\le y) \ge P({\bar{Y}}_0^{HAS1} \le y)\). This proves the theorem through reference to Lemma 2. \(\blacksquare \)
A.4 Proof of Theorem 3
Proof of Theorem 3:
By the stochastic dominance established in Theorem 2, the expected number of iterations for HAS-E to achieve a value within \(S_{y_* + \epsilon }\) is less than or equal to that for HAS1. Since the bettering probability for HAS1 is \(b(y) = \gamma \cdot ( 1- \alpha ) \cdot q\) for all \( y_* < y \le y^* \), using (4), we have
and since HAS1 uses uniform sampling, i.e., \( p(y) = \frac{\nu (S_y)}{\nu (S)}\), we have
Using a constant number of replications R for each iteration yields
Setting \(R_k=R\) as in (16) yields the result in (18). \(\blacksquare \)
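To illustrate how the pieces combine, the sketch below multiplies an iteration bound by a constant replication count R, as in the proof above. The iteration bound used here, \(1 + (1/b)\ln (\nu (S)/\nu (S_{y_*+\epsilon }))\) with \(b = \gamma (1-\alpha ) q\), is the classical form for hesitant adaptive search with a constant bettering probability; treat it as an assumed stand-in for the exact expressions in (16) and (18), which are not reproduced in this appendix.

```python
import math

def has1_iteration_bound(gamma, alpha, q, vol_ratio):
    """Assumed HAS-style iteration bound: 1 + ln(vol_ratio)/b,
    with bettering probability b = gamma*(1-alpha)*q and
    vol_ratio = nu(S)/nu(S_{y_*+eps})."""
    b = gamma * (1.0 - alpha) * q
    return 1.0 + math.log(vol_ratio) / b

def evaluation_bound(gamma, alpha, q, vol_ratio, R):
    """Total function evaluations: R replications per iteration."""
    return R * has1_iteration_bound(gamma, alpha, q, vol_ratio)
```

With R quadratic in n (cf. (14)) and \(\ln (\nu (S)/\nu (S_{y_*+\epsilon }))\) typically growing linearly in n, the product is cubic in dimension, matching the headline bound of the paper.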
Appendix B Proofs of Theorems for QAS-E Analysis
The proofs of the theorems for QAS-E are similar to the proofs for HAS-E. The proof of Theorem 4 for QAS-E is provided for completeness.
Proof of Theorem 4
Similar to the proof of Theorem 2, if the three conditions listed in Lemma 2 hold, then \(P({\bar{Y}}_k^{QASE} \le y ) \ge P({\bar{Y}}_k^{HAS2} \le y ) \text { for } k = 0,1, \ldots \) and for \(y_*< y \le y^*\).
We start by proving the first condition in Lemma 2, that is, we show that
for \(y_* < y, {\bar{y}}_k \le y^*\) and \(k = 0,1, \ldots \). When \(y \ge {\bar{y}}_k\), \(P({\bar{Y}}_{k+1}^{QASE} \le y | {\bar{Y}}_k^{QASE} ={\bar{y}}_k ) = P({\bar{Y}}_{k+1}^{HAS2} \le y | {\bar{Y}}_k^{HAS2} ={\bar{y}}_k ) =1 \), and the first condition holds.
Now, when \(y < {\bar{y}}_k\), we bound the left-hand side of the expression in (I) by conditioning on the event that \(X_{k+1}^{QASE} \) “betters”, that is, the event that \({\bar{Y}}_{k+1}^{QASE} \le {\bar{y}}_k^{high}\), yielding
by Assumption 2 (ii).
We next consider the event \(\{{\bar{y}}_k \le {\bar{y}}_k^{high}\}\), which occurs with probability at least \(1-\alpha \), by (8). We rewrite (35) as,
From Assumption 2 (i), we have
Making use of the lower bound developed in Theorem 1, \( \frac{\nu (S_{{\bar{y}}_k})}{ \nu (S_{{\bar{y}}_k^{high}} ) } \ge q \), we have,
We similarly expand the expression for HAS2 in the right-hand side of (I), noting that HAS2 either improves or stays where it is, yielding,
and since the bettering probability of HAS2 equals \(\gamma \cdot (1-\alpha ) \cdot q\), and when HAS2 “betters”, it samples uniformly on the improving level set, we have,
Combining (37) and (38) proves condition (I).
The second condition in Lemma 2 is satisfied directly by Assumption 2 (iii). The third condition in Lemma 2 is satisfied by Assumption 2 (i). This proves the theorem by Lemma 2. \(\blacksquare \)
Appendix C Details for Sample Problem
The one-dimensional sample problem f(x), illustrated in Fig. 3, is defined for \(x\in [-4, 4]\). The calculations use the following parameter settings: \(\sigma = 1\), \(\alpha =0.05\), \(\gamma =0.5\), and \(\epsilon =0.3\). The values for \({\mathcal {K}}_{q}\) are calculated numerically for a range of values of q, and are shown in Fig. 6.
Zabinsky, Z.B., Linz, D.D. Hesitant adaptive search with estimation and quantile adaptive search for global optimization with noise. J Glob Optim 87, 31–55 (2023). https://doi.org/10.1007/s10898-023-01307-7