Abstract
This work introduces the StoMADS-PB algorithm for constrained stochastic blackbox optimization, which is an extension of the mesh adaptive direct-search (MADS) method originally developed for deterministic blackbox optimization under general constraints. The values of the objective and constraint functions are provided by a noisy blackbox, i.e., they can only be computed with random noise whose distribution is unknown. As in MADS, constraint violations are aggregated into a single constraint violation function. Since all function values are numerically unavailable, StoMADS-PB uses estimates and introduces probabilistic bounds for the violation. Such estimates and bounds obtained from stochastic observations are required to be accurate and reliable with high, but fixed, probabilities. The proposed method, which allows intermediate infeasible solutions, accepts new points using sufficient decrease conditions and by imposing a threshold on the probabilistic bounds. Using Clarke nonsmooth calculus and martingale theory, Clarke stationarity convergence results for the objective and the violation function are derived with probability one.
References
Abramson, M.A., Audet, C., Dennis, J.E., Jr., Le Digabel, S.: OrthoMADS: a deterministic MADS instance with orthogonal directions. SIAM J. Optim. 20(2), 948–966 (2009)
Alarie, S., Audet, C., Bouchet, P.-Y., Le Digabel, S.: Optimization of noisy blackboxes with adaptive precision. SIAM J. Optim. 31(4), 3127–3156 (2021)
Anderson, E.J., Ferris, M.C.: A direct search algorithm for optimization with noisy function evaluations. SIAM J. Optim. 11(3), 837–857 (2001)
Angün, E., Kleijnen, J., den Hertog, D., Gürkan, G.: Response surface methodology with stochastic constraints for expensive simulation. J. Oper. Res. Soc. 60(6), 735–746 (2009)
Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pac. J. Math. 16(1), 1–3 (1966)
Audet, C.: A survey on direct search methods for blackbox optimization and their applications. In: Pardalos, P.M., Rassias, T.M. (eds.) Mathematics Without Boundaries: Surveys in Interdisciplinary Research, chapter 2, pp. 31–56. Springer (2014)
Audet, C., Dennis, J.E., Jr.: A pattern search filter method for nonlinear programming without derivatives. SIAM J. Optim. 14(4), 980–1010 (2004)
Audet, C., Dennis, J.E., Jr.: Mesh adaptive direct search algorithms for constrained optimization. SIAM J. Optim. 17(1), 188–217 (2006)
Audet, C., Dennis, J.E., Jr.: A progressive barrier for derivative-free nonlinear programming. SIAM J. Optim. 20(1), 445–472 (2009)
Audet, C., Dennis, J.E., Jr., Le Digabel, S.: Parallel space decomposition of the mesh adaptive direct search algorithm. SIAM J. Optim. 19(3), 1150–1170 (2008)
Audet, C., Dzahini, K.J., Kokkolaras, M., Le Digabel, S.: Stochastic mesh adaptive direct search for blackbox optimization using probabilistic estimates. Comput. Optim. Appl. 79(1), 1–34 (2021)
Audet, C., Hare, W.: Derivative-Free and Blackbox Optimization. Springer Series in Operations Research and Financial Engineering. Springer (2017)
Audet, C., Ihaddadene, A., Le Digabel, S., Tribes, C.: Robust optimization of noisy blackbox problems using the Mesh Adaptive Direct Search algorithm. Optim. Lett. 12(4), 675–689 (2018)
Audet, C., Le Digabel, S., Tribes, C.: The mesh adaptive direct search algorithm for granular and discrete variables. SIAM J. Optim. 29(2), 1164–1189 (2019)
Augustin, F., Marzouk, Y.M.: NOWPAC: A provably convergent derivative-free nonlinear optimizer with path-augmented constraints. Technical report, arXiv (2014)
Augustin, F., Marzouk, Y.M.: A trust-region method for derivative-free nonlinear constrained stochastic optimization. Technical Report 1703.04156, arXiv (2017)
Bandeira, A.S., Scheinberg, K., Vicente, L.N.: Convergence of trust-region methods based on probabilistic models. SIAM J. Optim. 24(3), 1238–1264 (2014)
Barton, R.R., Ivey, J.S., Jr.: Nelder-Mead simplex modifications for simulation optimization. Manag. Sci. 42(7), 954–973 (1996)
Bertsimas, D., Nohadani, O., Teo, K.M.: Nonconvex robust optimization for problems with constraints. Informs J. Comput. 22(1), 44–58 (2010)
Bhattacharya, R.N., Waymire, E.C.: A basic course in probability theory, vol. 69. Springer (2007)
Blanchet, J., Cartis, C., Menickelly, M., Scheinberg, K.: Convergence Rate Analysis of a Stochastic Trust Region Method via Submartingales. Informs J. Optim. 1(2), 92–119 (2019)
Chang, K.H.: Stochastic Nelder–Mead simplex method—a new globally convergent direct search method for simulation optimization. Eur. J. Oper. Res. 220(3), 684–694 (2012)
Chen, R., Menickelly, M., Scheinberg, K.: Stochastic optimization using a trust-region method and random models. Math. Program. 169(2), 447–487 (2018)
Chen, X., Wang, N.: Optimization of short-time gasoline blending scheduling problem with a DNA based hybrid genetic algorithm. Chem. Eng. Process. 49(10), 1076–1083 (2010)
Clarke, F.H.: Optimization and Nonsmooth Analysis. John Wiley and Sons, New York (1983). Reissued in 1990 by SIAM Publications, Philadelphia, as Vol. 5 in the series Classics in Applied Mathematics
Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to Derivative-Free Optimization. MOS-SIAM Series on Optimization. SIAM, Philadelphia (2009)
Curtis, F.E., Scheinberg, K.: Adaptive stochastic optimization: A framework for analyzing stochastic optimization algorithms. IEEE Signal Process. Mag. 37(5), 32–42 (2020)
Curtis, F.E., Scheinberg, K., Shi, R.: A stochastic trust region algorithm based on careful step normalization. Informs J. Optim. 1(3), 200–220 (2019)
Diniz-Ehrhardt, M.A., Ferreira, D.G., Santos, S.A.: A pattern search and implicit filtering algorithm for solving linearly constrained minimization problems with noisy objective functions. Optim. Methods Softw. 34(4), 827–852 (2019)
Diniz-Ehrhardt, M.A., Ferreira, D.G., Santos, S.A.: Applying the pattern search implicit filtering algorithm for solving a noisy problem of parameter identification. Comput. Optim. Appl. 1–32 (2020)
Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)
Durrett, R.: Probability: Theory and Examples. Cambridge University Press (2010)
Dzahini, K.J.: Expected complexity analysis of stochastic direct-search. Comput. Optim. Appl. 81(1), 179–200 (2022)
Gratton, S., Royer, C.W., Vicente, L.N., Zhang, Z.: Direct search based on probabilistic feasible descent for bound and linearly constrained problems. Comput. Optim. Appl. 72(3), 525–559 (2019)
Hock, W., Schittkowski, K.: Test Examples for Nonlinear Programming Codes. Lecture Notes in Economics and Mathematical Systems, vol. 187. Springer (1981)
Jahn, J.: Introduction to the Theory of Nonlinear Optimization. Springer (1994)
Kitayama, S., Arakawa, M., Yamazaki, K.: Sequential approximate optimization using radial basis function network for engineering optimization. Optim. Eng. 12(4), 535–557 (2011)
Klassen, K.J., Yoogalingam, R.: Improving performance in outpatient appointment services with a simulation optimization approach. Prod. Oper. Manag. 18(4), 447–458 (2009)
Lacksonen, T.: Empirical comparison of search algorithms for discrete event simulation. Comput. Ind. Eng. 40(1–2), 133–148 (2001)
Larson, J., Billups, S.C.: Stochastic derivative-free optimization using a trust region framework. Comput. Optim. Appl. 64(3), 619–645 (2016)
Le Digabel, S., Wild, S.M.: A Taxonomy of Constraints in Simulation-Based Optimization. Technical Report G-2015-57, Les cahiers du GERAD (2015)
Letham, B., Karrer, B., Ottoni, G., Bakshy, E.: Constrained Bayesian optimization with noisy experiments. Bayesian Anal. 14(2), 495–519 (2019)
Lukšan, L., Vlček, J.: Test problems for nonsmooth unconstrained and linearly constrained optimization. Technical Report V-798, ICS AS CR (2000)
Mezura-Montes, E., Coello, C.A.: Useful Infeasible Solutions in Engineering Optimization with Evolutionary Algorithms. In: Proceedings of the 4th Mexican International Conference on Advances in Artificial Intelligence, MICAI’05, pp. 652–662, Springer (2005)
Mockus, J.: Bayesian approach to global optimization: theory and applications, volume 37 of Mathematics and Its Applications. Springer (2012)
Moré, J.J., Wild, S.M.: Benchmarking derivative-free optimization algorithms. SIAM J. Optim. 20(1), 172–191 (2009)
Nelder, J.A., Mead, R.: A simplex method for function minimization. Comput. J. 7(4), 308–313 (1965)
Paquette, C., Scheinberg, K.: A stochastic line search method with expected complexity analysis. SIAM J. Optim. 30(1), 349–376 (2020)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)
Rockafellar, R.T.: Generalized directional derivatives and subgradients of nonconvex functions. Canad. J. Math. 32(2), 257–280 (1980)
Rodríguez, J.F., Renaud, J.E., Watson, L.T.: Trust region augmented lagrangian methods for sequential response surface approximation and optimization. J. Mech. Des. 120(1), 58–66 (1998)
Shashaani, S., Hashemi, F.S., Pasupathy, R.: ASTRO-DF: a class of adaptive sampling trust-region algorithms for derivative-free stochastic optimization. SIAM J. Optim. 28(4), 3145–3176 (2018)
Tao, J., Wang, N.: DNA double helix based hybrid GA for the gasoline blending recipe optimization problem. Chem. Eng. Technol. 31(3), 440–451 (2008)
Wang, Z., Ierapetritou, M.: Constrained optimization of black-box stochastic systems using a novel feasibility enhanced Kriging-based method. Comput. Chem. Eng. 118, 210–223 (2018)
Zhao, J., Wang, N.: A bio-inspired algorithm based on membrane computing and its application to gasoline blending scheduling. Comput. Chem. Eng. 35(2), 272–283 (2011)
Acknowledgements
The authors are grateful to Charles Audet from Polytechnique Montréal for valuable discussions and constructive suggestions. This work is supported by the NSERC CRD RDCPJ 490744-15 grant and by an InnovÉÉ grant, both in collaboration with Hydro-Québec and Rio Tinto, and by a FRQNT fellowship.
Appendix
This appendix presents the proofs of a series of results stated in Sect. 4.
1.1 Proof of Theorem 4.2
Proof
This theorem is proved using ideas from [11, 21, 23, 33, 40, 48] and by conditioning on the disjoint events \(\left\{ T=+\infty \right\} \) and \(\left\{ T<+\infty \right\} \), whose union is almost sure due to Assumption 4. The proof consists of two parts. Part 1 considers two separate cases conditioned on the event \(\left\{ T=+\infty \right\} \) (i.e., no \(\varepsilon \)-feasible point is found by Algorithm 2): “good bounds” and “bad bounds”, each of which is subdivided according to whether an iteration is h-Dominating, Improving or Unsuccessful. Part 2 considers three separate cases conditioned on the event \(\left\{ T<+\infty \right\} \): “good estimates and good bounds”, “bad estimates and good bounds” and “bad bounds”, each of which is subdivided according to whether an iteration is f-Dominating, h-Dominating, Improving or Unsuccessful.
In order to show (21), the goal of Part 1 is to show that there exists a constant \(\eta >0\) such that conditioned on the almost sure event \(\{T=+\infty \}\), the following holds for all \(k\in {\mathbb {N}}\):
$$\begin{aligned} {\mathbb {E}}\left( \Phi _{k+1}-\Phi _k|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le -\eta (\Delta ^k_p)^2, \end{aligned}$$(49)
where \(\Phi _k\) is the random function defined by
$$\begin{aligned} \Phi _k:=\frac{\nu }{m\varepsilon }h(X^k_{\text {inf}})+(1-\nu )(\Delta ^k_p)^2. \end{aligned}$$(50)
Indeed, assume that (49) holds. Since \(\Phi _k>0\) for all \(k\in {\mathbb {N}}\), then summing (49) over \(k\in {\mathbb {N}}\) and taking expectations on both sides lead to
$$\begin{aligned} \eta \sum _{k=0}^{+\infty }{\mathbb {E}}\left[ (\Delta ^k_p)^2\right] \le {\mathbb {E}}(\Phi _0)<+\infty . \end{aligned}$$(51)
That is, (21) holds. Then, Part 2 aims to show that for the same previous constant \(\eta \), conditioned on the almost sure event \(\{T<+\infty \}\) and making use of the following random function
where \(k\vee T:=\max \{k,T\}\), the following holds for all \(k\in {\mathbb {N}}\):
$$\begin{aligned} {\mathbb {E}}\left( \Phi _{k+1}^T-\Phi _k^T|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le -\eta (\Delta ^k_p)^2. \end{aligned}$$(53)
Indeed, assume that (53) holds. Since \(\Phi _k^T>0\) for all \(k\ge 0\), then summing (53) over \(k\in {\mathbb {N}}\) and taking expectations on both sides, yields
$$\begin{aligned} \eta \sum _{k=0}^{+\infty }{\mathbb {E}}\left[ (\Delta ^k_p)^2\right] \le {\mathbb {E}}(\Phi _0^T)<+\infty , \end{aligned}$$(54)
where the last inequality in (54) follows from the inequality \(f(X^k_{\text {feas}})\le \kappa ^f_{\max }\) for all \(k\ge 0\), due to Proposition 3.5, and the fact that T is finite almost surely.
The remainder of the proof is devoted to showing that (49) and (53) hold. The following events are introduced for the sake of clarity in the analysis.
Part 1 (\(\varvec{T=+\infty }\) almost surely). The random function \(\Phi _k\) defined in (50) will be shown to satisfy (49) with \(\eta =\frac{1}{2}\alpha \beta (1-\nu )(1-\tau ^2)\), regardless of the change in the objective function f on the \(\varepsilon \)-infeasible incumbents encountered by Algorithm 2. Moreover, since T is infinite almost surely, no iteration of Algorithm 2 can be f-Dominating. Two separate cases are distinguished, and all that follows is conditioned on the almost sure event \(\{T=+\infty \}\).
Case 1 (Good bounds, \(\varvec{\mathbb {1}_{I_k}=1}\)). No matter which type of iteration occurs, the random function \(\Phi _k\) will be shown to decrease, with the smallest decrease happening on Unsuccessful iterations, thus yielding
$$\begin{aligned} {\mathbb {E}}\left( \mathbb {1}_{I_k}(\Phi _{k+1}-\Phi _k)|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le -\alpha (1-\nu )(1-\tau ^{2})(\Delta ^k_p)^2. \end{aligned}$$(55)
-
(i)
The iteration is h-Dominating (\(\mathbb {1}_{{\mathcal {D}}_h}=1\)). The iteration is h-Dominating and the bounds are good, so a decrease occurs in h according to (6), i.e.,
$$\begin{aligned} \mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {D}}_h}\frac{\nu }{m\varepsilon } (h(X^{k+1}_{\text {inf}})-h(X^k_{\text {inf}}))\le -\mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {D}}_h}\nu (\gamma -2)(\Delta ^k_p)^2. \end{aligned}$$(56)
The frame size parameter is updated according to \(\Delta ^{k+1}_p=\min \{\tau ^{-1}\Delta ^k_p,\delta _{\max }\}\), which implies that
$$\begin{aligned} \mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {D}}_h}(1-\nu )[(\Delta ^{k+1}_p)^2-(\Delta ^k_p)^2]\le \mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {D}}_h}(1-\nu )(\tau ^{-2}-1)(\Delta ^k_p)^2. \end{aligned}$$(57)
Then, by choosing \(\nu \) according to (19), the right-hand side of (56) dominates that of (57). That is,
$$\begin{aligned} -\nu (\gamma -2)(\Delta ^k_p)^2 + (1-\nu )(\tau ^{-2}-1)(\Delta ^k_p)^2 \le -\frac{1}{2}\nu (\gamma -2)(\Delta ^k_p)^2. \end{aligned}$$(58)
Combining (56), (57) and (58) leads to
$$\begin{aligned} \mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {D}}_h}(\Phi _{k+1}-\Phi _k)\le -\mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {D}}_h}\frac{1}{2}\nu (\gamma -2)(\Delta ^k_p)^2. \end{aligned}$$(59) -
(ii)
The iteration is Improving (\(\mathbb {1}_{{\mathcal {I}}}=1\)). The iteration is Improving and the bounds are good, so again, a decrease occurs in h according to (6). Moreover, \(\Delta ^k_p\) is updated as in h-Dominating iterations. Thus, the change in \(\Phi _k\) follows from (59) by replacing \(\mathbb {1}_{{\mathcal {D}}_h}\) by \(\mathbb {1}_{{\mathcal {I}}}\). Specifically,
$$\begin{aligned} \mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {I}}}(\Phi _{k+1}-\Phi _k)\le -\mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {I}}}\frac{1}{2}\nu (\gamma -2)(\Delta ^k_p)^2. \end{aligned}$$(60) -
(iii)
The iteration is Unsuccessful (\(\mathbb {1}_{{\mathcal {U}}}=1\)). The value of h is unchanged while the frame size parameter is decreased. Consequently,
$$\begin{aligned} \mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {U}}}(\Phi _{k+1}-\Phi _k)=-\mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {U}}}(1-\nu )(1-\tau ^{2})(\Delta ^k_p)^2. \end{aligned}$$(61)
Because \(\nu \) satisfies (19) and because \(1-\tau ^2<\tau ^{-2}-1\), Unsuccessful iterations, vis-à-vis (61), provide the worst-case decrease when compared to (59) and (60). That is,
$$\begin{aligned} -\frac{1}{2}\nu (\gamma -2)(\Delta ^k_p)^2\le -(1-\nu )(1-\tau ^{2})(\Delta ^k_p)^2. \end{aligned}$$(62)
Thus, it follows from (59), (60), (61) and (62) that the change in \(\Phi _k\) is bounded like
$$\begin{aligned} \mathbb {1}_{I_k}(\Phi _{k+1}-\Phi _k)= & {} \mathbb {1}_{I_k}(\mathbb {1}_{{\mathcal {D}}_h}+\mathbb {1}_{{\mathcal {I}}}+\mathbb {1}_{{\mathcal {U}}})(\Phi _{k+1}-\Phi _k)\nonumber \\\le & {} -\mathbb {1}_{I_k}(1-\nu )(1-\tau ^{2})(\Delta ^k_p)^2. \end{aligned}$$(63)
Since Assumption 3 holds, taking conditional expectations with respect to \({\mathcal {F}}^{C\cdot F}_{k-1}\) on both sides of the inequality in (63) leads to (55).
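Note that both (58) and (62) rest on the choice of \(\nu \) in (19), which is not restated in this appendix. Dividing (58) by \((\Delta ^k_p)^2>0\) and rearranging shows that (19) must in particular guarantee
$$\begin{aligned} (1-\nu )(\tau ^{-2}-1)\le \frac{1}{2}\nu (\gamma -2), \qquad \text {that is,}\qquad \frac{\nu }{1-\nu }\ge \frac{2(\tau ^{-2}-1)}{\gamma -2}. \end{aligned}$$
Since \(1-\tau ^2<\tau ^{-2}-1\) whenever \(\tau \in (0,1)\), the same condition also gives \((1-\nu )(1-\tau ^2)\le \frac{1}{2}\nu (\gamma -2)\), which is exactly the inequality (62).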
Case 2 (Bad bounds, \(\varvec{\mathbb {1}_{\bar{I_k}}=1}\)). Since the bounds are bad, Algorithm 2 can accept a step which leads to an increase in h and \(\Delta ^k_p\), and hence in \(\Phi _k\). Such an increase in \(\Phi _k\) is controlled by making use of (15), and the probability of \(\bar{I_k}\) is chosen sufficiently small so that \(\Phi _k\) decreases sufficiently in expectation. More precisely, the following result will be proved:
$$\begin{aligned} {\mathbb {E}}\left( \mathbb {1}_{\bar{I_k}}(\Phi _{k+1}-\Phi _k)|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le 2\nu (1-\alpha )^{1/2}(\Delta ^k_p)^2. \end{aligned}$$(64)
-
(i)
The iteration is h-Dominating (\(\mathbb {1}_{{\mathcal {D}}_h}=1\)). The change in h is bounded like
$$\begin{aligned}&\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}\frac{\nu }{m\varepsilon } (h(X^{k+1}_{\text {inf}})-h(X^k_{\text {inf}})) \nonumber \\&\quad \le \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}\frac{\nu }{m\varepsilon } \left[ (H^k_s-H^k_0)+\left|h(X^{k+1}_{\text {inf}}) -H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right] \nonumber \\&\quad \le \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}\nu \left[ -\gamma (\Delta ^k_p)^2+\frac{1}{m\varepsilon }\left( \left|h(X^{k+1}_{\text {inf}}) -H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) \right] , \end{aligned}$$(65)
where (65) follows from \(H^k_s-H^k_0\le -\gamma m\varepsilon (\Delta ^k_p)^2 \) which is satisfied in every h-Dominating iteration. Moreover, the change in \(\Delta ^k_p\) can be obtained simply by replacing \(\mathbb {1}_{I_k}\) by \(\mathbb {1}_{\bar{I_k}}\) in (57). That is,
$$\begin{aligned} \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}(1-\nu )[(\Delta ^{k+1}_p)^2-(\Delta ^k_p)^2]\le \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}(1-\nu )(\tau ^{-2}-1)(\Delta ^k_p)^2. \end{aligned}$$(66)
Because \(\nu \) satisfies (19), \(-\nu \gamma (\Delta ^k_p)^2+(1-\nu )(\tau ^{-2}-1)(\Delta ^k_p)^2\le 0\). Combining (65) and (66),
$$\begin{aligned} \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}(\Phi _{k+1}-\Phi _k)\le \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}\frac{\nu }{m\varepsilon } \left( \left|h(X^{k+1}_{\text {inf}})-H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) .\nonumber \\ \end{aligned}$$(67) -
(ii)
The iteration is Improving (\(\mathbb {1}_{{\mathcal {I}}}=1\)). \(\Delta ^k_p\) is updated as in h-Dominating iterations. The increase in h is bounded as in (65). Thus, the bound on the change in \(\Phi _k\) can be obtained by replacing \(\mathbb {1}_{{\mathcal {D}}_h}\) by \(\mathbb {1}_{{\mathcal {I}}}\) in (67). That is,
$$\begin{aligned} \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {I}}}(\Phi _{k+1}-\Phi _k)\le \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {I}}}\frac{\nu }{m\varepsilon } \left( \left|h(X^{k+1}_{\text {inf}})-H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) .\nonumber \\ \end{aligned}$$(68) -
(iii)
The iteration is Unsuccessful (\(\mathbb {1}_{{\mathcal {U}}}=1\)). The value of h is unchanged and \(\Delta ^k_p\) is decreased. Thus, the change in \(\Phi _k\) follows from (61) by replacing \(\mathbb {1}_{I_k}\) by \(\mathbb {1}_{\bar{I_k}}\) and is trivially bounded like
$$\begin{aligned} \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {U}}}(\Phi _{k+1}-\Phi _k) \le \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {U}}}\frac{\nu }{m\varepsilon } \left( \left|h(X^{k+1}_{\text {inf}})-H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) .\nonumber \\ \end{aligned}$$(69)
It follows from (67), (68), (69) and the inequality \(\mathbb {1}_{\bar{I_k}}\le 1\), that
$$\begin{aligned} \mathbb {1}_{\bar{I_k}}(\Phi _{k+1}-\Phi _k) \le \frac{\nu }{m\varepsilon }\left( \left|h(X^{k+1}_{\text {inf}})-H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) . \end{aligned}$$(70)
Taking conditional expectations with respect to \({\mathcal {F}}^{C\cdot F}_{k-1}\) on both sides of (70) and using the inequalities (15) of Lemma 3.7, leads to (64).
Combining (55) and (64) yields
$$\begin{aligned} {\mathbb {E}}\left( \Phi _{k+1}-\Phi _k|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le \left[ -\alpha (1-\nu )(1-\tau ^{2})+2\nu (1-\alpha )^{1/2}\right] (\Delta ^k_p)^2. \end{aligned}$$(71)
Choosing \(\alpha \) according to (20) implies that \(\displaystyle {\alpha \ge \frac{4\nu (1-\alpha )^{1/2}}{(1-\nu )(1-\tau ^2)}}\), which ensures
$$\begin{aligned} \left[ -\alpha (1-\nu )(1-\tau ^{2})+2\nu (1-\alpha )^{1/2}\right] (\Delta ^k_p)^2\le -\frac{1}{2}\alpha \beta (1-\nu )(1-\tau ^{2})(\Delta ^k_p)^2. \end{aligned}$$(72)
Thus, (49) follows from (71) and (72) with \(\eta =\frac{1}{2}\alpha \beta (1-\nu )(1-\tau ^2)\).
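To see how the choice of \(\alpha \) produces the constant \(\eta \), note that the lower bound on \(\alpha \) quoted above rearranges to \(2\nu (1-\alpha )^{1/2}\le \frac{1}{2}\alpha (1-\nu )(1-\tau ^2)\), so that, for any \(\beta \in (0,1]\),
$$\begin{aligned} -\alpha (1-\nu )(1-\tau ^{2})+2\nu (1-\alpha )^{1/2}\le -\frac{1}{2}\alpha (1-\nu )(1-\tau ^{2})\le -\frac{1}{2}\alpha \beta (1-\nu )(1-\tau ^{2}). \end{aligned}$$
Multiplying through by \((\Delta ^k_p)^2\) recovers the drift constant \(\eta =\frac{1}{2}\alpha \beta (1-\nu )(1-\tau ^2)\).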
Part 2 (\(\varvec{T<+\infty }\) almost surely). In order to show that the random function \(\Phi _k^T\) defined by
satisfies (53) with the same constant \(\eta \) derived in Part 1, notice that whenever the event \(\{T>k\}\) occurs, then \(f(X^{(k+1)\vee T}_{\text {feas}})-f(X^{k\vee T}_{\text {feas}})=0\) since \(\max \{k,T\}:=k\vee T=(k+1)\vee T=T\). Thus, on the event \(\{T>k\}\), the random function \(\Phi _k\) used in Part 1 has the same increment as \(\Phi _k^T\). Specifically,
$$\begin{aligned} \mathbb {1}_{\{T>k\}}(\Phi _{k+1}^T-\Phi _k^T)=\mathbb {1}_{\{T>k\}}(\Phi _{k+1}-\Phi _k). \end{aligned}$$
Moreover, it follows from the definition of the stopping time T that no iteration can be f-Dominating when the event \(\{T>k\}\) occurs. Consequently, it easily follows from the analysis in Part 1 and the fact that the random variable \(\mathbb {1}_{\{T> k\}}\) is \({\mathcal {F}}^{C\cdot F}_{k-1}\)-measurable that
$$\begin{aligned} {\mathbb {E}}\left( \mathbb {1}_{\{T>k\}}(\Phi _{k+1}^T-\Phi _k^T)|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le -\eta \mathbb {1}_{\{T>k\}}(\Delta ^k_p)^2. \end{aligned}$$(73)
The remainder of the proof is devoted to showing that
$$\begin{aligned} {\mathbb {E}}\left( \mathbb {1}_{\{T\le k\}}(\Phi _{k+1}^T-\Phi _k^T)|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le -\eta \mathbb {1}_{\{T\le k\}}(\Delta ^k_p)^2, \end{aligned}$$(74)
since combining (73) and (74) leads to (53), which is the overall goal. In all that follows, it is assumed that the event \(\{T\le k\}\) occurs.
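Indeed, writing \(\Phi _{k+1}^T-\Phi _k^T=\mathbb {1}_{\{T>k\}}(\Phi _{k+1}^T-\Phi _k^T)+\mathbb {1}_{\{T\le k\}}(\Phi _{k+1}^T-\Phi _k^T)\) and assuming, as above, that (73) and (74) bound the corresponding conditional expectations by \(-\eta \mathbb {1}_{\{T>k\}}(\Delta ^k_p)^2\) and \(-\eta \mathbb {1}_{\{T\le k\}}(\Delta ^k_p)^2\) respectively, the linearity of conditional expectation yields
$$\begin{aligned} {\mathbb {E}}\left( \Phi _{k+1}^T-\Phi _k^T|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le -\eta \left( \mathbb {1}_{\{T>k\}}+\mathbb {1}_{\{T\le k\}}\right) (\Delta ^k_p)^2=-\eta (\Delta ^k_p)^2, \end{aligned}$$
since \(\mathbb {1}_{\{T>k\}}+\mathbb {1}_{\{T\le k\}}=1\), which is precisely (53).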
Case 1 (Good estimates and good bounds, \(\varvec{\mathbb {1}_{I_k}\mathbb {1}_{J_k}=1}\)). Regardless of the iteration type, the smallest decrease in \(\Phi _k^T\) will be shown to happen on Unsuccessful iterations, and it will be shown that
$$\begin{aligned} {\mathbb {E}}\left( \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}(\Phi _{k+1}^T-\Phi _k^T)|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le -\alpha \beta \mathbb {1}_{\{T\le k\}}(1-\nu )(1-\tau ^{2})(\Delta ^k_p)^2. \end{aligned}$$(75)
-
(i)
The iteration is f-Dominating (\(\mathbb {1}_{{\mathcal {D}}_f}=1\)). The iteration is f-Dominating and the estimates are good, so a decrease occurs in f according to (10). That is,
$$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {D}}_f}\frac{\nu }{\varepsilon }(f(X^{(k+1)\vee T}_{\text {feas}})-f(X^{k\vee T}_{\text {feas}}))\nonumber \\&\quad \le -\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {D}}_f}\nu (\gamma -2)(\Delta ^k_p)^2. \end{aligned}$$(76)
Since the \(\varepsilon \)-infeasible incumbent is not updated, the value of h is unchanged. The frame size parameter is updated according to \(\Delta ^{k+1}_p=\min \{\tau ^{-1}\Delta ^k_p,\delta _{\max }\}\), thus implying that
$$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {D}}_f}(1-\nu )[(\Delta ^{k+1}_p)^2-(\Delta ^k_p)^2]\nonumber \\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {D}}_f}(1-\nu )(\tau ^{-2}-1)(\Delta ^k_p)^2. \end{aligned}$$(77)
Because \(\nu \) satisfies (19), (58) holds, which implies that the right-hand side of (76) dominates that of (77), leading to the inequality
$$\begin{aligned} \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {D}}_f}(\Phi _{k+1}^T-\Phi _k^T) \le -\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {D}}_f}\frac{1}{2} \nu (\gamma -2)(\Delta ^k_p)^2.\nonumber \\ \end{aligned}$$(78) -
(ii)
The iteration is h-Dominating (\(\mathbb {1}_{{\mathcal {D}}_h}=1\)). The value of f is unchanged since \(X^k_{\text {feas}}\) is not updated. Thus, the bound on the change in \(\Phi _k^T\) follows from multiplying both sides of (59) by \(\mathbb {1}_{\{T\le k\}}\mathbb {1}_{J_k}\), and replacing \(\Phi _k\) by \(\Phi _k^T\). That is,
$$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {D}}_h}(\Phi _{k+1}^T-\Phi _k^T) \le -\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {D}}_h}\frac{1}{2}\nu (\gamma -2)(\Delta ^k_p)^2.\nonumber \\ \end{aligned}$$(79) -
(iii)
The iteration is Improving (\(\mathbb {1}_{{\mathcal {I}}}=1\)). Again, the value of f is unchanged. Thus, the bound on the change in \(\Phi _k^T\) follows from multiplying both sides of (60) by \(\mathbb {1}_{\{T\le k\}}\mathbb {1}_{J_k}\), and replacing \(\Phi _k\) by \(\Phi _k^T\). That is,
$$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {I}}}(\Phi _{k+1}^T-\Phi _k^T) \le -\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {I}}}\frac{1}{2}\nu (\gamma -2)(\Delta ^k_p)^2.\nonumber \\ \end{aligned}$$(80) -
(iv)
The iteration is Unsuccessful (\(\mathbb {1}_{{\mathcal {U}}}=1\)). The value of f and h is unchanged since no incumbent is updated, while \(\Delta ^k_p\) is decreased. Consequently, the bound on the change in \(\Phi _k^T\) follows from multiplying both sides of (61) by \(\mathbb {1}_{\{T\le k\}}\mathbb {1}_{J_k}\), and replacing \(\Phi _k\) by \(\Phi _k^T\). That is,
$$\begin{aligned} \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {U}}}(\Phi _{k+1}^T-\Phi _k^T)=-\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {U}}}(1-\nu )(1-\tau ^{2})(\Delta ^k_p)^2.\nonumber \\ \end{aligned}$$(81)
Combining (78), (79), (80), (81) and (62) yields
$$\begin{aligned} \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}(\Phi _{k+1}^T-\Phi _k^T) \le -\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}(1-\nu )(1-\tau ^2)(\Delta ^k_p)^2. \end{aligned}$$(82)
The following holds under Assumption 3: \({\mathbb {E}}\left( \mathbb {1}_{I_k}\mathbb {1}_{J_k}|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \ge \alpha \beta \). Then, taking expectations with respect to \({\mathcal {F}}^{C\cdot F}_{k-1}\) on both sides of (82) and using the \({\mathcal {F}}^{C\cdot F}_{k-1}\)-measurability of the random variables \(\mathbb {1}_{\{T\le k\}}\) and \(\Delta ^k_p\) leads to (75).
Case 2 (Bad estimates and good bounds, \(\varvec{\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}=1}\)). \(\Phi _k^T\) may increase, since good bounds might not provide enough decrease to cancel the increase in f which occurs whenever Algorithm 2 wrongly accepts an incumbent due to bad estimates. The f-Dominating case provides the worst-case increase in the change of \(\Phi _k^T\), leading to
$$\begin{aligned} {\mathbb {E}}\left( \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}(\Phi _{k+1}^T-\Phi _k^T)|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le 2\nu (1-\beta )^{1/2}\mathbb {1}_{\{T\le k\}}(\Delta ^k_p)^2. \end{aligned}$$(83)
-
(i)
The iteration is f-Dominating (\(\mathbb {1}_{{\mathcal {D}}_f}=1\)). Whenever bad estimates occur and the iteration is f-Dominating, the change in f is bounded like
$$\begin{aligned} \begin{aligned}&\mathbb {1}_{\{T\le k\}} \mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_f}\frac{\nu }{\varepsilon }(f(X^{(k+1)\vee T}_{\text {feas}})-f(X^{k\vee T}_{\text {feas}}))\\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_f}\frac{\nu }{\varepsilon } \left[ (F^k_s-F^k_0) +\left|f(X^{k+1}_{\text {feas}})-F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right] \\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_f}\nu \left[ -\gamma (\Delta ^k_p)^2 +\frac{1}{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}})-F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) \right] \end{aligned}\nonumber \\ \end{aligned}$$(84)
where the last inequality in (84) follows from \(F^k_s-F^k_0\le -\gamma \varepsilon (\Delta ^k_p)^2 \) which is satisfied for every f-Dominating iteration. While the value of h remains unchanged since \(X^k_{\text {inf}}\) is not updated, the change in \(\Delta ^k_p\) follows (77) by replacing \(\mathbb {1}_{J_k}\) by \(\mathbb {1}_{\bar{J_k}}\). That is,
$$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_f}(1-\nu )[(\Delta ^{k+1}_p)^2-(\Delta ^k_p)^2] \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_f}(1-\nu )(\tau ^{-2}-1)(\Delta ^k_p)^2. \nonumber \\ \end{aligned}$$(85)
Then, (84), (85), (19) and the inequality \(-\nu \gamma (\Delta ^k_p)^2+(1-\nu )(\tau ^{-2}-1)(\Delta ^k_p)^2\le 0\) yield
$$\begin{aligned} \begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_f}( \Phi _{k+1}^T-\Phi _k^T)\\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_f}\frac{\nu }{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}}) -F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) . \end{aligned} \end{aligned}$$(86) -
(ii)
The iteration is h-Dominating (\(\mathbb {1}_{{\mathcal {D}}_h}=1\)). The change in \(\Phi _k^T\), which can be obtained by replacing \(\mathbb {1}_{J_k}\) by \(\mathbb {1}_{\bar{J_k}}\) in (79), is trivially bounded like
$$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_h}(\Phi _{k+1}^T-\Phi _k^T)\nonumber \\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_h}\frac{\nu }{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}}) -F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) . \end{aligned}$$(87) -
(iii)
The iteration is Improving (\(\mathbb {1}_{{\mathcal {I}}}=1\)). Again, the change in \(\Phi _k^T\), which can be obtained by replacing \(\mathbb {1}_{J_k}\) by \(\mathbb {1}_{\bar{J_k}}\) in (80), is trivially bounded like
$$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {I}}}(\Phi _{k+1}^T-\Phi _k^T)\nonumber \\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {I}}}\frac{\nu }{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}}) -F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) . \end{aligned}$$(88) -
(iv)
The iteration is Unsuccessful (\(\mathbb {1}_{{\mathcal {U}}}=1\)). Since the frame size parameter, and hence \(\Phi _k^T\), decreases, the change in \(\Phi _k^T\) is trivially bounded like
$$\begin{aligned} \begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {U}}}( \Phi _{k+1}^T-\Phi _k^T)\\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {U}}}\frac{\nu }{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}}) -F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) . \end{aligned} \end{aligned}$$(89)
Then, combining (86), (87), (88), (89) and the inequality \(\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\le 1\), yields
$$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}(\Phi _{k+1}^T-\Phi _k^T)\nonumber \\&\quad \le \mathbb {1}_{\{T\le k\}}\frac{\nu }{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}})-F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) . \end{aligned}$$(90)
Since Assumption 3 holds, it follows from the conditional Cauchy–Schwarz inequality [20] that
$$\begin{aligned} {\mathbb {E}}\left( \left|f(X^k_{\text {feas}})-F^k_0\right||{\mathcal {F}}^{C\cdot F}_{k-1}\right)\le & {} {\mathbb {E}}\left( 1|{\mathcal {F}}^{C\cdot F}_{k-1}\right) ^{1/2} \left[ {\mathbb {E}}\left( \left|f(X^k_{\text {feas}}) -F^k_0\right|^2|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \right] ^{1/2}\nonumber \\\le & {} \varepsilon (1-\beta )^{1/2}(\Delta ^k_p)^2, \end{aligned}$$(91)
where (91) follows from (12) and the fact that \({\mathbb {E}}\left( 1|{\mathcal {F}}^{C\cdot F}_{k-1}\right) =1\). Similarly,
$$\begin{aligned} {\mathbb {E}}\left( \left|f(X^{k+1}_{\text {feas}})-F^k_s\right||{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le \varepsilon (1-\beta )^{1/2}(\Delta ^k_p)^2. \end{aligned}$$(92)
Taking expectations with respect to \({\mathcal {F}}^{C\cdot F}_{k-1}\) on both sides of (90) and then using (91), (92) and the \({\mathcal {F}}^{C\cdot F}_{k-1}\)-measurability of the random variables \(\mathbb {1}_{\{T\le k\}}\) and \(\Delta ^k_p\), leads to (83).
Case 3 (Bad bounds, \(\varvec{\mathbb {1}_{\bar{I_k}}=1}\)). \(\Phi _k^T\) may increase since, even though good estimates of f values may occur, they might not provide enough decrease to cancel the increase in h whenever Algorithm 2 wrongly accepts an incumbent due to bad bounds. It will be shown that
(i)
The iteration is f-Dominating (\(\mathbb {1}_{{\mathcal {D}}_f}=1\)). The change in \(\Phi _k^T\) is bounded, taking into account the possible increase in f. Since the value of h is unchanged, the bound on the change in \(\Phi _k^T\) can be derived from (86) by replacing \(\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\) by \(\mathbb {1}_{\bar{I_k}}\). That is,
$$\begin{aligned} \begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_f}( \Phi _{k+1}^T-\Phi _k^T)\\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_f}\frac{\nu }{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}})-F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) . \end{aligned} \end{aligned}$$(94)
(ii)
The iteration is h-Dominating (\(\mathbb {1}_{{\mathcal {D}}_h}=1\)). Since the value of f is unchanged, the bound on the change in \(\Phi _k^T\) is obtained by multiplying both sides of (67) by \(\mathbb {1}_{\{T\le k\}}\) and replacing \(\Phi _k\) by \(\Phi _k^T\). That is,
$$\begin{aligned} \mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}(\Phi _{k+1}^T-\Phi _k^T)\le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}\frac{\nu }{m\varepsilon }\left( \left|h(X^{k+1}_{\text {inf}})-H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) .\nonumber \\ \end{aligned}$$(95)
(iii)
The iteration is Improving (\(\mathbb {1}_{{\mathcal {I}}}=1\)). The frame size parameter is updated as in h-Dominating iterations and the value of f is unchanged. Thus, the bound on the change in \(\Phi _k^T\) follows from (95) by replacing \(\mathbb {1}_{{\mathcal {D}}_h}\) by \(\mathbb {1}_{{\mathcal {I}}}\). That is,
$$\begin{aligned} \mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {I}}}(\Phi _{k+1}^T-\Phi _k^T)\le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {I}}}\frac{\nu }{m\varepsilon }\left( \left|h(X^{k+1}_{\text {inf}})-H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) .\nonumber \\ \end{aligned}$$(96)
(iv)
The iteration is Unsuccessful (\(\mathbb {1}_{{\mathcal {U}}}=1\)). Because the frame size parameter decreases, and hence \(\Phi _k^T\) decreases, the bound on the change in \(\Phi _k^T\) is
$$\begin{aligned} \begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {U}}}( \Phi _{k+1}^T-\Phi _k^T)\\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {U}}}\nu \left[ \frac{1}{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}})-F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) \right. \\&\qquad \left. +\frac{1}{m\varepsilon }\left( \left|h(X^{k+1}_{\text {inf}})-H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) \right] \end{aligned} \end{aligned}$$(97)
Since (97) dominates (94), (95) and (96), combining all four cases leads to
Taking expectations with respect to \({\mathcal {F}}^{C\cdot F}_{k-1}\) on both sides of (98) and using (15), (91) and (92) leads to (93). Combining the main results of Case 1, Case 2 and Case 3 of Part 2, specifically (75), (83) and (93),
Choosing \(\alpha \) and \(\beta \) according to (20) ensures that
and (74) follows from (99) and (100) with the same constant \(\eta =\frac{1}{2}\alpha \beta (1-\nu )(1-\tau ^2)\) as in Part 1, which completes the proof. \(\square \)
1.2 Proof of Corollary 4.4
Proof
Only (22) is proved, but the proof also applies to \(\left|H^k_s-h(X^k+S^k)\right|\) and \(\left|F^k_s-f(X^k+S^k)\right|\). According to Assumption 3(vi), \({\mathbb {E}}\left( \left|H^k_0-h(X^k)\right| |\ {\mathcal {F}}^{C\cdot F}_{k-1}\right) \le m\varepsilon (1-\alpha )^{1/2}(\Delta ^k_p)^2\), which implies that
By summing each side of (101) over k from 0 to N, and observing that
it follows from the monotone convergence theorem (see e.g. Theorem 1.6.6 in [32]) that
where \(\mu \) is from (54). This means that \(\displaystyle {\sum _{k=0}^{+\infty }\left|H^k_0-h(X^k)\right|<+\infty }\) almost surely, which implies the first result of (22). The proof for \(\left|F^k_0-f(X^k)\right|\) is similar by observing that (see (91))
\(\square \)
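The chain of implications in the proof above can be summarized as follows; this display only restates the reasoning already given (the finiteness of \({\mathbb {E}}\sum _k(\Delta ^k_p)^2\), bounded by \(\mu \), is the content of (54)).

```latex
% Summability of the expected errors forces the errors to vanish a.s.
\begin{aligned}
\sum_{k=0}^{N}{\mathbb {E}}\left( \left|H^k_0-h(X^k)\right| \,\big |\, {\mathcal {F}}^{C\cdot F}_{k-1}\right)
  &\le m\varepsilon (1-\alpha )^{1/2}\sum_{k=0}^{N}(\Delta ^k_p)^2
  &&\text{(summing (101))}\\
\Longrightarrow \quad
{\mathbb {E}}\left( \sum_{k=0}^{+\infty }\left|H^k_0-h(X^k)\right|\right) &<+\infty
  &&\text{(monotone convergence and (54))}\\
\Longrightarrow \quad
\left|H^k_0-h(X^k)\right| &\xrightarrow[k\rightarrow +\infty ]{} 0 \quad \text{a.s.}
  &&\text{(an a.s. finite series has vanishing terms)}
\end{aligned}
```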
1.3 Proof of Lemma 4.7
Proof
The proof uses ideas from [11, 23]. The result is proved by contradiction conditioned on the almost sure event \(E_1=\{\Delta ^k_p\rightarrow 0\}\). All that follows is conditioned on the event \(E_1\). Assume that with nonzero probability, there exists a random variable \({\mathcal {E}}'>0\) such that
that is,
Let \(\{x^k_{\text {inf}}\}_{k\in {\mathbb {N}}}\), \(\{s^k\}_{k\in {\mathbb {N}}}\), \(\{\delta ^k_p\}_{k\in {\mathbb {N}}}\) and \(\epsilon '>0\) be realizations of \(\{X^k_{\text {inf}}\}_{k\in {\mathbb {N}}}\), \(\{S^k\}_{k\in {\mathbb {N}}}\), \(\{\Delta ^k_p\}_{k\in {\mathbb {N}}}\) and \({\mathcal {E}}'\), respectively for which \(\psi _k^h\ge \epsilon '\), for all \(k\in {\mathbb {N}}\). Let \({\hat{z}}\) be the parameter of Algorithm 2 satisfying \(\delta ^k_p\le \tau ^{-{\hat{z}}}\) for all \(k\ge 0\). Since \(\delta ^k_p\rightarrow 0\) due to the conditioning on \(E_1\), there exists \(k_0\in {\mathbb {N}}\) such that
Consequently, and since \(\tau <1\), the random variable \(R_k\) with realizations \(r_k:=-\log _{\tau }\left( \frac{\delta ^k_p}{\lambda }\right) \) satisfies \(r_k<0\) for all \(k\ge k_0\). The main idea of the proof is to show that such realizations occur only with probability zero, thus leading to a contradiction. First, \(\{R_k\}_{k\in {\mathbb {N}}}\) is shown to be a submartingale. Let \(k\ge k_0\) be an iteration for which the events \(I_k\) and \(J_k\) both occur, which happens with probability at least \(\alpha \beta >1/2\). Then, it follows from the definition of the event \(I_k\) (see Definition 3.3) that
where the first inequality in (107) follows from (102), (105) and (106) while the last inequality follows from (104). Consequently, iteration k of Algorithm 2 cannot be Unsuccessful. Thus, the frame size parameter is updated according to \(\delta _p^{k+1}=\tau ^{-1}\delta ^k_p\) since \(\delta ^k_p<\tau ^{1-{\hat{z}}}\). Hence, \(r_{k+1}=r_k+1\).
Let \({\mathcal {F}}^{I\cdot J}_{k-1}=\sigma (I_0,I_1,\dots ,I_{k-1})\cap \sigma (J_0,J_1,\dots ,J_{k-1})\). For all other outcomes of \(I_k\) and \(J_k\), which will occur with a total probability of at most \(1-\alpha \beta \), the inequality \(\delta _p^{k+1}\ge \tau \delta ^k_p\) always holds, thus implying that \(r_{k+1}\ge r_k-1\). Hence,
Thus, \({\mathbb {E}}\left( R_{k+1}-R_k|{\mathcal {F}}^{I\cdot J}_{k-1}\right) \ge 2\alpha \beta -1>0\), implying that \(\{R_k\}\) is a submartingale. The remainder of the proof is almost identical to that of the proof of the \(\liminf \)-type first-order result in [23].
Next, a random walk \(W_k\) with realizations \(w_k\) is constructed on the same probability space as \(R_k\); it will serve as a lower bound on \(R_k\). Define \(W_k\) as in (14) by
where the indicator random variables \(\mathbb {1}_{I_i}\) and \(\mathbb {1}_{J_i}\) are such that \(\mathbb {1}_{I_i}=1\) if \(I_i\) occurs, \(\mathbb {1}_{I_i}=0\) otherwise, and similarly, \(\mathbb {1}_{J_i}=1\) if \(J_i\) occurs while \(\mathbb {1}_{J_i}=0\) otherwise. Then following the proof of Theorem 3.6, observe that \(\{W_k\}\) is a \({\mathcal {F}}^{I\cdot J}_{k-1}\)-submartingale with bounded (nonzero) increments (and, as such, cannot converge to any finite value; see also [23] for the same result), thus leading to the conclusion that the event \(\left\{ \underset{k\rightarrow +\infty }{\limsup }\ W_k=+\infty \right\} \) occurs almost surely. Since by construction
then with probability one, \(R_k\) is positive infinitely often. Thus, the sequence of realizations \(r_k\) such that \(r_k<0\) for all \(k\ge k_0\) occurs with probability zero, and the assumption that (103) holds is therefore false. This implies that
which means that (23) holds. \(\square \)
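The contradiction in the proof above hinges on the fact that a submartingale with bounded nonzero increments and upward drift satisfies \(\limsup _k W_k=+\infty \) almost surely. The following sketch simulates such a walk, with the assumed illustrative value 0.75 standing in for \(\alpha \beta >1/2\); it is an illustration of the mechanism, not a verification of the lemma.

```python
import numpy as np

rng = np.random.default_rng(42)

# p stands in for alpha*beta > 1/2; 0.75 is an assumed illustrative value.
p = 0.75
n_steps = 10_000

# Increments are +1 when both I_k and J_k occur (probability p) and -1
# otherwise, mirroring the walk built from the indicator variables 1_{I_k}, 1_{J_k}.
increments = np.where(rng.random(n_steps) < p, 1, -1)
walk = np.cumsum(increments)

# With upward drift the walk exceeds every level eventually, so realizations
# staying negative for all large k (recall W_k bounds R_k below) are a null event.
assert walk.max() > 0 and walk[-1] > 0
```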
1.4 Proof of Theorem 4.10
Proof
The theorem is proved using ideas from [9, 11]. Define the events \(E_1\) and \(E_2\) by
Then \(E_1\) and \(E_2\) are almost sure due to Corollary 4.3 and (23) respectively. Let \(\omega \in E_1\cap E_2\) be an arbitrary outcome and note that the event \(E_1\cap E_2\) is also almost sure as a countable intersection of almost sure events. Then \(\lim _{K'(\omega )}\Delta ^k_p(\omega )=0\). It follows from the compactness hypothesis of Assumption 2 that there exists \(K(\omega )\subseteq K'(\omega )\) for which the subsequence \(\{X^k_{\text {inf}}(\omega )\}_{k\in K(\omega )}\) converges to a limit \({\hat{X}}_{\inf }(\omega )\). Specifically, \({\hat{X}}_{\inf }(\omega )\) is a refined point for the refining subsequence \(\{X^k_{\text {inf}}(\omega )\}_{k\in K(\omega )}\). Let \(v\in T^H_{{\mathcal {X}}}({\hat{X}}_{\inf }(\omega ))\) be a refining direction for \({\hat{X}}_{\inf }(\omega )\). Denote by V the random vector with realizations v, i.e., \(v=V(\omega )\), and let \({\hat{x}}_{\inf }={\hat{X}}_{\inf }(\omega )\), \(x^k_{\text {inf}}=X^k_{\text {inf}}(\omega )\), \(\delta ^k_p=\Delta ^k_p(\omega )\), \(\delta ^k_m=\Delta ^k_m(\omega )\), \(\psi _k^h=\Psi _k^h(\omega )\) and \({\mathcal {K}}=K(\omega )\). Since v is a refining direction, there exists \({\mathcal {L}}\subseteq {\mathcal {K}}\) and polling directions \(d^k\in {\mathbb {D}}^k_p(x^k_{\text {inf}})\) such that \(v=\underset{k\in {\mathcal {L}}}{\lim }\frac{d^k}{{\left\Vert d^k\right\Vert }_{\infty }}\). For each \(k\in {\mathcal {L}}\), define
where the fact that \(t_k\rightarrow 0\) follows from Definition 2.11, specifically the inequality \(\delta ^k_m{\left\Vert d^k\right\Vert }_{\infty }\le \delta ^k_pb\). Since h is \(\lambda ^h\)–locally Lipschitz,
which shows that Lemma 4.9 applies to both subsequences \(\{a_k\}_{k\in {\mathcal {L}}}\) and \(\{b_k\}_{k\in {\mathcal {L}}}\). Moreover, combining the inequality \(\lim _{{\mathcal {L}}}\psi _k^h\le 0\) and Assumption 6 (the fact that \(\delta ^k_p{\left\Vert d^k\right\Vert }_{\infty }\ge d_{\min }>0\)), yields
where the equality in (109) follows from \(\delta ^k_m=(\delta ^k_p)^2\) for sufficiently large k. Thus, by adding and subtracting \(h(x^k_{\text {inf}})\) to the numerator of the definition of the Clarke derivative, and using the fact that \(x^k_{\text {inf}}+\delta ^k_md^k\in {\mathcal {X}}\) for sufficiently large \(k\in {\mathcal {L}}\) since v is a hypertangent direction,
where the last inequality follows from (109). Every outcome \(\omega \) arbitrarily chosen in \(E_1\cap E_2\) therefore belongs to the event
thus implying that \(E_1\cap E_2\subseteq E_3\). Then the proof is complete since \({\mathbb {P}}\left( E_1\cap E_2\right) =1\).
\(\square \)
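For reference, the add-and-subtract step at the end of the proof of Theorem 4.10 can be sketched as follows, taking \(y=x^k_{\text {inf}}\) and \(t=t_k=\delta ^k_m{\left\Vert d^k\right\Vert }_{\infty }\) in the definition of the Clarke derivative; this only restates the derivation above.

```latex
% Lower bound on the Clarke derivative along the refining subsequence:
h^\circ ({\hat{x}}_{\inf }; v)
  := \limsup _{\begin{array}{c} y\rightarrow {\hat{x}}_{\inf },\ t\searrow 0\\
                                y+tv\in {\mathcal {X}} \end{array}}
     \frac{h(y+tv)-h(y)}{t}
  \ \ge \ \limsup _{k\in {\mathcal {L}}}
     \frac{h\left( x^k_{\text {inf}}+\delta ^k_m d^k\right)
           -h\left( x^k_{\text {inf}}\right) }
          {\delta ^k_m {\left\Vert d^k\right\Vert }_{\infty }},
```

where the discrepancy between the direction \(d^k/{\left\Vert d^k\right\Vert }_{\infty }\) and v is controlled by the \(\lambda ^h\)-Lipschitz bound displayed earlier, and the right-hand side has nonnegative limit superior by (109).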
1.5 Proof of Corollary 4.11
Proof
The proof is almost identical to the proof of a similar result (Corollary 3.6) in [9]. Recall the sequence \(K'\) of random variables and the almost sure event \(E_1\cap E_2\) in the proof of Theorem 4.10 and let \(\omega \in E_1\cap E_2\). Following the latter proof, there exists \(K(\omega )\subseteq K'(\omega )\) such that \(\lim _{K(\omega )}X^k_{\text {inf}}(\omega )={\hat{X}}_{\inf }(\omega )={\hat{x}}_{\inf }\). Moreover, it follows from Theorem 4.10 that \(h^\circ ({\hat{x}}_{\inf }; v)=h^\circ ({\hat{X}}_{\inf }(\omega ); V(\omega ))\ge 0\) for a set of refining directions v which is dense in the closure \(\ {\text {cl}}\left( T^H_{{\mathcal {X}}}({\hat{x}}_{\inf })\right) \) of \(T^H_{{\mathcal {X}}}({\hat{x}}_{\inf })\). Then the proof is complete by noticing that \(\ {\text {cl}}\left( T^H_{{\mathcal {X}}}({\hat{x}}_{\inf })\right) =T^{Cl}_{{\mathcal {X}}}({\hat{x}}_{\inf })\) wherever \(T^H_{{\mathcal {X}}}({\hat{x}}_{\inf })\ne \emptyset \) [50], with \(T^{Cl}_{{\mathcal {X}}}({\hat{x}}_{\inf })\) denoting the Clarke tangent cone to \({\mathcal {X}}\) at \({\hat{x}}_{\inf }\). \(\square \)
1.6 Proof of Lemma 4.12
Proof
The proof is almost identical to those of Lemma 4.7 and a similar result in [11]. Hence, full details are not provided here again. Unless otherwise stated, all the sequences, events and constants considered are defined as in the proof of Lemma 4.7. The result is proved by contradiction and all that follows is conditioned on the almost sure event \(E_1\cap \{T<+\infty \}\). Assume that with nonzero probability there exists a random variable \({\mathcal {E}}''>0\) such that
Let \({t,}\ \{x^{k\vee t}_{\text {feas}}\}_{k\in {\mathbb {N}}}\), \(\{s^k\}_{k\in {\mathbb {N}}}\), \(\{\delta ^k_p\}_{k\in {\mathbb {N}}}\) and \(\epsilon ''>0\) be realizations of \({T,}\ \{X^{k\vee T}_{\text {feas}}\}_{k\in {\mathbb {N}}}\), \(\{S^k\}_{k\in {\mathbb {N}}}\), \(\{\Delta ^k_p\}_{k\in {\mathbb {N}}}\) and \({\mathcal {E}}''\), respectively for which \(\psi _k^{f,t}\ge \epsilon ''\) for all \(k\ge 0\). Let \({\bar{k}}_0\in {\mathbb {N}}^*\) be such that
The key element of the proof is to show that an iteration \(k\ge k_0:=\max \{{\bar{k}}_0,t\}\) for which the events \(I_k\) and \(J_k\) both occur cannot be Unsuccessful, and hence \(\{R_k\}\) is a submartingale.
It follows from (110) and (111) that
which implies that the iteration \(k\ge k_0\) of Algorithm 2 cannot be Unsuccessful. The rest of the proof follows that of Lemma 4.7. \(\square \)
1.7 Proof of Theorem 4.13
Proof
The proof follows from Corollary 4.4 and the assumption \(\lim _{k\in K}H^k_0(X^{k\vee T}_{\text {feas}})= 0\) almost surely, by observing that for any outcome \(\omega \) in the almost sure event
the inequalities
and the continuity of \(y\mapsto \left|y\right|\), yield
This means that
since h is nonnegative, where the first equality in (112) follows from the continuity of h in \({\mathcal {X}}\). Consequently,
\(\square \)
1.8 Proof of Theorem 4.14
Proof
First, \({\mathbb {P}}\left( {\hat{X}}_{\text {feas}}\in {\mathcal {D}}\right) =1\) follows from Theorem 4.13. The proof follows from that of Theorem 4.10, by replacing h by f, \({\hat{x}}_{\inf }={\hat{X}}_{\inf }(\omega )\) by \({\hat{x}}_{\text {feas}}={\hat{X}}_{\text {feas}}(\omega )\), \(x^k_{\text {inf}}=X^k_{\text {inf}}(\omega )\) by \(x^{k\vee t}_{\text {feas}}=X^{k\vee T}_{\text {feas}}(\omega )\), \(\psi _k^h=\Psi _k^h(\omega )\) by \(\psi _k^{f,t}=\Psi _k^{f,T}(\omega )\) with \(t=T(\omega )\) and \(T^H_{{\mathcal {X}}}(\cdot )\) by \(T^H_{{\mathcal {D}}}(\cdot )\), for \(\omega \) fixed and arbitrarily chosen in the almost sure event \(E_1\cap E_5\cap \{T<+\infty \}\), where
\(\square \)
1.9 Proof of Corollary 4.15
Proof
The proof is almost identical to the proof of a similar result (Corollary 3.4) in [9]. Let \(\omega \) be arbitrarily chosen in the almost sure event \(E_1\cap E_5\cap \{T<+\infty \}\). It follows from Theorem 4.14 that \(f^\circ ({\hat{x}}_{\text {feas}}; v)=f^\circ ({\hat{X}}_{\text {feas}}(\omega ); V(\omega ))\ge 0\) for a set of refining directions v which is dense in the closure of \(T^H_{{\mathcal {D}}}({\hat{x}}_{\text {feas}})\). Then the proof is complete by noticing that the closure of the hypertangent cone coincides with the Clarke tangent cone wherever the hypertangent cone is nonempty [9, 50]. \(\square \)
Dzahini, K.J., Kokkolaras, M. & Le Digabel, S. Constrained stochastic blackbox optimization using a progressive barrier and probabilistic estimates. Math. Program. 198, 675–732 (2023). https://doi.org/10.1007/s10107-022-01787-7