
Constrained stochastic blackbox optimization using a progressive barrier and probabilistic estimates

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

This work introduces the StoMADS-PB algorithm for constrained stochastic blackbox optimization, an extension of the mesh adaptive direct-search (MADS) method originally developed for deterministic blackbox optimization under general constraints. The values of the objective and constraint functions are provided by a noisy blackbox, i.e., they can only be computed with random noise whose distribution is unknown. As in MADS, constraint violations are aggregated into a single constraint violation function. Since exact function values are unavailable, StoMADS-PB uses estimates of the objective and introduces probabilistic bounds for the violation. Such estimates and bounds, obtained from stochastic observations, are required to be accurate and reliable with high, but fixed, probabilities. The proposed method, which allows intermediate infeasible solutions, accepts new points by means of sufficient decrease conditions and by imposing a threshold on the probabilistic bounds. Using Clarke nonsmooth calculus and martingale theory, Clarke stationarity convergence results for the objective and the violation function are derived with probability one.
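The following minimal sketch (illustrative only, not taken from the paper's experiments) mimics the setting just described: a noisy blackbox, an \(\ell _1\)-type aggregate violation \(h(x)=\sum _{j}\max \{c_j(x),0\}\) consistent with the quantities \(h^k_0\) appearing in the Appendix, and sample-average estimates playing the role of the unavailable function values. All function names and the averaging scheme are assumptions made for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    def noisy_blackbox(x, sigma=0.1):
        # Hypothetical noisy blackbox: one noisy observation of the
        # objective f and of each constraint c_j at x.
        f = np.sum(x**2) + sigma * rng.standard_normal()
        c = np.array([x[0] + x[1] - 1.0, -x[0]]) + sigma * rng.standard_normal(2)
        return f, c

    def violation(c_values):
        # Aggregate the constraint values into a single violation,
        # h(x) = sum_j max{c_j(x), 0}.
        return np.sum(np.maximum(c_values, 0.0))

    def estimates(x, n_samples=30):
        # Sample averages playing the role of the estimates f^k_0 and h^k_0.
        fs, cs = zip(*(noisy_blackbox(x) for _ in range(n_samples)))
        return np.mean(fs), violation(np.mean(np.array(cs), axis=0))

    print(estimates(np.array([0.3, 0.4])))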


Notes

  1. \(n+1\) is the number of evaluations required to construct a linear interpolant or a simplex gradient [12] in \({\mathbb {R}}^n\) [14, 46]; an illustrative sketch follows these notes.

  2. It is implicitly assumed without any loss of generality that \(a^k(x^k)\ge 1\).

  3. The use of \(\varepsilon _f\) instead of \(\varepsilon \) is favored in [11].
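As a small illustration of Note 1 (a sketch under the assumption of a coordinate simplex; not code from [12, 14, 46]): with the \(n+1\) evaluations at \(x^0\) and \(x^0+he_i\), the simplex gradient is the solution of a linear system, which for this particular simplex reduces to forward finite differences.

    import numpy as np

    def simplex_gradient(f, x0, h=1e-3):
        # Simplex gradient from n+1 evaluations: f(x0) and f(x0 + h*e_i).
        n = x0.size
        f0 = f(x0)
        S = np.zeros((n, n))   # rows: simplex directions x_i - x0
        d = np.zeros(n)        # entries: f(x_i) - f(x0)
        for i in range(n):
            xi = x0.copy()
            xi[i] += h
            S[i] = xi - x0
            d[i] = f(xi) - f0
        return np.linalg.solve(S, d)

    # For f(x) = x_1^2 + 3 x_2 at (1, 2), the output approximates (2, 3).
    print(simplex_gradient(lambda x: x[0]**2 + 3 * x[1], np.array([1.0, 2.0])))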

References

  1. Abramson, M.A., Audet, C., Dennis, J.E., Jr., Le Digabel, S.: OrthoMADS: a deterministic MADS instance with orthogonal directions. SIAM J. Optim. 20(2), 948–966 (2009)

  2. Alarie, S., Audet, C., Bouchet, P.-Y., Le Digabel, S.: Optimization of noisy blackboxes with adaptive precision. SIAM J. Optim. 31(4), 3127–3156 (2021)

  3. Anderson, E.J., Ferris, M.C.: A direct search algorithm for optimization with noisy function evaluations. SIAM J. Optim. 11(3), 837–857 (2001)

  4. Angün, E., Kleijnen, J., den Hertog, D., Gürkan, G.: Response surface methodology with stochastic constraints for expensive simulation. J. Oper. Res. Soc. 60(6), 735–746 (2009)

  5. Armijo, L.: Minimization of functions having Lipschitz continuous first partial derivatives. Pac. J. Math. 16(1), 1–3 (1966)

  6. Audet, C.: A survey on direct search methods for blackbox optimization and their applications. In: Pardalos, P.M., Rassias, T.M. (eds.) Mathematics Without Boundaries: Surveys in Interdisciplinary Research, chapter 2, pp. 31–56. Springer (2014)

  7. Audet, C., Dennis, J.E., Jr.: A pattern search filter method for nonlinear programming without derivatives. SIAM J. Optim. 14(4), 980–1010 (2004)

  8. Audet, C., Dennis, J.E., Jr.: Mesh adaptive direct search algorithms for constrained optimization. SIAM J. Optim. 17(1), 188–217 (2006)

  9. Audet, C., Dennis, J.E., Jr.: A progressive barrier for derivative-free nonlinear programming. SIAM J. Optim. 20(1), 445–472 (2009)

  10. Audet, C., Dennis, J.E., Jr., Le Digabel, S.: Parallel space decomposition of the mesh adaptive direct search algorithm. SIAM J. Optim. 19(3), 1150–1170 (2008)

  11. Audet, C., Dzahini, K.J., Kokkolaras, M., Le Digabel, S.: Stochastic mesh adaptive direct search for blackbox optimization using probabilistic estimates. Comput. Optim. Appl. 79(1), 1–34 (2021)

  12. Audet, C., Hare, W.: Derivative-Free and Blackbox Optimization. Springer Series in Operations Research and Financial Engineering. Springer (2017)

  13. Audet, C., Ihaddadene, A., Le Digabel, S., Tribes, C.: Robust optimization of noisy blackbox problems using the Mesh Adaptive Direct Search algorithm. Optim. Lett. 12(4), 675–689 (2018)

  14. Audet, C., Le Digabel, S., Tribes, C.: The mesh adaptive direct search algorithm for granular and discrete variables. SIAM J. Optim. 29(2), 1164–1189 (2019)

  15. Augustin, F., Marzouk, Y.M.: NOWPAC: A provably convergent derivative-free nonlinear optimizer with path-augmented constraints. Technical report, arXiv (2014)

  16. Augustin, F., Marzouk, Y.M.: A trust-region method for derivative-free nonlinear constrained stochastic optimization. Technical Report 1703.04156, arXiv (2017)

  17. Bandeira, A.S., Scheinberg, K., Vicente, L.N.: Convergence of trust-region methods based on probabilistic models. SIAM J. Optim. 24(3), 1238–1264 (2014)

  18. Barton, R.R., Ivey, J.S., Jr.: Nelder-Mead simplex modifications for simulation optimization. Manag. Sci. 42(7), 954–973 (1996)

  19. Bertsimas, D., Nohadani, O., Teo, K.M.: Nonconvex robust optimization for problems with constraints. Informs J. Comput. 22(1), 44–58 (2010)

  20. Bhattacharya, R.N., Waymire, E.C.: A Basic Course in Probability Theory, vol. 69. Springer (2007)

  21. Blanchet, J., Cartis, C., Menickelly, M., Scheinberg, K.: Convergence rate analysis of a stochastic trust region method via submartingales. Informs J. Optim. 1(2), 92–119 (2019)

  22. Chang, K.H.: Stochastic Nelder-Mead simplex method—a new globally convergent direct search method for simulation optimization. Eur. J. Oper. Res. 220(3), 684–694 (2012)

  23. Chen, R., Menickelly, M., Scheinberg, K.: Stochastic optimization using a trust-region method and random models. Math. Program. 169(2), 447–487 (2018)

  24. Chen, X., Wang, N.: Optimization of short-time gasoline blending scheduling problem with a DNA based hybrid genetic algorithm. Chem. Eng. Process. 49(10), 1076–1083 (2010)

  25. Clarke, F.H.: Optimization and Nonsmooth Analysis. John Wiley and Sons, New York (1983). Reissued in 1990 by SIAM Publications, Philadelphia, as Vol. 5 in the series Classics in Applied Mathematics

  26. Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to Derivative-Free Optimization. MOS-SIAM Series on Optimization. SIAM, Philadelphia (2009)

  27. Curtis, F.E., Scheinberg, K.: Adaptive stochastic optimization: A framework for analyzing stochastic optimization algorithms. IEEE Signal Process. Mag. 37(5), 32–42 (2020)

  28. Curtis, F.E., Scheinberg, K., Shi, R.: A stochastic trust region algorithm based on careful step normalization. Informs J. Optim. 1(3), 200–220 (2019)

  29. Diniz-Ehrhardt, M.A., Ferreira, D.G., Santos, S.A.: A pattern search and implicit filtering algorithm for solving linearly constrained minimization problems with noisy objective functions. Optim. Methods Softw. 34(4), 827–852 (2019)

  30. Diniz-Ehrhardt, M.A., Ferreira, D.G., Santos, S.A.: Applying the pattern search implicit filtering algorithm for solving a noisy problem of parameter identification. Comput. Optim. Appl. 1–32 (2020)

  31. Dolan, E.D., Moré, J.J.: Benchmarking optimization software with performance profiles. Math. Program. 91(2), 201–213 (2002)

  32. Durrett, R.: Probability: Theory and Examples. Cambridge University Press (2010)

  33. Dzahini, K.J.: Expected complexity analysis of stochastic direct-search. Comput. Optim. Appl. 81(1), 179–200 (2022)

  34. Gratton, S., Royer, C.W., Vicente, L.N., Zhang, Z.: Direct search based on probabilistic feasible descent for bound and linearly constrained problems. Comput. Optim. Appl. 72(3), 525–559 (2019)

  35. Hock, W., Schittkowski, K.: Test Examples for Nonlinear Programming Codes. Lecture Notes in Economics and Mathematical Systems, vol. 187. Springer (1981)

  36. Jahn, J.: Introduction to the Theory of Nonlinear Optimization. Springer (1994)

  37. Kitayama, S., Arakawa, M., Yamazaki, K.: Sequential approximate optimization using radial basis function network for engineering optimization. Optim. Eng. 12(4), 535–557 (2011)

  38. Klassen, K.J., Yoogalingam, R.: Improving performance in outpatient appointment services with a simulation optimization approach. Prod. Oper. Manag. 18(4), 447–458 (2009)

  39. Lacksonen, T.: Empirical comparison of search algorithms for discrete event simulation. Comput. Ind. Eng. 40(1–2), 133–148 (2001)

  40. Larson, J., Billups, S.C.: Stochastic derivative-free optimization using a trust region framework. Comput. Optim. Appl. 64(3), 619–645 (2016)

  41. Le Digabel, S., Wild, S.M.: A Taxonomy of Constraints in Simulation-Based Optimization. Technical Report G-2015-57, Les cahiers du GERAD (2015)

  42. Letham, B., Karrer, B., Ottoni, G., Bakshy, E.: Constrained Bayesian optimization with noisy experiments. Bayesian Anal. 14(2), 495–519 (2019)

  43. Lukšan, L., Vlček, J.: Test problems for nonsmooth unconstrained and linearly constrained optimization. Technical Report V-798, ICS AS CR (2000)

  44. Mezura-Montes, E., Coello, C.A.: Useful infeasible solutions in engineering optimization with evolutionary algorithms. In: Proceedings of the 4th Mexican International Conference on Advances in Artificial Intelligence, MICAI'05, pp. 652–662. Springer (2005)

  45. Mockus, J.: Bayesian Approach to Global Optimization: Theory and Applications. Mathematics and Its Applications, vol. 37. Springer (2012)

  46. Moré, J.J., Wild, S.M.: Benchmarking derivative-free optimization algorithms. SIAM J. Optim. 20(1), 172–191 (2009)

  47. Nelder, J.A., Mead, R.: A simplex method for function minimization. Comput. J. 7(4), 308–313 (1965)

  48. Paquette, C., Scheinberg, K.: A stochastic line search method with expected complexity analysis. SIAM J. Optim. 30(1), 349–376 (2020)

  49. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)

  50. Rockafellar, R.T.: Generalized directional derivatives and subgradients of nonconvex functions. Canad. J. Math. 32(2), 257–280 (1980)

  51. Rodríguez, J.F., Renaud, J.E., Watson, L.T.: Trust region augmented Lagrangian methods for sequential response surface approximation and optimization. J. Mech. Des. 120(1), 58–66 (1998)

  52. Shashaani, S., Hashemi, F.S., Pasupathy, R.: ASTRO-DF: a class of adaptive sampling trust-region algorithms for derivative-free stochastic optimization. SIAM J. Optim. 28(4), 3145–3176 (2018)

  53. Tao, J., Wang, N.: DNA double helix based hybrid GA for the gasoline blending recipe optimization problem. Chem. Eng. Technol. 31(3), 440–451 (2008)

  54. Wang, Z., Ierapetritou, M.: Constrained optimization of black-box stochastic systems using a novel feasibility enhanced Kriging-based method. Comput. Chem. Eng. 118, 210–223 (2018)

  55. Zhao, J., Wang, N.: A bio-inspired algorithm based on membrane computing and its application to gasoline blending scheduling. Comput. Chem. Eng. 35(2), 272–283 (2011)


Acknowledgements

The authors are grateful to Charles Audet from Polytechnique Montréal for valuable discussions and constructive suggestions. This work is supported by the NSERC CRD RDCPJ 490744-15 grant and by an InnovÉÉ grant, both in collaboration with Hydro-Québec and Rio Tinto, and by a FRQNT fellowship.


Corresponding author

Correspondence to Kwassi Joseph Dzahini.


Appendix

This appendix presents the proofs of a series of results stated in Sect. 4.

1.1 Proof of Theorem 4.2

Proof

This theorem is proved using ideas from [11, 21, 23, 33, 40, 48] and conditioning on the disjoint events \(\left\{ T=+\infty \right\} \) and \(\left\{ T<+\infty \right\} \) that are almost sure due to Assumption 4. The proof consists of two parts. Part 1 considers two separate cases conditioned on the event \(\left\{ T=+\infty \right\} \) (i.e., no \(\varepsilon \)-feasible point is found by Algorithm 2): “good bounds” and “bad bounds”, each of which is subdivided according to whether an iteration is h-Dominating, Improving or Unsuccessful. Part 2 considers three separate cases conditioned on the event \(\left\{ T<+\infty \right\} \): “good estimates and good bounds”, “bad estimates and good bounds” and “bad bounds”, each of which is subdivided according to whether an iteration is f-Dominating, h-Dominating, Improving or Unsuccessful.

In order to show (21), the goal of Part 1 is to show that there exists a constant \(\eta >0\) such that conditioned on the almost sure event \(\{T=+\infty \}\), the following holds for all \(k\in {\mathbb {N}}\):

$$\begin{aligned} {\mathbb {E}}\left( \Phi _{k+1}-\Phi _k|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le -\eta (\Delta ^k_p)^2, \end{aligned}$$
(49)

where \(\Phi _k\) is the random function defined by

$$\begin{aligned} \Phi _k:=\frac{\nu }{m\varepsilon }h(X^k_{\text {inf}})+(1-\nu )(\Delta ^k_p)^2, \quad \text {for all}\ k\in {\mathbb {N}}. \end{aligned}$$
(50)

Indeed, assume that (49) holds. Since \(\Phi _k>0\) for all \(k\in {\mathbb {N}}\), summing (49) over \(k\in {\mathbb {N}}\) and taking expectations on both sides leads to

$$\begin{aligned} {\mathbb {E}}\left[ \sum _{k=0}^{+\infty }(\Delta ^k_p)^2 \right] \le \frac{{\mathbb {E}}\left( \Phi _0\right) }{\eta } =\frac{\Phi _0}{\eta }. \end{aligned}$$
(51)

That is, (21) holds. Part 2 then aims to show that, for the same constant \(\eta \) as in Part 1, conditioned on the almost sure event \(\{T<+\infty \}\) and making use of the random function

$$\begin{aligned} \Phi _k^T:=\frac{\nu }{\varepsilon }(f(X^{k\vee T}_{\text {feas}})-\kappa ^f_{\min }) +\frac{\nu }{m\varepsilon }h(X^k_{\text {inf}})+(1-\nu )(\Delta ^k_p)^2, \quad \text {for all}\ k\in {\mathbb {N}}, \end{aligned}$$
(52)

where \(k\vee T:=\max \{k,T\}\), the following holds for all \(k\in {\mathbb {N}}\):

$$\begin{aligned} {\mathbb {E}}\left( \Phi _{k+1}^T-\Phi _k^T|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le -\eta (\Delta ^k_p)^2. \end{aligned}$$
(53)

Indeed, assume that (53) holds. Since \(\Phi _k^T>0\) for all \(k\ge 0\), summing (53) over \(k\in {\mathbb {N}}\) and taking expectations on both sides yields

$$\begin{aligned} {\mathbb {E}}\left[ \sum _{k=0}^{+\infty }(\Delta ^k_p)^2 \right]&\le \frac{{\mathbb {E}}\left( \Phi _0^T\right) }{\eta } =\frac{1}{\eta }\left[ \frac{\nu }{\varepsilon }\left( {\mathbb {E}}\left[ f(X^T_{\text {feas}}) \right] -\kappa ^f_{\min }\right) +\frac{\nu }{m\varepsilon }h(x^0_{\text {inf}})+(1-\nu )(\delta ^0_p)^2 \right] \\&\le \frac{1}{\eta }\left[ \frac{\nu }{\varepsilon }\left( \kappa ^f_{\max }-\kappa ^f_{\min } \right) +\frac{\nu }{m\varepsilon }h(x^0_{\text {inf}})+(1-\nu )(\delta ^0_p)^2 \right] =:\mu , \end{aligned}$$
(54)

where the last inequality in (54) follows from the inequality \(f(X^k_{\text {feas}})\le \kappa ^f_{\max }\) for all \(k\ge 0\), due to Proposition 3.5, and the fact that T is finite almost surely.
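For completeness, the standard telescoping computation behind both (51) and (54), left implicit above, reads as follows (using the tower property of conditional expectation and \(\Phi _{N+1}^T\ge 0\)):

$$\begin{aligned} \eta \sum _{k=0}^{N}{\mathbb {E}}\left[ (\Delta ^k_p)^2\right] \le \sum _{k=0}^{N}{\mathbb {E}}\left( \Phi _k^T-\Phi _{k+1}^T\right) ={\mathbb {E}}\left( \Phi _0^T\right) -{\mathbb {E}}\left( \Phi _{N+1}^T\right) \le {\mathbb {E}}\left( \Phi _0^T\right) , \end{aligned}$$

and letting \(N\rightarrow +\infty \) (monotone convergence) yields the claimed bounds; the same computation with \(\Phi _k\) in place of \(\Phi _k^T\) gives (51).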

The remainder of the proof is devoted to showing that (49) and (53) hold. The following events are introduced for the sake of clarity in the analysis.

$$\begin{aligned} {\mathcal {D}}_f&:= \{\text {The iteration is }f\text {-Dominating}\}, \quad {\mathcal {D}}_h := \{\text {The iteration is }h\text {-Dominating}\},\\ {\mathcal {I}}&:= \{\text {The iteration is Improving}\}, \quad {\mathcal {U}}:= \{\text {The iteration is Unsuccessful}\}. \end{aligned}$$

Part 1 (\(\varvec{T=+\infty }\) almost surely). The random function \(\Phi _k\) defined in (50) will be shown to satisfy (49) with \(\eta =\frac{1}{2}\alpha \beta (1-\nu )(1-\tau ^2)\), regardless of the change in the objective function f on the \(\varepsilon \)-infeasible incumbents encountered by Algorithm 2. Moreover, since T is infinite almost surely, no iteration of Algorithm 2 can be f-Dominating. Two separate cases are distinguished, and all that follows is conditioned on the almost sure event \(\{T=+\infty \}\).

Case 1 (Good bounds, \(\varvec{\mathbb {1}_{I_k}=1}\)). No matter the type of iteration which occurs, the random function \(\Phi _k\) will be shown to decrease and the smallest decrease is shown to happen on Unsuccessful iterations, thus yielding

$$\begin{aligned} {\mathbb {E}}\left[ \mathbb {1}_{I_k}(\Phi _{k+1}-\Phi _k)|{\mathcal {F}}^{C\cdot F}_{k-1}\right] \le -\alpha (1-\nu )(1-\tau ^2)(\Delta ^k_p)^2. \end{aligned}$$
(55)
  (i) The iteration is h-Dominating (\(\mathbb {1}_{{\mathcal {D}}_h}=1\)). Since the bounds are good, a decrease occurs in h according to (6), i.e.,

    $$\begin{aligned} \mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {D}}_h}\frac{\nu }{m\varepsilon } (h(X^{k+1}_{\text {inf}})-h(X^k_{\text {inf}}))\le -\mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {D}}_h}\nu (\gamma -2)(\Delta ^k_p)^2. \end{aligned}$$
    (56)

    The frame size parameter is updated according to \(\Delta ^{k+1}_p=\min \{\tau ^{-1}\Delta ^k_p,\delta _{\max }\}\), which implies that

    $$\begin{aligned} \mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {D}}_h}(1-\nu )[(\Delta ^{k+1}_p)^2-(\Delta ^k_p)^2]\le \mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {D}}_h}(1-\nu )(\tau ^{-2}-1)(\Delta ^k_p)^2. \end{aligned}$$
    (57)

    Then, by choosing \(\nu \) according to (19), the right-hand side of (56) dominates that of (57). That is,

    $$\begin{aligned} -\nu (\gamma -2)(\Delta ^k_p)^2 + (1-\nu )(\tau ^{-2}-1)(\Delta ^k_p)^2 \le -\frac{1}{2}\nu (\gamma -2)(\Delta ^k_p)^2. \end{aligned}$$
    (58)

    Combining (56), (57) and (58) leads to

    $$\begin{aligned} \mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {D}}_h}(\Phi _{k+1}-\Phi _k)\le -\mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {D}}_h}\frac{1}{2}\nu (\gamma -2)(\Delta ^k_p)^2. \end{aligned}$$
    (59)
  (ii) The iteration is Improving (\(\mathbb {1}_{{\mathcal {I}}}=1\)). Since the bounds are good, a decrease again occurs in h according to (6). Moreover, \(\Delta ^k_p\) is updated as in h-Dominating iterations. Thus, the change in \(\Phi _k\) follows from (59) by replacing \(\mathbb {1}_{{\mathcal {D}}_h}\) with \(\mathbb {1}_{{\mathcal {I}}}\). Specifically,

    $$\begin{aligned} \mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {I}}}(\Phi _{k+1}-\Phi _k)\le -\mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {I}}}\frac{1}{2}\nu (\gamma -2)(\Delta ^k_p)^2. \end{aligned}$$
    (60)
  (iii) The iteration is Unsuccessful (\(\mathbb {1}_{{\mathcal {U}}}=1\)). The value of h is unchanged while the frame size parameter is decreased. Consequently,

    $$\begin{aligned} \mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {U}}}(\Phi _{k+1}-\Phi _k)=-\mathbb {1}_{I_k}\mathbb {1}_{{\mathcal {U}}}(1-\nu )(1-\tau ^{2})(\Delta ^k_p)^2. \end{aligned}$$
    (61)

    Because \(\nu \) satisfies (19) and because \(1-\tau ^2<\tau ^{-2}-1\), Unsuccessful iterations, by (61), provide the worst-case decrease when compared to (59) and (60). That is,

    $$\begin{aligned} -\frac{1}{2}\nu (\gamma -2)(\Delta ^k_p)^2\le -(1-\nu )(1-\tau ^{2})(\Delta ^k_p)^2. \end{aligned}$$
    (62)

    Thus, it follows from (59), (60), (61) and (62) that the change in \(\Phi _k\) is bounded like

    $$\begin{aligned} \mathbb {1}_{I_k}(\Phi _{k+1}-\Phi _k)&= \mathbb {1}_{I_k}(\mathbb {1}_{{\mathcal {D}}_h}+\mathbb {1}_{{\mathcal {I}}}+\mathbb {1}_{{\mathcal {U}}})(\Phi _{k+1}-\Phi _k)\\&\le -\mathbb {1}_{I_k}(1-\nu )(1-\tau ^{2})(\Delta ^k_p)^2. \end{aligned}$$
    (63)

Since Assumption 3 holds, taking conditional expectations with respect to \({\mathcal {F}}^{C\cdot F}_{k-1}\) on both sides of the inequality in (63) leads to (55).
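As a quick numerical sanity check of the parameter inequalities used above (an illustration only; condition (19) is assumed here to amount to \(\nu /(1-\nu )\ge 2(\tau ^{-2}-1)/(\gamma -2)\), which is exactly what (58) requires, and which implies (62) since \(1-\tau ^2<\tau ^{-2}-1\)):

    # Sanity check of (58) and (62) for sample values gamma > 2, 0 < tau < 1.
    tau, gamma = 0.5, 3.0
    ratio = 2 * (tau**-2 - 1) / (gamma - 2)   # assumed reading of (19)
    nu = ratio / (1 + ratio)                  # smallest admissible nu
    d2 = 1.0                                  # plays the role of (Delta_p^k)^2
    lhs = -nu * (gamma - 2) * d2 + (1 - nu) * (tau**-2 - 1) * d2
    assert lhs <= -0.5 * nu * (gamma - 2) * d2 + 1e-12                     # (58)
    assert -0.5 * nu * (gamma - 2) * d2 <= -(1 - nu) * (1 - tau**2) * d2   # (62)
    print("(58) and (62) hold with nu = %.3f" % nu)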

Case 2 (Bad bounds, \(\varvec{\mathbb {1}_{\bar{I_k}}=1}\)). Since the bounds are bad, Algorithm 2 can accept a step which leads to an increase in h and \(\Delta ^k_p\), and hence in \(\Phi _k\). Such an increase in \(\Phi _k\) is controlled by making use of (15), and the probability of \(\bar{I_k}\) is chosen sufficiently small so that \(\Phi _k\) decreases sufficiently in expectation. More precisely, the following result will be proved:

$$\begin{aligned} {\mathbb {E}}\left[ \mathbb {1}_{\bar{I_k}}(\Phi _{k+1}-\Phi _k)|{\mathcal {F}}^{C\cdot F}_{k-1}\right] \le 2\nu (1-\alpha )^{1/2}(\Delta ^k_p)^2. \end{aligned}$$
(64)
  (i) The iteration is h-Dominating (\(\mathbb {1}_{{\mathcal {D}}_h}=1\)). The change in h is bounded as

    $$\begin{aligned}&\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}\frac{\nu }{m\varepsilon } (h(X^{k+1}_{\text {inf}})-h(X^k_{\text {inf}})) \nonumber \\&\quad \le \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}\frac{\nu }{m\varepsilon } \left[ (H^k_s-H^k_0)+\left|h(X^{k+1}_{\text {inf}}) -H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right] \nonumber \\&\quad \le \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}\nu \left[ -\gamma (\Delta ^k_p)^2+\frac{1}{m\varepsilon }\left( \left|h(X^{k+1}_{\text {inf}}) -H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) \right] , \end{aligned}$$
    (65)

    where (65) follows from \(H^k_s-H^k_0\le -\gamma m\varepsilon (\Delta ^k_p)^2 \) which is satisfied in every h-Dominating iteration. Moreover, the change in \(\Delta ^k_p\) can be obtained simply by replacing \(\mathbb {1}_{I_k}\) by \(\mathbb {1}_{\bar{I_k}}\) in (57). That is,

    $$\begin{aligned} \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}(1-\nu )[(\Delta ^{k+1}_p)^2-(\Delta ^k_p)^2]\le \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}(1-\nu )(\tau ^{-2}-1)(\Delta ^k_p)^2. \end{aligned}$$
    (66)

    Because \(\nu \) satisfies (19), \(-\nu \gamma (\Delta ^k_p)^2+(1-\nu )(\tau ^{-2}-1)(\Delta ^k_p)^2\le 0\). Combining (65) and (66),

    $$\begin{aligned} \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}(\Phi _{k+1}-\Phi _k)\le \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}\frac{\nu }{m\varepsilon } \left( \left|h(X^{k+1}_{\text {inf}})-H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) .\nonumber \\ \end{aligned}$$
    (67)
  (ii) The iteration is Improving (\(\mathbb {1}_{{\mathcal {I}}}=1\)). \(\Delta ^k_p\) is updated as in h-Dominating iterations, and the increase in h is bounded as in (65). Thus, the bound on the change in \(\Phi _k\) can be obtained by replacing \(\mathbb {1}_{{\mathcal {D}}_h}\) with \(\mathbb {1}_{{\mathcal {I}}}\) in (67). That is,

    $$\begin{aligned} \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {I}}}(\Phi _{k+1}-\Phi _k)\le \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {I}}}\frac{\nu }{m\varepsilon } \left( \left|h(X^{k+1}_{\text {inf}})-H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) .\nonumber \\ \end{aligned}$$
    (68)
  (iii) The iteration is Unsuccessful (\(\mathbb {1}_{{\mathcal {U}}}=1\)). The value of h is unchanged and \(\Delta ^k_p\) is decreased. Thus, the change in \(\Phi _k\) follows from (61) by replacing \(\mathbb {1}_{I_k}\) with \(\mathbb {1}_{\bar{I_k}}\), and is trivially bounded as

    $$\begin{aligned} \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {U}}}(\Phi _{k+1}-\Phi _k) \le \mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {U}}}\frac{\nu }{m\varepsilon } \left( \left|h(X^{k+1}_{\text {inf}})-H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) .\nonumber \\ \end{aligned}$$
    (69)

    It follows from (67), (68), (69) and the inequality \(\mathbb {1}_{\bar{I_k}}\le 1\), that

    $$\begin{aligned} \mathbb {1}_{\bar{I_k}}(\Phi _{k+1}-\Phi _k) \le \frac{\nu }{m\varepsilon }\left( \left|h(X^{k+1}_{\text {inf}})-H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) . \end{aligned}$$
    (70)

    Taking conditional expectations with respect to \({\mathcal {F}}^{C\cdot F}_{k-1}\) on both sides of (70) and using the inequalities (15) of Lemma 3.7, leads to (64).

Combining (55) and (64) yields

$$\begin{aligned} {\mathbb {E}}\left( \Phi _{k+1}-\Phi _k|{\mathcal {F}}^{C\cdot F}_{k-1}\right)&= {\mathbb {E}}\left[ (\mathbb {1}_{I_k}+\mathbb {1}_{\bar{I_k}})(\Phi _{k+1}-\Phi _k)|{\mathcal {F}}^{C\cdot F}_{k-1}\right] \\&\le \left[ -\alpha (1-\nu )(1-\tau ^2)+ 2\nu (1-\alpha )^{1/2}\right] (\Delta ^k_p)^2. \end{aligned}$$
(71)

Choosing \(\alpha \) according to (20) implies that \(\displaystyle {\alpha \ge \frac{4\nu (1-\alpha )^{1/2}}{(1-\nu )(1-\tau ^2)}}\), which ensures

$$\begin{aligned} -\alpha (1-\nu )(1-\tau ^2)+ 2\nu (1-\alpha )^{1/2}&\le -\frac{1}{2}\alpha (1-\nu )(1-\tau ^2)\\&\le -\frac{1}{2}\alpha \beta (1-\nu )(1-\tau ^2). \end{aligned}$$
(72)

Thus, (49) follows from (71) and (72) with \(\eta =\frac{1}{2}\alpha \beta (1-\nu )(1-\tau ^2)\).
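A similar numerical check for the choice of \(\alpha \) (illustrative only; condition (20) is assumed here to include the inequality \(\alpha \ge 4\nu (1-\alpha )^{1/2}/\big ((1-\nu )(1-\tau ^2)\big )\) stated above):

    import numpy as np

    tau, nu, beta = 0.5, 6 / 7, 0.9       # illustrative values only
    c = (1 - nu) * (1 - tau**2)           # the factor (1-nu)(1-tau^2)
    alpha = 0.9999                        # close to 1, as the theory requires
    assert alpha >= 4 * nu * np.sqrt(1 - alpha) / c   # assumed reading of (20)
    lhs = -alpha * c + 2 * nu * np.sqrt(1 - alpha)
    assert lhs <= -0.5 * alpha * c <= -0.5 * alpha * beta * c   # inequality (72)
    print("(72) holds with alpha =", alpha)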

Part 2 (\(\varvec{T<+\infty }\) almost surely). In order to show that the random function \(\Phi _k^T\) defined by

$$\begin{aligned} \Phi _k^T=\frac{\nu }{\varepsilon }(f(X^{k\vee T}_{\text {feas}})-\kappa ^f_{\min }) +\frac{\nu }{m\varepsilon }h(X^k_{\text {inf}})+(1-\nu )(\Delta ^k_p)^2 \end{aligned}$$

satisfies (53) with the same constant \(\eta \) derived in Part 1, notice that whenever the event \(\{T>k\}\) occurs, \(f(X^{(k+1)\vee T}_{\text {feas}})-f(X^{k\vee T}_{\text {feas}})=0\) since \(k\vee T=(k+1)\vee T=T\). Thus, on the event \(\{T>k\}\), the random function \(\Phi _k\) used in Part 1 has the same increment as \(\Phi _k^T\). Specifically,

$$\begin{aligned} \mathbb {1}_{\{T<+\infty \}}\mathbb {1}_{\{T> k\}}(\Phi _{k+1}^T-\Phi _k^T)=\mathbb {1}_{\{T<+\infty \}}\mathbb {1}_{\{T> k\}}(\Phi _{k+1}-\Phi _k). \end{aligned}$$

Moreover, it follows from the definition of the stopping time T that no iteration can be f-Dominating when the event \(\{T>k\}\) occurs. Consequently, it easily follows from the analysis in Part 1 and the fact that the random variable \(\mathbb {1}_{\{T> k\}}\) is \({\mathcal {F}}^{C\cdot F}_{k-1}\)-measurable that,

$$\begin{aligned} \mathbb {1}_{\{T> k\}}{\mathbb {E}}\left( \Phi _{k+1}^T-\Phi _k^T|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le -\eta (\Delta ^k_p)^2\mathbb {1}_{\{T> k\}}. \end{aligned}$$
(73)

The remainder of the proof is devoted to showing that

$$\begin{aligned} \mathbb {1}_{\{T\le k\}}{\mathbb {E}}\left( \Phi _{k+1}^T-\Phi _k^T|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le -\eta (\Delta ^k_p)^2\mathbb {1}_{\{T\le k\}}, \end{aligned}$$
(74)

since combining (73) and (74) leads to (53), which is the overall goal. In all that follows, it is assumed that the event \(\{T\le k\}\) occurs.

Case 1 (Good estimates and good bounds, \(\varvec{\mathbb {1}_{I_k}\mathbb {1}_{J_k}=1}\)). Regardless of the iteration type, the smallest decrease in \(\Phi _k^T\) will be shown to occur on Unsuccessful iterations, yielding

$$\begin{aligned} \mathbb {1}_{\{T\le k\}}{\mathbb {E}}\left[ \mathbb {1}_{I_k}\mathbb {1}_{J_k}(\Phi _{k+1}^T-\Phi _k^T)|{\mathcal {F}}^{C\cdot F}_{k-1}\right] \le -\alpha \beta (1-\nu )(1-\tau ^2)(\Delta ^k_p)^2\mathbb {1}_{\{T\le k\}}.\nonumber \\ \end{aligned}$$
(75)
  (i) The iteration is f-Dominating (\(\mathbb {1}_{{\mathcal {D}}_f}=1\)). Since the estimates are good, a decrease occurs in f according to (10). That is,

    $$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {D}}_f}\frac{\nu }{\varepsilon }(f(X^{(k+1)\vee T}_{\text {feas}})-f(X^{k\vee T}_{\text {feas}}))\nonumber \\&\quad \le -\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {D}}_f}\nu (\gamma -2)(\Delta ^k_p)^2. \end{aligned}$$
    (76)

    Since the \(\varepsilon \)-infeasible incumbent is not updated, the value of h is unchanged. The frame size parameter is updated according to \(\Delta ^{k+1}_p=\min \{\tau ^{-1}\Delta ^k_p,\delta _{\max }\}\), thus implying that

    $$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {D}}_f}(1-\nu )[(\Delta ^{k+1}_p)^2-(\Delta ^k_p)^2]\nonumber \\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {D}}_f}(1-\nu )(\tau ^{-2}-1)(\Delta ^k_p)^2. \end{aligned}$$
    (77)

    Because \(\nu \) satisfies (19), (58) holds, which implies that the right-hand side of (76) dominates that of (77), leading to the inequality

    $$\begin{aligned} \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {D}}_f}(\Phi _{k+1}^T-\Phi _k^T) \le -\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {D}}_f}\frac{1}{2} \nu (\gamma -2)(\Delta ^k_p)^2.\nonumber \\ \end{aligned}$$
    (78)
  (ii) The iteration is h-Dominating (\(\mathbb {1}_{{\mathcal {D}}_h}=1\)). The value of f is unchanged since \(X^k_{\text {feas}}\) is not updated. Thus, the bound on the change in \(\Phi _k^T\) follows from multiplying both sides of (59) by \(\mathbb {1}_{\{T\le k\}}\mathbb {1}_{J_k}\) and replacing \(\Phi _k\) with \(\Phi _k^T\). That is,

    $$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {D}}_h}(\Phi _{k+1}^T-\Phi _k^T) \le -\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {D}}_h}\frac{1}{2}\nu (\gamma -2)(\Delta ^k_p)^2.\nonumber \\ \end{aligned}$$
    (79)
  (iii) The iteration is Improving (\(\mathbb {1}_{{\mathcal {I}}}=1\)). Again, the value of f is unchanged. Thus, the bound on the change in \(\Phi _k^T\) follows from multiplying both sides of (60) by \(\mathbb {1}_{\{T\le k\}}\mathbb {1}_{J_k}\) and replacing \(\Phi _k\) with \(\Phi _k^T\). That is,

    $$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {I}}}(\Phi _{k+1}^T-\Phi _k^T) \le -\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {I}}}\frac{1}{2}\nu (\gamma -2)(\Delta ^k_p)^2.\nonumber \\ \end{aligned}$$
    (80)
  (iv) The iteration is Unsuccessful (\(\mathbb {1}_{{\mathcal {U}}}=1\)). The values of f and h are unchanged since no incumbent is updated, while \(\Delta ^k_p\) is decreased. Consequently, the bound on the change in \(\Phi _k^T\) follows from multiplying both sides of (61) by \(\mathbb {1}_{\{T\le k\}}\mathbb {1}_{J_k}\) and replacing \(\Phi _k\) with \(\Phi _k^T\). That is,

    $$\begin{aligned} \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {U}}}(\Phi _{k+1}^T-\Phi _k^T)=-\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}\mathbb {1}_{{\mathcal {U}}}(1-\nu )(1-\tau ^{2})(\Delta ^k_p)^2.\nonumber \\ \end{aligned}$$
    (81)

    Combining (78), (79), (80), (81) and (62) yields

    $$\begin{aligned} \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}(\Phi _{k+1}^T-\Phi _k^T) \le -\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{J_k}(1-\nu )(1-\tau ^2)(\Delta ^k_p)^2. \end{aligned}$$
    (82)

The following holds under Assumption 3: \({\mathbb {E}}\left( \mathbb {1}_{I_k}\mathbb {1}_{J_k}|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \ge \alpha \beta \). Then, taking expectations with respect to \({\mathcal {F}}^{C\cdot F}_{k-1}\) on both sides of (82) and using the \({\mathcal {F}}^{C\cdot F}_{k-1}\)-measurability of the random variables \(\mathbb {1}_{\{T\le k\}}\) and \(\Delta ^k_p\) leads to (75).

Case 2 (Bad estimates and good bounds, \(\varvec{\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}=1}\)). An increase in \(\Phi _k^T\) may occur, since good bounds might not provide enough decrease in h to cancel the increase in f that occurs whenever Algorithm 2 wrongly accepts an incumbent due to bad estimates. The f-Dominating case yields the worst-case increase in \(\Phi _k^T\), leading to

$$\begin{aligned} \mathbb {1}_{\{T\le k\}}{\mathbb {E}}\left[ \mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}(\Phi _{k+1}^T-\Phi _k^T)|{\mathcal {F}}^{C\cdot F}_{k-1}\right] \le 2\nu (1-\beta )^{1/2}(\Delta ^k_p)^2\mathbb {1}_{\{T\le k\}}. \end{aligned}$$
(83)
  (i) The iteration is f-Dominating (\(\mathbb {1}_{{\mathcal {D}}_f}=1\)). Whenever bad estimates occur and the iteration is f-Dominating, the change in f is bounded as

    $$\begin{aligned}&\mathbb {1}_{\{T\le k\}} \mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_f}\frac{\nu }{\varepsilon }(f(X^{(k+1)\vee T}_{\text {feas}})-f(X^{k\vee T}_{\text {feas}}))\\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_f}\frac{\nu }{\varepsilon } \left[ (F^k_s-F^k_0) +\left|f(X^{k+1}_{\text {feas}})-F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right] \\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_f}\nu \left[ -\gamma (\Delta ^k_p)^2 +\frac{1}{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}})-F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) \right] \end{aligned}$$
    (84)

    where the last inequality in (84) follows from \(F^k_s-F^k_0\le -\gamma \varepsilon (\Delta ^k_p)^2 \) which is satisfied for every f-Dominating iteration. While the value of h remains unchanged since \(X^k_{\text {inf}}\) is not updated, the change in \(\Delta ^k_p\) follows (77) by replacing \(\mathbb {1}_{J_k}\) by \(\mathbb {1}_{\bar{J_k}}\). That is,

    $$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_f}(1-\nu )[(\Delta ^{k+1}_p)^2-(\Delta ^k_p)^2] \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_f}(1-\nu )(\tau ^{-2}-1)(\Delta ^k_p)^2. \nonumber \\ \end{aligned}$$
    (85)

    Then, (84), (85), (19) and the inequality \(-\nu \gamma (\Delta ^k_p)^2+(1-\nu )(\tau ^{-2}-1)(\Delta ^k_p)^2\le 0\) yield

    $$\begin{aligned} \begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_f}( \Phi _{k+1}^T-\Phi _k^T)\\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_f}\frac{\nu }{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}}) -F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) . \end{aligned} \end{aligned}$$
    (86)
  (ii) The iteration is h-Dominating (\(\mathbb {1}_{{\mathcal {D}}_h}=1\)). The change in \(\Phi _k^T\), obtained by replacing \(\mathbb {1}_{J_k}\) with \(\mathbb {1}_{\bar{J_k}}\) in (79), is trivially bounded as

    $$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_h}(\Phi _{k+1}^T-\Phi _k^T)\\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {D}}_h}\frac{\nu }{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}}) -F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) . \end{aligned}$$
    (87)
  (iii) The iteration is Improving (\(\mathbb {1}_{{\mathcal {I}}}=1\)). Again, the change in \(\Phi _k^T\), obtained by replacing \(\mathbb {1}_{J_k}\) with \(\mathbb {1}_{\bar{J_k}}\) in (80), is trivially bounded as

    $$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {I}}}(\Phi _{k+1}^T-\Phi _k^T)\\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {I}}}\frac{\nu }{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}}) -F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) . \end{aligned}$$
    (88)
  (iv) The iteration is Unsuccessful (\(\mathbb {1}_{{\mathcal {U}}}=1\)). Since the frame size parameter decreases, and hence so does the corresponding term of \(\Phi _k^T\), the change in \(\Phi _k^T\) is bounded as

    $$\begin{aligned} \begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {U}}}( \Phi _{k+1}^T-\Phi _k^T)\\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\mathbb {1}_{{\mathcal {U}}}\frac{\nu }{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}}) -F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) . \end{aligned} \end{aligned}$$
    (89)

    Then, combining (86), (87), (88), (89) and the inequality \(\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\le 1\) yields

    $$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}(\Phi _{k+1}^T-\Phi _k^T)\nonumber \\&\quad \le \mathbb {1}_{\{T\le k\}}\frac{\nu }{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}})-F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) . \end{aligned}$$
    (90)

    Since Assumption 3 holds, it follows from the conditional Cauchy-Schwarz inequality [20] that

    $$\begin{aligned} {\mathbb {E}}\left( \left|f(X^k_{\text {feas}})-F^k_0\right||{\mathcal {F}}^{C\cdot F}_{k-1}\right)&\le {\mathbb {E}}\left( 1|{\mathcal {F}}^{C\cdot F}_{k-1}\right) ^{1/2} \left[ {\mathbb {E}}\left( \left|f(X^k_{\text {feas}}) -F^k_0\right|^2|{\mathcal {F}}^{C\cdot F}_{k-1}\right) \right] ^{1/2}\\&\le \varepsilon (1-\beta )^{1/2}(\Delta ^k_p)^2, \end{aligned}$$
    (91)

    where (91) follows from (12) and the fact that \({\mathbb {E}}\left( 1|{\mathcal {F}}^{C\cdot F}_{k-1}\right) =1\). Similarly,

    $$\begin{aligned} {\mathbb {E}}\left( \left|f(X^{k+1}_{\text {feas}})-F^k_s\right||{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le \varepsilon (1-\beta )^{1/2}(\Delta ^k_p)^2. \end{aligned}$$
    (92)

Taking expectations with respect to \({\mathcal {F}}^{C\cdot F}_{k-1}\) on both sides of (90) and then using (91), (92) and the \({\mathcal {F}}^{C\cdot F}_{k-1}\)-measurability of the random variables \(\mathbb {1}_{\{T\le k\}}\) and \(\Delta ^k_p\), leads to (83).

Case 3 (Bad bounds, \(\varvec{\mathbb {1}_{\bar{I_k}}=1}\)). \(\Phi _k^T\) may increase since, even when good estimates of f occur, they might not provide enough decrease to cancel the increase in h that occurs whenever Algorithm 2 wrongly accepts an incumbent due to bad bounds. It will be shown that

$$\begin{aligned} \mathbb {1}_{\{T\le k\}}{\mathbb {E}}\left[ \mathbb {1}_{\bar{I_k}}(\Phi _{k+1}^T-\Phi _k^T)|{\mathcal {F}}^{C\cdot F}_{k-1}\right] \le 2\nu \left[ (1-\alpha )^{1/2}+(1-\beta )^{1/2} \right] (\Delta ^k_p)^2\mathbb {1}_{\{T\le k\}}. \nonumber \\ \end{aligned}$$
(93)
  (i) The iteration is f-Dominating (\(\mathbb {1}_{{\mathcal {D}}_f}=1\)). The change in \(\Phi _k^T\) is bounded by taking into account the possible increase in f. Since the value of h is unchanged, the bound on the change in \(\Phi _k^T\) can be derived from (86) by replacing \(\mathbb {1}_{I_k}\mathbb {1}_{\bar{J_k}}\) with \(\mathbb {1}_{\bar{I_k}}\). That is,

    $$\begin{aligned} \begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_f}( \Phi _{k+1}^T-\Phi _k^T)\\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_f}\frac{\nu }{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}})-F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) . \end{aligned} \end{aligned}$$
    (94)
  (ii) The iteration is h-Dominating (\(\mathbb {1}_{{\mathcal {D}}_h}=1\)). Since the value of f is unchanged, the bound on the change in \(\Phi _k^T\) is obtained by multiplying both sides of (67) by \(\mathbb {1}_{\{T\le k\}}\) and replacing \(\Phi _k\) with \(\Phi _k^T\). That is,

    $$\begin{aligned} \mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}(\Phi _{k+1}^T-\Phi _k^T)\le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {D}}_h}\frac{\nu }{m\varepsilon }\left( \left|h(X^{k+1}_{\text {inf}})-H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) . \end{aligned}$$
    (95)
  (iii) The iteration is Improving (\(\mathbb {1}_{{\mathcal {I}}}=1\)). The frame size parameter is updated as in h-Dominating iterations and the value of f is unchanged. Thus, the bound on the change in \(\Phi _k^T\) follows from (95) by replacing \(\mathbb {1}_{{\mathcal {D}}_h}\) with \(\mathbb {1}_{{\mathcal {I}}}\). That is,

    $$\begin{aligned} \mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {I}}}(\Phi _{k+1}^T-\Phi _k^T)\le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {I}}}\frac{\nu }{m\varepsilon }\left( \left|h(X^{k+1}_{\text {inf}})-H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) . \end{aligned}$$
    (96)
  (iv) The iteration is Unsuccessful (\(\mathbb {1}_{{\mathcal {U}}}=1\)). Since the frame size parameter decreases, and hence so does the corresponding term of \(\Phi _k^T\), the change in \(\Phi _k^T\) is bounded as

    $$\begin{aligned} \begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {U}}}( \Phi _{k+1}^T-\Phi _k^T)\\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\mathbb {1}_{{\mathcal {U}}}\nu \left[ \frac{1}{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}})-F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) \right. \\&\qquad \left. +\frac{1}{m\varepsilon }\left( \left|h(X^{k+1}_{\text {inf}})-H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) \right] \end{aligned} \end{aligned}$$
    (97)

Since (97) dominates (94), (95) and (96), combining all four cases leads to

$$\begin{aligned}&\mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}( \Phi _{k+1}^T-\Phi _k^T)\\&\quad \le \mathbb {1}_{\{T\le k\}}\mathbb {1}_{\bar{I_k}}\nu \left[ \frac{1}{\varepsilon }\left( \left|f(X^{k+1}_{\text {feas}})-F^k_s\right|+\left|f(X^k_{\text {feas}})-F^k_0\right|\right) \right. \\&\qquad \left. +\frac{1}{m\varepsilon }\left( \left|h(X^{k+1}_{\text {inf}})-H^k_s\right|+\left|h(X^k_{\text {inf}})-H^k_0\right|\right) \right] . \end{aligned}$$
(98)

Taking expectations with respect to \({\mathcal {F}}^{C\cdot F}_{k-1}\) on both sides of (98) and using (15), (91) and (92) leads to (93). Combining the main results of Cases 1, 2 and 3 of Part 2, specifically (75), (83) and (93), yields

$$\begin{aligned} \begin{aligned} \mathbb {1}_{\{T\le k\}}{\mathbb {E}}\left[ \Phi _{k+1}^T-\Phi _k^T|{\mathcal {F}}^{C\cdot F}_{k-1}\right]&\le \left[ -\alpha \beta (1-\nu )(1-\tau ^2)+2\nu (1-\alpha )^{1/2} \right. \\&\quad \left. +4\nu (1-\beta )^{1/2}\right] (\Delta ^k_p)^2\mathbb {1}_{\{T\le k\}}. \end{aligned} \end{aligned}$$
(99)

Choosing \(\alpha \) and \(\beta \) according to (20) ensures that

$$\begin{aligned}&-\alpha \beta (1-\nu )(1-\tau ^2)+2\nu (1-\alpha )^{1/2}+4\nu (1-\beta )^{1/2}\nonumber \\&\quad \le -\frac{1}{2}\alpha \beta (1-\nu )(1-\tau ^2), \end{aligned}$$
(100)

and (74) follows from (99) and (100) with the same constant \(\eta =\frac{1}{2}\alpha \beta (1-\nu )(1-\tau ^2)\) as in Part 1, which completes the proof. \(\square \)

1.2 Proof of Corollary 4.4

Proof

Only (22) is proved, but the same argument applies to \(\left|H^k_s-h(X^k+S^k)\right|\) and \(\left|F^k_s-f(X^k+S^k)\right|\). According to Assumption 3(vi), \({\mathbb {E}}\left( \left|H^k_0-h(X^k)\right| |\ {\mathcal {F}}^{C\cdot F}_{k-1}\right) \le m\varepsilon (1-\alpha )^{1/2}(\Delta ^k_p)^2\), which implies that

$$\begin{aligned} {\mathbb {E}}\left( \left|H^k_0-h(X^k)\right|\right) \le m\varepsilon (1-\alpha )^{1/2}{\mathbb {E}}\left[ (\Delta ^k_p)^2\right] . \end{aligned}$$
(101)

By summing each side of (101) over k from 0 to N, and observing that the partial sums

$$\begin{aligned} S_{N}^h:=\sum _{k=0}^{N}\left|H^k_0-h(X^k)\right| \quad \text {and}\quad S_{N}^\Delta :=\sum _{k=0}^{N}(\Delta ^k_p)^2 \end{aligned}$$

are nondecreasing in N, it follows from the monotone convergence theorem (see e.g. Theorem 1.6.6 in [32]) that

$$\begin{aligned} {\mathbb {E}}\left( \sum _{k=0}^{+\infty }\left|H^k_0-h(X^k)\right|\right)&= {\mathbb {E}}\left( \lim _{N\rightarrow +\infty } S_{N}^h\right) =\lim _{N\rightarrow +\infty }{\mathbb {E}}\left( S_{N}^h\right) =\sum _{k=0}^{+\infty }{\mathbb {E}}\left( \left|H^k_0-h(X^k)\right|\right) \\&\le m\varepsilon (1-\alpha )^{1/2}\sum _{k=0}^{+\infty }{\mathbb {E}}\left[ (\Delta ^k_p)^2\right] = m\varepsilon (1-\alpha )^{1/2}\lim _{N\rightarrow +\infty }{\mathbb {E}}\left( S_{N}^\Delta \right) \\&= m\varepsilon (1-\alpha )^{1/2}{\mathbb {E}}\left( \lim _{N\rightarrow +\infty } S_{N}^\Delta \right) =m\varepsilon (1-\alpha )^{1/2}{\mathbb {E}}\left[ \sum _{k=0}^{+\infty }(\Delta ^k_p)^2\right] \\&\le \mu \times m\varepsilon (1-\alpha )^{1/2}<+\infty , \end{aligned}$$

where \(\mu \) is from (54). This means that \(\displaystyle {\sum _{k=0}^{+\infty }\left|H^k_0-h(X^k)\right|<+\infty }\) almost surely, which implies the first result of (22). The proof for \(\left|F^k_0-f(X^k)\right|\) is similar by observing that (see (91))

$$\begin{aligned} {\mathbb {E}}\left( \left|F^k_0-f(X^k)\right||{\mathcal {F}}^{C\cdot F}_{k-1}\right) \le \varepsilon (1-\beta )^{1/2}(\Delta ^k_p)^2. \end{aligned}$$

\(\square \)

1.3 Proof of Lemma 4.7

Proof

The proof uses ideas from [11, 23]. The result is proved by contradiction conditioned on the almost sure event \(E_1=\{\Delta ^k_p\rightarrow 0\}\). All that follows is conditioned on the event \(E_1\). Assume that with nonzero probability, there exists a random variable \({\mathcal {E}}'>0\) such that

$$\begin{aligned} \Psi _k^h\ge {\mathcal {E}}', \quad \text {for all}\ k\in {\mathbb {N}}, \end{aligned}$$
(102)

that is,

$$\begin{aligned} {\mathbb {P}}\left( \left\{ \omega \in \Omega :\exists {\mathcal {E}}'(\omega )>0 \text { such that } \forall k\in {\mathbb {N}}, \Psi _k^h(\omega )\ge {\mathcal {E}}'(\omega )\right\} \right) >0. \end{aligned}$$
(103)

Let \(\{x^k_{\text {inf}}\}_{k\in {\mathbb {N}}}\), \(\{s^k\}_{k\in {\mathbb {N}}}\), \(\{\delta ^k_p\}_{k\in {\mathbb {N}}}\) and \(\epsilon '>0\) be realizations of \(\{X^k_{\text {inf}}\}_{k\in {\mathbb {N}}}\), \(\{S^k\}_{k\in {\mathbb {N}}}\), \(\{\Delta ^k_p\}_{k\in {\mathbb {N}}}\) and \({\mathcal {E}}'\), respectively for which \(\psi _k^h\ge \epsilon '\), for all \(k\in {\mathbb {N}}\). Let \({\hat{z}}\) be the parameter of Algorithm 2 satisfying \(\delta ^k_p\le \tau ^{-{\hat{z}}}\) for all \(k\ge 0\). Since \(\delta ^k_p\rightarrow 0\) due to the conditioning on \(E_1\), there exists \(k_0\in {\mathbb {N}}\) such that

$$\begin{aligned} \delta ^k_p<\lambda :=\min \left\{ \frac{\epsilon '}{m\varepsilon (\gamma +2)},\tau ^{1-{\hat{z}}} \right\} , \quad \text {for all}\ k\ge k_0. \end{aligned}$$
(104)

Consequently and since \(\tau <1\), the random variable \(R_k\) with realizations \(r_k:=-\log _{\tau }\left( \frac{\delta ^k_p}{\lambda }\right) \) satisfies \(r_k<0\) for all \(k\ge k_0\). The main idea of the proof is to show that such realizations occur only with probability zero, thus leading to a contradiction. First \(\{R_k\}_{k\in {\mathbb {N}}}\) is shown to be a submartingale. Let \(k\ge k_0\) be an iteration for which the events \(I_k\) and \(J_k\) both occur, which happens with probability at least \(\alpha \beta >1/2\). Then, it follows from the definition of the event \(I_k\) (see Definition 3.3) that

$$\begin{aligned} h(x^k_{\text {inf}})&\le u^k_0(x^k_{\text {inf}}) \le \sum _{j=1}^{m}\max \left\{ c^k_{j,0}(x^k_{\text {inf}}),0 \right\} +m\varepsilon (\delta ^k_p)^2 = h^k_0(x^k_{\text {inf}})+ m\varepsilon (\delta ^k_p)^2, \end{aligned}$$
(105)

$$\begin{aligned} \text {and}\quad h(x^k_{\text {inf}}+s^k)\ge \ell ^k_s(x^k_{\text {inf}}+s^k) \ge h^k_s(x^k_{\text {inf}}+s^k)- m\varepsilon (\delta ^k_p)^2. \end{aligned}$$
(106)

Hence,

$$\begin{aligned} h^k_s(x^k_{\text {inf}}+s^k)-h^k_0(x^k_{\text {inf}})&=[h(x^k_{\text {inf}}+s^k)-h(x^k_{\text {inf}})]+[h(x^k_{\text {inf}})-h^k_0(x^k_{\text {inf}})]\\&\quad +[h^k_s(x^k_{\text {inf}}+s^k)-h(x^k_{\text {inf}}+s^k)] \\&\le 2m\varepsilon (\delta ^k_p)^2-\epsilon '\delta ^k_p\le 2m\varepsilon (\delta ^k_p)^2-m\varepsilon (\gamma +2)(\delta ^k_p)^2\\&=-\gamma m\varepsilon (\delta ^k_p)^2, \end{aligned}$$
(107)

where the first inequality in (107) follows from (102), (105) and (106) while the last inequality follows from (104). Consequently, iteration k of Algorithm 2 cannot be Unsuccessful. Thus, the frame size parameter is updated according to \(\delta _p^{k+1}=\tau ^{-1}\delta ^k_p\) since \(\delta ^k_p<\tau ^{1-{\hat{z}}}\). Hence, \(r_{k+1}=r_k+1\).

Let \({\mathcal {F}}^{I\cdot J}_{k-1}=\sigma (I_0,I_1,\dots ,I_{k-1})\cap \sigma (J_0,J_1,\dots ,J_{k-1})\). For all other outcomes of \(I_k\) and \(J_k\), which will occur with a total probability of at most \(1-\alpha \beta \), the inequality \(\delta _p^{k+1}\ge \tau \delta ^k_p\) always holds, thus implying that \(r_{k+1}\ge r_k-1\). Hence,

$$\begin{aligned} {\mathbb {E}}\left( \mathbb {1}_{I_k\cap J_k}(R_{k+1}-R_k)|{\mathcal {F}}^{I\cdot J}_{k-1}\right)&= {\mathbb {P}}\left( I_k\cap J_k|{\mathcal {F}}^{I\cdot J}_{k-1}\right) \ge \alpha \beta \\ \text {and}\quad {\mathbb {E}}\left( \mathbb {1}_{\overline{I_k\cap J_k}}(R_{k+1}-R_k)|{\mathcal {F}}^{I\cdot J}_{k-1}\right)&\ge -{\mathbb {P}}\left( \overline{I_k\cap J_k}|{\mathcal {F}}^{I\cdot J}_{k-1}\right) \ge \alpha \beta -1. \end{aligned}$$

Thus, \({\mathbb {E}}\left( R_{k+1}-R_k|{\mathcal {F}}^{I\cdot J}_{k-1}\right) \ge 2\alpha \beta -1>0\), implying that \(\{R_k\}\) is a submartingale. The remainder of the proof is almost identical to that of the \(\liminf \)-type first-order result in [23].

Next, a random walk \(W_k\) with realizations \(w_k\) is constructed on the same probability space as \(R_k\); it will serve as a lower bound on \(R_k\). Define \(W_k\) as in (14) by

$$\begin{aligned} W_k= \sum _{i=0}^{k}(2\cdot \mathbb {1}_{I_i}\mathbb {1}_{J_i}-1), \end{aligned}$$
(108)

where the indicator random variables \(\mathbb {1}_{I_i}\) and \(\mathbb {1}_{J_i}\) equal 1 if \(I_i\) (respectively \(J_i\)) occurs, and 0 otherwise. Then, following the proof of Theorem 3.6, observe that \(\{W_k\}\) is an \({\mathcal {F}}^{I\cdot J}_{k-1}\)-submartingale with bounded (nonzero) increments, which therefore cannot converge to any finite value (see also [23] for the same result); hence the event \(\left\{ \underset{k\rightarrow +\infty }{\limsup }\ W_k=+\infty \right\} \) occurs almost surely. Since by construction

$$\begin{aligned} r_k-r_{k_0} = -{\log }_\tau \left( \frac{\delta ^k_p}{\delta ^{k_0}_p}\right) \ge w_k-w_{k_0}, \end{aligned}$$

then with probability one, \(R_k\) is positive infinitely often. Hence, realizations \(r_k\) satisfying \(r_k<0\) for all \(k\ge k_0\) occur with probability zero, so the assumption that (103) holds is false. This implies that

$$\begin{aligned} {\mathbb {P}}\left( \left\{ \omega \in \Omega :\forall {\mathcal {E}}'(\omega )>0, \exists k\in {\mathbb {N}} \text { such that } \Psi _k^h(\omega )< {\mathcal {E}}'(\omega )\right\} \right) =1, \end{aligned}$$

which means that (23) holds. \(\square \)
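To see concretely why the submartingale argument forces \(R_k\) (and hence \(W_k\)) to be positive infinitely often, here is a minimal simulation of the random walk (108); an illustration only, under the assumption \(p:=\alpha \beta >1/2\):

    import numpy as np

    rng = np.random.default_rng(1)
    p = 0.7            # plays the role of alpha*beta > 1/2
    n_steps = 10_000
    # W_k = sum_{i <= k} (2*1_{I_i}1_{J_i} - 1): +1 w.p. p, -1 otherwise.
    steps = np.where(rng.random(n_steps) < p, 1, -1)
    walk = np.cumsum(steps)
    # The positive drift 2p - 1 > 0 drives W_k to +infinity almost surely,
    # so W_k exceeds any fixed level infinitely often.
    print("drift per step:", 2 * p - 1, "| final value:", walk[-1])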

1.4 Proof of Theorem 4.10

Proof

The theorem is proved using ideas from [9, 11]. Define the events \(E_1\) and \(E_2\) by

$$\begin{aligned} E_1= & {} \left\{ \omega \in \Omega :\Delta ^k_p(\omega )\rightarrow 0 \right\} \quad \text {and}\\ E_2= & {} \left\{ \omega \in \Omega :\exists K'(\omega )\subset {\mathbb {N}}\ \text {such that}\ {\lim }_{K'(\omega )}\Psi _k^h(\omega )\le 0\right\} . \end{aligned}$$

Then \(E_1\) and \(E_2\) are almost sure due to Corollary 4.3 and (23) respectively. Let \(\omega \in E_1\cap E_2\) be an arbitrary outcome and note that the event \(E_1\cap E_2\) is also almost sure as a countable intersection of almost sure events. Then \(\lim _{K'(\omega )}\Delta ^k_p(\omega )=0\). It follows from the compactness hypothesis of Assumption 2 that there exists \(K(\omega )\subseteq K'(\omega )\) for which the subsequence \(\{X^k_{\text {inf}}(\omega )\}_{k\in K(\omega )}\) converges to a limit \({\hat{X}}_{\inf }(\omega )\). Specifically, \({\hat{X}}_{\inf }(\omega )\) is a refined point for the refining subsequence \(\{X^k_{\text {inf}}(\omega )\}_{k\in K(\omega )}\). Let \(v\in T^H_{{\mathcal {X}}}({\hat{X}}_{\inf }(\omega ))\) be a refining direction for \({\hat{X}}_{\inf }(\omega )\). Denote by V the random vector with realizations v, i.e., \(v=V(\omega )\), and let \({\hat{x}}_{\inf }={\hat{X}}_{\inf }(\omega )\), \(x^k_{\text {inf}}=X^k_{\text {inf}}(\omega )\), \(\delta ^k_p=\Delta ^k_p(\omega )\), \(\delta ^k_m=\Delta ^k_m(\omega )\), \(\psi _k^h=\Psi _k^h(\omega )\) and \({\mathcal {K}}=K(\omega )\). Since v is a refining direction, there exists \({\mathcal {L}}\subseteq {\mathcal {K}}\) and polling directions \(d^k\in {\mathbb {D}}^k_p(x^k_{\text {inf}})\) such that \(v=\underset{k\in {\mathcal {L}}}{\lim }\frac{d^k}{{\left\Vert d^k\right\Vert }_{\infty }}\). For each \(k\in {\mathcal {L}}\), define

$$\begin{aligned} \begin{aligned} t_k&=\delta ^k_m{\left\Vert d^k\right\Vert }_{\infty }\rightarrow 0,\quad y^k=x^k_{\text {inf}}+t_k\left( \frac{d^k}{{\left\Vert d^k\right\Vert }_{\infty }}-v\right) \rightarrow {\hat{x}}_{\inf },\\ a_k&=\frac{h(y^k+t_k v)-h(x^k_{\text {inf}})}{t_k} \quad \text {and}\quad b_k= \frac{h(x^k_{\text {inf}})-h(y^k)}{t_k}, \end{aligned} \end{aligned}$$

where the fact that \(t_k\rightarrow 0\) follows from Definition 2.11, specifically the inequality \(\delta ^k_m{\left\Vert d^k\right\Vert }_{\infty }\le \delta ^k_pb\). Since h is \(\lambda ^h\)–locally Lipschitz,

$$\begin{aligned}&\left|a_k\right|\le \frac{\lambda ^h}{t_k}{\left\Vert (y^k+t_k v)-x^k_{\text {inf}}\right\Vert }_{\infty }=\lambda ^h \quad \text {and}\\&\left|b_k\right|\le \frac{\lambda ^h}{t_k}{\left\Vert x^k_{\text {inf}}-y^k\right\Vert }_{\infty } =\lambda ^h{\left\Vert \frac{d^k}{{\left\Vert d^k\right\Vert }_{\infty }}-v\right\Vert }_{\infty }\rightarrow 0, \end{aligned}$$

which shows that Lemma 4.9 applies to both subsequences \(\{a_k\}_{k\in {\mathcal {L}}}\) and \(\{b_k\}_{k\in {\mathcal {L}}}\). Moreover, combining the inequality \(\lim _{{\mathcal {L}}}\psi _k^h\le 0\) and Assumption 6 (the fact that \(\delta ^k_p{\left\Vert d^k\right\Vert }_{\infty }\ge d_{\min }>0\)), yields

$$\begin{aligned} \lim _{k\in {\mathcal {L}}}\left( \frac{-\psi _k^h}{\delta ^k_p{\left\Vert d^k\right\Vert }_{\infty }}\right)&= \lim _{k\in {\mathcal {L}}} \frac{h(x^k_{\text {inf}}+\delta ^k_md^k)-h(x^k_{\text {inf}})}{(\delta ^k_p)^2{\left\Vert d^k\right\Vert }_{\infty }} \\&= \lim _{k\in {\mathcal {L}}} \frac{h(x^k_{\text {inf}}+\delta ^k_md^k)-h(x^k_{\text {inf}})}{t_k}\ge -d_{\min }^{-1}\lim _{k\in {\mathcal {L}}}\psi _k^h \ge 0, \end{aligned}$$
(109)

where the equality in (109) follows from \(\delta ^k_m=(\delta ^k_p)^2\) for sufficiently large k. Thus, by adding and subtracting \(h(x^k_{\text {inf}})\) in the numerator of the Clarke derivative \(h^{\circ }({\hat{x}}_{\inf };v):=\limsup _{y\rightarrow {\hat{x}}_{\inf },\, t\searrow 0}[h(y+tv)-h(y)]/t\), and using the fact that \(x^k_{\text {inf}}+\delta ^k_md^k\in {\mathcal {X}}\) for sufficiently large \(k\in {\mathcal {L}}\) since v is a hypertangent direction,

$$\begin{aligned} h^{\circ }({\hat{x}}_{\inf };v)&\ge \limsup _{k\in {\mathcal {L}}}\frac{h(y^k+t_k v)-h(x^k_{\text {inf}})+h(x^k_{\text {inf}})-h(y^k)}{t_k}=\limsup _{k\in {\mathcal {L}}}(a_k+b_k)\\&= \limsup _{k\in {\mathcal {L}}}a_k+ \lim _{k\in {\mathcal {L}}}b_k= \limsup _{k\in {\mathcal {L}}} \frac{h(x^k_{\text {inf}}+\delta ^k_md^k)-h(x^k_{\text {inf}})}{t_k}\ge 0, \end{aligned}$$

where the last inequality follows from (109). Every outcome \(\omega \) arbitrarily chosen in \(E_1\cap E_2\) therefore belongs to the event

$$\begin{aligned} E_3&:= \big \{ \omega \in \Omega :\exists K(\omega )\subseteq {\mathbb {N}}\ \text {and}\ \exists {\hat{X}}_{\inf }(\omega )= \lim _{k\in K(\omega )}X^k_{\text {inf}}(\omega ),\ {\hat{X}}_{\inf }(\omega )\in {\mathcal {X}},\ \text {such that} \\&\qquad \forall V(\omega )\in T^H_{{\mathcal {X}}}({\hat{X}}_{\inf }(\omega )),\ h^{\circ }({\hat{X}}_{\inf }(\omega );V(\omega ))\ge 0\big \} , \end{aligned}$$

thus implying that \(E_1\cap E_2\subseteq E_3\). Then the proof is complete since \({\mathbb {P}}\left( E_1\cap E_2\right) =1\).

\(\square \)
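For completeness, we expand the add-and-subtract step above, using only quantities already defined in the proof. Since \(h\) is Lipschitz continuous near \({\hat{x}}_{\inf }\), its Clarke generalized directional derivative at \({\hat{x}}_{\inf }\) in the direction \(v\) is

$$\begin{aligned} h^{\circ }({\hat{x}}_{\inf };v)=\limsup _{y\rightarrow {\hat{x}}_{\inf },\ t\searrow 0}\frac{h(y+tv)-h(y)}{t}, \end{aligned}$$

and the pairs \((y^k,t_k)\), \(k\in {\mathcal {L}}\), are admissible in this \(\limsup \) since \(y^k\rightarrow {\hat{x}}_{\inf }\) and \(t_k\searrow 0\). Moreover, the definitions of \(t_k\), \(y^k\), \(a_k\) and \(b_k\) give

$$\begin{aligned} y^k+t_kv=x^k_{\text {inf}}+t_k\frac{d^k}{{\left\Vert d^k\right\Vert }_{\infty }}=x^k_{\text {inf}}+\delta ^k_md^k \quad \text {and}\quad a_k+b_k=\frac{h(y^k+t_kv)-h(y^k)}{t_k}, \end{aligned}$$

so \(a_k\) is exactly the difference quotient bounded below in (109), while \(a_k+b_k\) is an admissible quotient in the \(\limsup \) defining \(h^{\circ }({\hat{x}}_{\inf };v)\). Since \(b_k\rightarrow 0\), the two upper limits coincide, which is the chain displayed above.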

1.5 Proof of Corollary 4.11

Proof

The proof is almost identical to that of a similar result (Corollary 3.6) in [9]. Recall the sequence \(K'\) of random variables and the almost sure event \(E_1\cap E_2\) from the proof of Theorem 4.10, and let \(\omega \in E_1\cap E_2\). Following the latter proof, there exists \(K(\omega )\subseteq K'(\omega )\) such that \(\lim _{K(\omega )}X^k_{\text {inf}}(\omega )={\hat{X}}_{\inf }(\omega )={\hat{x}}_{\inf }\). Moreover, it follows from Theorem 4.10 that \(h^\circ ({\hat{x}}_{\inf }; v)=h^\circ ({\hat{X}}_{\inf }(\omega ); V(\omega ))\ge 0\) for a set of refining directions \(v\) which is dense in the closure \({\text {cl}}\left( T^H_{{\mathcal {X}}}({\hat{x}}_{\inf })\right) \) of \(T^H_{{\mathcal {X}}}({\hat{x}}_{\inf })\). The proof is then complete by noticing that \({\text {cl}}\left( T^H_{{\mathcal {X}}}({\hat{x}}_{\inf })\right) =T^{Cl}_{{\mathcal {X}}}({\hat{x}}_{\inf })\) whenever \(T^H_{{\mathcal {X}}}({\hat{x}}_{\inf })\ne \emptyset \) [50], with \(T^{Cl}_{{\mathcal {X}}}({\hat{x}}_{\inf })\) denoting the Clarke tangent cone to \({\mathcal {X}}\) at \({\hat{x}}_{\inf }\). \(\square \)
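The density argument above (and in Corollary 4.15 below) rests on a classical fact of Clarke calculus, recalled here for completeness: when \(h\) is Lipschitz continuous with constant \(\lambda ^h\) near \({\hat{x}}_{\inf }\), the map \(v\mapsto h^{\circ }({\hat{x}}_{\inf };v)\) is itself Lipschitz, i.e.,

$$\begin{aligned} \left|h^{\circ }({\hat{x}}_{\inf };v)-h^{\circ }({\hat{x}}_{\inf };w)\right|\le \lambda ^h{\left\Vert v-w\right\Vert }_{\infty }\quad \text {for all}\ v,w\in {\mathbb {R}}^n, \end{aligned}$$

so nonnegativity of \(h^{\circ }({\hat{x}}_{\inf };\cdot )\) on a dense subset of \(T^{Cl}_{{\mathcal {X}}}({\hat{x}}_{\inf })\) extends by continuity to the whole cone.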

1.6 Proof of Lemma 4.12

Proof

The proof is almost identical to those of Lemma 4.7 and of a similar result in [11], so full details are not repeated here. Unless otherwise stated, all sequences, events and constants considered are defined as in the proof of Lemma 4.7. The result is proved by contradiction, and all that follows is conditioned on the almost sure event \(E_1\cap \{T<+\infty \}\). Assume that, with nonzero probability, there exists a random variable \({\mathcal {E}}''>0\) such that

$$\begin{aligned} \Psi _k^{f,T}\ge {\mathcal {E}}'', \quad \text {for all}\ k\ge 0. \end{aligned}$$
(110)

Let \(t\), \(\{x^{k\vee t}_{\text {feas}}\}_{k\in {\mathbb {N}}}\), \(\{s^k\}_{k\in {\mathbb {N}}}\), \(\{\delta ^k_p\}_{k\in {\mathbb {N}}}\) and \(\epsilon ''>0\) be realizations of \(T\), \(\{X^{k\vee T}_{\text {feas}}\}_{k\in {\mathbb {N}}}\), \(\{S^k\}_{k\in {\mathbb {N}}}\), \(\{\Delta ^k_p\}_{k\in {\mathbb {N}}}\) and \({\mathcal {E}}''\), respectively, for which \(\psi _k^{f,t}\ge \epsilon ''\) for all \(k\ge 0\). Let \({\bar{k}}_0\in {\mathbb {N}}^*\) be such that

$$\begin{aligned} \delta ^k_p<\lambda :=\min \left\{ \frac{\epsilon ''}{\varepsilon (\gamma +2)},\tau ^{1-{\hat{z}}} \right\} \quad \text {for all}\ k\ge {\bar{k}}_0. \end{aligned}$$
(111)

The key element of the proof is to show that an iteration \(k\ge k_0:=\max \{{\bar{k}}_0,t\}\) for which the events \(I_k\) and \(J_k\) both occur cannot be Unsuccessful, and hence \(\{R_k\}\) is a submartingale.

It follows from (110) and (111) that

$$\begin{aligned} f(x^k_{\text {feas}}+s^k)-f(x^k_{\text {feas}})\le -\epsilon ''\delta ^k_p\le -(\gamma +2)\varepsilon (\delta ^k_p)^2, \quad \text {for all}\ k\ge k_0. \end{aligned}$$

Since \(J_k\) occurs,

$$\begin{aligned} f^k_s(x^k_{\text {feas}}+s^k)-f^k_0(x^k_{\text {feas}})&=[f(x^k_{\text {feas}}+s^k)-f(x^k_{\text {feas}})]+[f(x^k_{\text {feas}})-f^k_0(x^k_{\text {feas}})]\\&\qquad +[f^k_s(x^k_{\text {feas}}+s^k)-f(x^k_{\text {feas}}+s^k)]\\&\le -(\gamma +2)\varepsilon (\delta ^k_p)^2+2\varepsilon (\delta ^k_p)^2=-\gamma \varepsilon (\delta ^k_p)^2, \end{aligned}$$

which implies that the iteration \(k\ge k_0\) of Algorithm 2 cannot be Unsuccessful. The rest of the proof follows that of Lemma 4.7. \(\square \)
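To illustrate the cancellation at work in the displayed inequalities, the following minimal numerical sketch (ours, not the StoMADS-PB implementation; all variable names are illustrative) draws estimation errors bounded by \(\varepsilon (\delta ^k_p)^2\), as guaranteed on the event \(J_k\), and checks that a true decrease of at least \((\gamma +2)\varepsilon (\delta ^k_p)^2\) always produces an estimated decrease below the \(-\gamma \varepsilon (\delta ^k_p)^2\) threshold:

import numpy as np

# Minimal sanity check (illustrative; not the authors' code). On the event
# J_k, the estimates f^k_0 and f^k_s are each within eps * delta_p**2 of the
# true function values. If the true decrease is at least
# (gamma + 2) * eps * delta_p**2, the estimated decrease then always meets
# the -gamma * eps * delta_p**2 sufficient decrease threshold.
rng = np.random.default_rng(42)
gamma, eps, delta_p = 3.0, 0.1, 1e-2   # illustrative parameter values
bound = eps * delta_p**2               # estimate accuracy under J_k

for _ in range(10_000):
    true_decrease = -(gamma + 2) * bound        # f(x+s) - f(x)
    err_incumbent = rng.uniform(-bound, bound)  # f^k_0(x)   - f(x)
    err_trial = rng.uniform(-bound, bound)      # f^k_s(x+s) - f(x+s)
    estimated = true_decrease + err_trial - err_incumbent
    assert estimated <= -gamma * bound          # iteration cannot be Unsuccessful

This is precisely the mechanism that prevents iterations at which \(I_k\) and \(J_k\) both occur from being Unsuccessful once \(\delta ^k_p\) satisfies (111).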

1.7 Proof of Theorem 4.13

Proof

The proof follows from Corollary 4.4 and the assumption \(\lim _{k\in K}H^k_0(X^{k\vee T}_{\text {feas}})= 0\) almost surely, by observing that for any outcome \(\omega \) in the almost sure event

$$\begin{aligned} E_4&:=\big \{ \omega \in \Omega :\forall K(\omega )\subseteq {\mathbb {N}},\ \lim _{k\in K(\omega )}\left|H^k_0(X^{k\vee T}_{\text {feas}})(\omega )-h(X^{k\vee T}_{\text {feas}}(\omega ))\right|=0\ \text {and} \\&\qquad \lim _{k\in K(\omega )}H^k_0(X^{k\vee T}_{\text {feas}})(\omega ) = 0 \big \} \cap \{T<+\infty \} , \end{aligned}$$

the inequalities

$$\begin{aligned} h(X^{k\vee T}_{\text {feas}}(\omega ))-\left|H^k_0(X^{k\vee T}_{\text {feas}})(\omega )\right|\le & {} \left|h(X^{k\vee T}_{\text {feas}}(\omega ))-\left|H^k_0(X^{k\vee T}_{\text {feas}})(\omega )\right|\right|\\\le & {} \left|h(X^{k\vee T}_{\text {feas}}(\omega ))-H^k_0(X^{k\vee T}_{\text {feas}})(\omega )\right| \end{aligned}$$

and the continuity of \(y\mapsto \left|y\right|\) yield

$$\begin{aligned} \lim _{k\in K(\omega )}h(X^{k\vee T}_{\text {feas}}(\omega ))\le & {} \lim _{k\in K(\omega )} \left( \left|h(X^{k\vee T}_{\text {feas}}(\omega ))-H^k_0(X^{k\vee T}_{\text {feas}})(\omega )\right| +\left|H^k_0(X^{k\vee T}_{\text {feas}})(\omega )\right|\right) \\= & {} \lim _{k\in K(\omega )}\left|h(X^{k\vee T}_{\text {feas}}(\omega ))-H^k_0(X^{k\vee T}_{\text {feas}})(\omega ) \right|+\left|\lim _{k\in K(\omega )}H^k_0(X^{k\vee T}_{\text {feas}})(\omega )\right|=0. \end{aligned}$$

This means that

$$\begin{aligned} h({\hat{X}}_{\text {feas}}(\omega ))=\lim _{k\in K(\omega )}h(X^{k\vee T}_{\text {feas}}(\omega ))=0 \end{aligned}$$
(112)

since \(h\) is nonnegative; the first equality in (112) follows from the continuity of \(h\) on \({\mathcal {X}}\). Consequently,

$$\begin{aligned} {\mathbb {P}}\left( h({\hat{X}}_{\text {feas}})=0\right) ={\mathbb {P}}\left( {\hat{X}}_{\text {feas}}\in {\mathcal {D}}\right) =1. \end{aligned}$$

\(\square \)
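In summary, the proof is a squeeze argument: for every outcome \(\omega \) in \(E_4\),

$$\begin{aligned} 0\le h(X^{k\vee T}_{\text {feas}}(\omega ))\le \left|h(X^{k\vee T}_{\text {feas}}(\omega ))-H^k_0(X^{k\vee T}_{\text {feas}})(\omega )\right|+\left|H^k_0(X^{k\vee T}_{\text {feas}})(\omega )\right|, \end{aligned}$$

where both terms on the right vanish along \(K(\omega )\) by the definition of \(E_4\), so that \(h(X^{k\vee T}_{\text {feas}}(\omega ))\rightarrow 0\).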

1.8 Proof of Theorem 4.14

Proof

First, \({\mathbb {P}}\left( {\hat{X}}_{\text {feas}}\in {\mathcal {D}}\right) =1\) follows from Theorem 4.13. The remainder of the proof follows that of Theorem 4.10, replacing \(h\) by \(f\), \({\hat{x}}_{\inf }={\hat{X}}_{\inf }(\omega )\) by \({\hat{x}}_{\text {feas}}={\hat{X}}_{\text {feas}}(\omega )\), \(x^k_{\text {inf}}=X^k_{\text {inf}}(\omega )\) by \(x^{k\vee t}_{\text {feas}}=X^{k\vee T}_{\text {feas}}(\omega )\), \(\psi _k^h=\Psi _k^h(\omega )\) by \(\psi _k^{f,t}=\Psi _k^{f,T}(\omega )\) with \(t=T(\omega )\), and \(T^H_{{\mathcal {X}}}(\cdot )\) by \(T^H_{{\mathcal {D}}}(\cdot )\), for \(\omega \) fixed and arbitrarily chosen in the almost sure event \(E_1\cap E_5\cap \{T<+\infty \}\), where

$$\begin{aligned} E_5&= \big \{ \omega \in \Omega :\exists K(\omega )\subseteq {\mathbb {N}}\ \text {such that}\ {\hat{X}}_{\text {feas}}(\omega )= \lim _{k\in K(\omega )}X^{k\vee T}_{\text {feas}}(\omega ),\ {\hat{X}}_{\text {feas}}(\omega )\in {\mathcal {D}}, \\&\qquad \lim _{k\in K(\omega )}\Psi _k^{f,T}(\omega )\le 0\ \text {and}\ \lim _{k\in K(\omega )}H^k_0(X^{k\vee T}_{\text {feas}})(\omega ) = 0 \big \} . \end{aligned}$$

\(\square \)

1.9 Proof of Corollary 4.15

Proof

The proof is almost identical to that of a similar result (Corollary 3.4) in [9]. Let \(\omega \) be arbitrarily chosen in the almost sure event \(E_1\cap E_5\cap \{T<+\infty \}\). It follows from Theorem 4.14 that \(f^\circ ({\hat{x}}_{\text {feas}}; v)=f^\circ ({\hat{X}}_{\text {feas}}(\omega ); V(\omega ))\ge 0\) for a set of refining directions \(v\) which is dense in the closure of \(T^H_{{\mathcal {D}}}({\hat{x}}_{\text {feas}})\). The proof is then complete by noticing that the closure of the hypertangent cone coincides with the Clarke tangent cone whenever the hypertangent cone is nonempty [9, 50]. \(\square \)


Cite this article

Dzahini, K.J., Kokkolaras, M. & Le Digabel, S. Constrained stochastic blackbox optimization using a progressive barrier and probabilistic estimates. Math. Program. 198, 675–732 (2023). https://doi.org/10.1007/s10107-022-01787-7
