
Probability maximization via Minkowski functionals: convex representations and tractable resolution

  • Full Length Paper
  • Series A
  • Published:
Mathematical Programming

Abstract

In this paper, we consider maximizing the probability \({\mathbb {P}}\left\{ \, \zeta \, \mid \, \zeta \, \in \, {\mathbf {K}}({\mathbf {x}}) \, \right\} \) over a closed and convex set \({\mathcal {X}}\), a special case of the chance-constrained optimization problem. Suppose \({\mathbf {K}}({\mathbf {x}}) \, \triangleq \, \left\{ \, \zeta \, \in \, {\mathcal {K}}\, \mid \, c({\mathbf {x}},\zeta ) \, \ge \, 0 \right\} \), where \(\zeta \) is uniformly distributed on a convex and compact set \({\mathcal {K}}\) and \(c({\mathbf {x}},\zeta )\) is defined as either \(c({\mathbf {x}},\zeta )\, \triangleq \, 1-\left| \zeta ^T{\mathbf {x}}\right| ^m\) where \(m\ge 0\) (Setting A) or \(c({\mathbf {x}},\zeta ) \, \triangleq \, T{\mathbf {x}}\, - \, \zeta \) (Setting B). We show that in either setting, by leveraging recent findings in the context of non-Gaussian integrals of positively homogeneous functions, \({\mathbb {P}}\left\{ \,\zeta \, \mid \, \zeta \, \in \, {\mathbf {K}}({\mathbf {x}}) \, \right\} \) can be expressed as the expectation of a suitably defined continuous function \(F(\bullet ,\xi )\) with respect to an appropriately defined Gaussian density (or its variant), i.e., \({\mathbb {E}}_{{{\tilde{p}}}} \left[ \, F({\mathbf {x}},\xi )\, \right] \). Aided by a recent observation in convex analysis, we then develop a convex representation of the original problem, requiring the minimization of \(g\left( {\mathbb {E}}\left[ \, F(\bullet ,\xi )\, \right] \right) \) over \({\mathcal {X}}\), where g is an appropriately defined smooth convex function. Traditional stochastic approximation schemes cannot contend with the minimization of \(g\left( {\mathbb {E}}\left[ F(\bullet ,\xi )\right] \right) \) over \({\mathcal {X}}\), since conditionally unbiased sampled gradients are unavailable.
We then develop a regularized variance-reduced stochastic approximation (r-VRSA) scheme that obviates the need for such unbiasedness by combining iterative regularization with variance reduction. Notably, (r-VRSA) is characterized by almost-sure convergence guarantees, a convergence rate of \(\mathcal {O}(1/k^{1/2-a})\) in expected sub-optimality where \(a > 0\), and a sample complexity of \(\mathcal {O}(1/\epsilon ^{6+\delta })\) where \(\delta > 0\). To the best of our knowledge, this may be the first such scheme for probability maximization problems with convergence and rate guarantees. Preliminary numerics on a portfolio selection problem (Setting A) and a set-covering problem (Setting B) suggest that the scheme competes well with naive mini-batch SA schemes as well as integer programming approximation methods.
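The two ingredients of the (r-VRSA) scheme described above, iterative regularization (a vanishing regularizer \(\lambda _k\)) and variance reduction (a growing mini-batch), can be illustrated on a toy strongly convex problem. The sketch below is schematic and not the authors' algorithm; the step-size, batch-size, and regularization schedules are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: minimize f(x) = E[(x - xi)^2] over X = [-5, 5] with xi ~ N(1, 1),
# whose solution is x* = 1.  The schedules below are illustrative, not the paper's.
x = -4.0
for k in range(1, 2001):
    lam = k ** -0.5                        # vanishing regularization (iterative reg.)
    batch = rng.normal(1.0, 1.0, k)        # growing mini-batch: variance reduction
    grad = np.mean(2.0 * (x - batch))      # sampled gradient of f at x
    x -= (grad + lam * x) / (k + 5.0)      # regularized SA step with decaying step size
    x = np.clip(x, -5.0, 5.0)              # projection onto X

assert abs(x - 1.0) < 0.1                  # iterate approaches the true solution
```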



Fig. 1

References

  1. van Ackooij, W.: Eventual convexity of chance constrained feasible sets. Optimization 64(5), 1263–1284 (2015)


  2. van Ackooij, W.: A discussion of probability functions and constraints from a variational perspective. Set-Valued Var. Anal. 28(4), 585–609 (2020). https://doi.org/10.1007/s11228-020-00552-2


  3. van Ackooij, W., Aleksovska, I., Munoz-Zuniga, M.: (Sub-)differentiability of probability functions with elliptical distributions. Set-Valued Var. Anal. 26(4), 887–910 (2018). https://doi.org/10.1007/s11228-017-0454-3


  4. van Ackooij, W., Berge, V., de Oliveira, W., Sagastizábal, C.: Probabilistic optimization via approximate \(p\)-efficient points and bundle methods. Comput. Oper. Res. 77, 177–193 (2017). https://doi.org/10.1016/j.cor.2016.08.002


  5. van Ackooij, W., Demassey, S., Javal, P., Morais, H., de Oliveira, W., Swaminathan, B.: A bundle method for nonsmooth DC programming with application to chance-constrained problems. Comput. Optim. Appl. 78(2), 451–490 (2021). https://doi.org/10.1007/s10589-020-00241-8


  6. van Ackooij, W., Henrion, R.: Gradient formulae for nonlinear probabilistic constraints with Gaussian and Gaussian-like distributions. SIAM J. Optim. 24(4), 1864–1889 (2014). https://doi.org/10.1137/130922689


  7. van Ackooij, W., Henrion, R.: (Sub-)gradient formulae for probability functions of random inequality systems under Gaussian distribution. SIAM/ASA J. Uncertain. Quantif. 5(1), 63–87 (2017). https://doi.org/10.1137/16M1061308


  8. van Ackooij, W., Henrion, R., Möller, A., Zorgati, R.: Joint chance constrained programming for hydro reservoir management. Optim. Eng. 15(2), 509–531 (2014). https://doi.org/10.1007/s11081-013-9236-4


  9. van Ackooij, W., Pérez-Aros, P.: Gradient formulae for nonlinear probabilistic constraints with non-convex quadratic forms. J. Optim. Theory Appl. 185(1), 239–269 (2020). https://doi.org/10.1007/s10957-020-01634-9


  10. van Ackooij, W., Sagastizábal, C.: Constrained bundle methods for upper inexact oracles with application to joint chance constrained energy problems. SIAM J. Optim. 24(2), 733–765 (2014). https://doi.org/10.1137/120903099


  11. Ahmed, S., Luedtke, J., Song, Y., Xie, W.: Nonanticipative duality, relaxations, and formulations for chance-constrained stochastic programs. Math. Program. 162(1–2, Ser. A), 51–81 (2017)


  12. Balasubramanian, K., Ghadimi, S., Nguyen, A.: Stochastic multi-level composition optimization algorithms with level-independent convergence rates. arXiv preprint arXiv:2008.10526 (2020)

  13. Bardakci, I., Lagoa, C.M.: Distributionally robust portfolio optimization. In: 2019 IEEE 58th Conference on Decision and Control (CDC), pp. 1526–1531. IEEE (2019)

  14. Bardakci, I.E., Lagoa, C., Shanbhag, U.V.: Probability maximization with random linear inequalities: Alternative formulations and stochastic approximation schemes. In: 2018 Annual American Control Conference, ACC 2018, Milwaukee, WI, USA, June 27-29, 2018, pp. 1396–1401. IEEE (2018)

  15. Bienstock, D., Chertkov, M., Harnett, S.: Chance-constrained optimal power flow: Risk-aware network control under uncertainty. SIAM Rev. 56(3), 461–495 (2014)


  16. Bobkov, S.G.: Convex bodies and norms associated to convex measures. Probab. Theory Relat. Fields 147(1–2), 303–332 (2010)


  17. Brascamp, H.J., Lieb, E.H.: On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. J. Functional Analysis 22(4), 366–389 (1976). https://doi.org/10.1016/0022-1236(76)90004-5


  18. Burke, J.V., Chen, X., Sun, H.: The subdifferential of measurable composite max integrands and smoothing approximation. Math. Program. 181(2, Ser. B), 229–264 (2020). https://doi.org/10.1007/s10107-019-01441-9


  19. Byrd, R.H., Chin, G.M., Nocedal, J., Wu, Y.: Sample size selection in optimization methods for machine learning. Math. Program. 134(1), 127–155 (2012)


  20. Campi, M.C., Garatti, S.: A sampling-and-discarding approach to chance-constrained optimization: feasibility and optimality. J. Optim. Theory Appl. 148(2), 257–280 (2011)


  21. Charnes, A., Cooper, W.W.: Chance-constrained programming. Management Sci. 6, 73–79 (1959/1960)

  22. Charnes, A., Cooper, W.W., Symonds, G.H.: Cost horizons and certainty equivalents: An approach to stochastic programming of heating oil. Management Science 4(3), 235–263 (1958). https://EconPapers.repec.org/RePEc:inm:ormnsc:v:4:y:1958:i:3:p:235-263

  23. Chen, L.: An approximation-based approach for chance-constrained vehicle routing and air traffic control problems. In: Large scale optimization in supply chains and smart manufacturing, Springer Optim. Appl., vol. 149, pp. 183–239. Springer, Cham (2019)

  24. Chen, T., Sun, Y., Yin, W.: Solving stochastic compositional optimization is nearly as easy as solving stochastic optimization. IEEE Trans. Signal Process. 69, 4937–4948 (2021)


  25. Chen, W., Sim, M., Sun, J., Teo, C.P.: From cvar to uncertainty set: Implications in joint chance-constrained optimization. Oper. Res. 58(2), 470–485 (2010)


  26. Cheng, J., Chen, R.L.Y., Najm, H.N., Pinar, A., Safta, C., Watson, J.P.: Chance-constrained economic dispatch with renewable energy and storage. Comput. Optim. Appl. 70(2), 479–502 (2018). https://doi.org/10.1007/s10589-018-0006-2


  27. Clarke, F.H.: Optimization and nonsmooth analysis, Classics in Applied Mathematics, vol. 5, second edn. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (1990). https://doi.org/10.1137/1.9781611971309

  28. Cui, Y., Liu, J., Pang, J.S.: Nonconvex and nonsmooth approaches for affine chance constrained stochastic programs. Set-Valued Variat. Anal. 30, 1149–1211 (2022)


  29. Curtis, F.E., Wächter, A., Zavala, V.M.: A sequential algorithm for solving nonlinear optimization problems with chance constraints. SIAM J. Optim. 28(1), 930–958 (2018)


  30. Ermoliev, Y.: Methods of Stochastic Programming. Monographs in Optimization and OR, Nauka, Moscow (1976)

  31. Fiacco, A.V., McCormick, G.P.: The sequential maximization technique \(({{\rm SUMT}})\) without parameters. Operations Res. 15, 820–827 (1967). https://doi.org/10.1287/opre.15.5.820


  32. Fiacco, A.V., McCormick, G.P.: Nonlinear programming: Sequential unconstrained minimization techniques. John Wiley and Sons Inc, New York-London-Sydney (1968)


  33. Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156(1–2), 59–99 (2016)


  34. Ghadimi, S., Ruszczynski, A., Wang, M.: A single timescale stochastic approximation method for nested stochastic optimization. SIAM J. Optim. 30(1), 960–979 (2020)


  35. Gicquel, C., Cheng, J.: A joint chance-constrained programming approach for the single-item capacitated lot-sizing problem with stochastic demand. Ann. Oper. Res. 264(1–2), 123–155 (2018). https://doi.org/10.1007/s10479-017-2662-5


  36. Göttlich, S., Kolb, O., Lux, K.: Chance-constrained optimal inflow control in hyperbolic supply systems with uncertain demand. Optimal Control Appl. Methods 42(2), 566–589 (2021). https://doi.org/10.1002/oca.2689


  37. Guo, G., Zephyr, L., Morillo, J., Wang, Z., Anderson, C.L.: Chance constrained unit commitment approximation under stochastic wind energy. Comput. Oper. Res. 134, Paper No. 105398, 13 (2021). https://doi.org/10.1016/j.cor.2021.105398

  38. Guo, S., Xu, H., Zhang, L.: Convergence analysis for mathematical programs with distributionally robust chance constraint. SIAM J. Optim. 27(2), 784–816 (2017). https://doi.org/10.1137/15M1036592


  39. Gurobi Optimization, LLC.: Gurobi Optimizer Reference Manual (2022). https://www.gurobi.com

  40. Henrion, R.: Optimierungsprobleme mit wahrscheinlichkeitsrestriktionen: Modelle, struktur, numerik. Lecture notes p. 43 (2010)

  41. Hong, L.J., Yang, Y., Zhang, L.: Sequential convex approximations to joint chance constrained programs: A monte carlo approach. Oper. Res. 59(3), 617–630 (2011)


  42. Jalilzadeh, A., Shanbhag, U.V., Blanchet, J.H., Glynn, P.W.: Optimal smoothed variable sample-size accelerated proximal methods for structured nonsmooth stochastic convex programs. arXiv preprint arXiv:1803.00718 (2018)

  43. Lagoa, C.M., Li, X., Sznaier, M.: Probabilistically constrained linear programs and risk-adjusted controller design. SIAM J. Optim. 15(3), 938–951 (2005)


  44. Lasserre, J.B.: Level sets and nongaussian integrals of positively homogeneous functions. IGTR 17(1), 1540001 (2015)


  45. Lei, J., Shanbhag, U.V.: Asynchronous variance-reduced block schemes for composite non-convex stochastic optimization: block-specific steplengths and adapted batch-sizes. Optimization Methods and Software 0(0), 1–31 (2020)


  46. Lian, X., Wang, M., Liu, J.: Finite-sum composition optimization via variance reduced gradient descent. In: Artificial Intelligence and Statistics, pp. 1159–1167. PMLR (2017)

  47. Lieb, E., Loss, M.: Analysis. Crm Proceedings & Lecture Notes. American Mathematical Society (2001). https://books.google.com/books?id=Eb_7oRorXJgC

  48. Luedtke, J., Ahmed, S.: A sample approximation approach for optimization with probabilistic constraints. SIAM J. Optim. 19(2), 674–699 (2008)


  49. Markowitz, H.: Portfolio selection. J. Financ. 7(1), 77–91 (1952)


  50. Miller, B.L., Wagner, H.M.: Chance constrained programming with joint constraints. Oper. Res. 13(6), 930–945 (1965)


  51. Morozov, A., Shakirov, S.: Introduction to integral discriminants. J. High Energy Phys. 2009(12), 002 (2009)


  52. Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)


  53. Nemirovski, A., Shapiro, A.: Convex approximations of chance constrained programs. SIAM J. Optim. 17(4), 969–996 (2006)


  54. Norkin, V.I.: The analysis and optimization of probability functions (1993)

  55. Pagnoncelli, B.K., Ahmed, S., Shapiro, A.: Sample average approximation method for chance constrained programming: theory and applications. J. Optim. Theory Appl. 142(2), 399–416 (2009). https://doi.org/10.1007/s10957-009-9523-6


  56. Pagnoncelli, B.K., Ahmed, S., Shapiro, A.: Sample average approximation method for chance constrained programming: theory and applications. J. Optim. Theory Appl. 142(2), 399–416 (2009)


  57. Peña-Ordieres, A., Luedtke, J.R., Wächter, A.: Solving chance-constrained problems via a smooth sample-based nonlinear approximation. arXiv:1905.07377 (2019)

  58. Pflug, G.C., Weisshaupt, H.: Probability gradient estimation by set-valued calculus and applications in network design. SIAM J. Optim. 15(3), 898–914 (2005). https://doi.org/10.1137/S1052623403431639


  59. Polyak, B.T.: New stochastic approximation type procedures. Automat. i Telemekh 7(98–107), 2 (1990)


  60. Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM J. Control. Optim. 30(4), 838–855 (1992)


  61. Prékopa, A.: A class of stochastic programming decision problems. Math. Operationsforsch. Statist. 3(5), 349–354 (1972). https://doi.org/10.1080/02331937208842107


  62. Prékopa, A.: On logarithmic concave measures and functions. Acta Scientiarum Mathematicarum 34, 335–343 (1973)


  63. Prékopa, A.: Probabilistic programming. In: Stochastic programming, Handbooks Oper. Res. Management Sci., vol. 10, pp. 267–351. Elsevier Sci. B. V., Amsterdam, Netherlands (2003). https://doi.org/10.1016/S0927-0507(03)10005-9

  64. Prékopa, A.: Stochastic programming, vol. 324. Springer Science & Business Media (2013)

  65. Prékopa, A., Szántai, T.: Flood control reservoir system design using stochastic programming. In: Mathematical programming in use, pp. 138–151. Springer (1978)

  66. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)

  67. Royset, J.O., Polak, E.: Extensions of stochastic optimization results to problems with system failure probability functions. J. Optim. Theory Appl. 133(1), 1–18 (2007). https://doi.org/10.1007/s10957-007-9178-0


  68. Scholtes, S.: Introduction to piecewise differentiable equations. Springer Science & Business Media, New York (2012)


  69. Shanbhag, U.V., Blanchet, J.H.: Budget-constrained stochastic approximation. In: Proceedings of the 2015 Winter Simulation Conference, Huntington Beach, CA, USA, December 6-9, 2015, pp. 368–379 (2015)

  70. Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on stochastic programming: modeling and theory. SIAM (2009)

  71. Sun, Y., Aw, G., Loxton, R., Teo, K.L.: Chance-constrained optimization for pension fund portfolios in the presence of default risk. European J. Oper. Res. 256(1), 205–214 (2017). https://doi.org/10.1016/j.ejor.2016.06.019


  72. Uryasev, S.: Derivatives of probability functions and integrals over sets given by inequalities. pp. 197–223 (1994). https://doi.org/10.1016/0377-0427(94)90388-3. Stochastic programming: stability, numerical methods and applications (Gosen, 1992)

  73. Uryasev, S.: Derivatives of probability functions and some applications. pp. 287–311 (1995). https://doi.org/10.1007/BF02031712. Stochastic programming (Udine, 1992)

  74. Wang, M., Fang, E.X., Liu, H.: Stochastic compositional gradient descent: Algorithms for minimizing compositions of expected-value functions. Math. Program. 161(1–2), 419–449 (2017)


  75. Wang, M., Liu, J., Fang, E.X.: Accelerating stochastic composition optimization. The Journal of Machine Learning Research 18(1), 3721–3743 (2017)


  76. Xie, Y., Shanbhag, U.V.: SI-ADMM: A stochastic inexact ADMM framework for stochastic convex programs. IEEE Trans. Autom. Control 65(6), 2355–2370 (2020)


  77. Yadollahi, E., Aghezzaf, E.H., Raa, B.: Managing inventory and service levels in a safety stock-based inventory routing system with stochastic retailer demands. Appl. Stoch. Models Bus. Ind. 33(4), 369–381 (2017). https://doi.org/10.1002/asmb.2241


  78. Yang, S., Wang, M., Fang, E.X.: Multilevel stochastic gradient methods for nested composition optimization. SIAM J. Optim. 29(1), 616–659 (2019)



Acknowledgements

The authors would like to acknowledge support from NSF CMMI-1538605, EPCN-1808266, DOE ARPA-E award DE-AR0001076, NIH R01-HL142732, and the Gary and Sheila Bello chair funds. Preliminary efforts at studying Setting A were carried out in [14].

Author information

Corresponding author

Correspondence to U. V. Shanbhag.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Theorem 2:

  1. (a)

When considering uniform distributions over a compact and convex set \(\mathcal {K}\), the density is constant on this set and zero outside it; it follows that \(\zeta \) has a log-concave density. Furthermore, since \(\mathcal {K}\) is symmetric about the origin, the density of \(\zeta \) is symmetric about the origin. Hence, by Lemma 6.2 in [16], h is convex, where \(h({\mathbf {x}})\triangleq 1/f({\mathbf {x}})\). \(\square \)

  2. (b)

Since (11) is a convex program, any solution \({\mathbf {x}}^*\) satisfies \({ h({\mathbf {x}}^*) \le h({\mathbf {x}}), \ \forall \mathbf {x} \in \mathcal {X}.}\) From the positivity of f over \({\mathcal {X}}\), \( \frac{1}{f({\mathbf {x}}^*)} \le \frac{1}{f({\mathbf {x}})}\) for every \({\mathbf {x}}\in \mathcal {X}\), implying that \(f({\mathbf {x}}^*) \ge f({\mathbf {x}})\) for every \({\mathbf {x}}\in \mathcal {X}.\) Consequently, \({\mathbf {x}}^*\) is a global maximizer of f over \({\mathcal {X}}\). \(\square \)
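The equivalence used in part (b), namely that minimizing \(h = 1/f\) recovers a maximizer of a positive f, can be illustrated numerically; the function f below is a hypothetical positive example chosen for illustration, not one from the paper.

```python
import numpy as np

# Sketch: minimizing h = 1/f over a grid recovers the maximizer of f,
# for an illustrative positive function f on [0, 2] with its peak at x = 1.
f = lambda x: 1.0 + x * (2.0 - x)          # positive on [0, 2], maximized at x = 1
h = lambda x: 1.0 / f(x)                   # reciprocal surrogate, as in Theorem 2(b)

grid = np.linspace(0.0, 2.0, 2001)
x_min_h = grid[np.argmin(h(grid))]         # minimizer of 1/f
x_max_f = grid[np.argmax(f(grid))]         # maximizer of f
assert x_min_h == x_max_f == 1.0           # the two problems share the solution
```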

Proof of Lemma 3:

We prove this result by showing the unimodality of f on \({\mathbb {R}}_+\), where \(f(u) = u^c e^{-u}\). Observe that \(f'(u) = cu^{c-1}e^{-u} - u^ce^{-u} = (c-u)u^{c-1}e^{-u}\), which vanishes at \(u = c\). Furthermore, \(f'(u) > 0\) when \(0< u < c\) and \(f'(u) < 0\) when \(u > c\). Finally, \(f(0) = 0\). It follows that \(u^* = c\) is the maximizer of \(u^ce^{-u}\) on \([0,\infty )\), where \(f(c) = \frac{c^c}{e^{c}}\). \(\square \)
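A quick numerical check of this claim, with the illustrative choice \(c = 2\):

```python
import numpy as np

# Lemma 3 check: f(u) = u^c e^{-u} on [0, inf) is maximized at u = c
# with value c^c e^{-c}; c = 2 is an arbitrary illustrative choice.
c = 2.0
f = lambda u: u**c * np.exp(-u)

u = np.linspace(0.0, 20.0, 200001)         # grid with spacing 1e-4
u_star = u[np.argmax(f(u))]
assert abs(u_star - c) < 1e-3              # maximizer sits at u = c
assert np.isclose(f(u_star), c**c * np.exp(-c), atol=1e-6)
```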

Proof of Proposition 2:

Recall the definition of \(F({\mathbf {x}},\xi )\) from the statement of Lemma 4. We prove (a) by considering two cases. Case (i): \(\xi \in \varXi _1({\mathbf {x}}) \cup \varXi _0({\mathbf {x}}).\) It follows that

$$\begin{aligned} \left| F({\mathbf {x}},\xi )\right| ^2 = \mathcal {C}^2_{{\mathcal {K}}} \left( (2\pi )^{{n}} e^{-2\mid \xi ^T{\mathbf {x}}\mid ^2+\Vert \xi \Vert _{{\mathcal {K}}}^2} \right)&\le \mathcal {C}^2_{{\mathcal {K}}} \left( (2\pi )^{{n}} e^{-2\mid \xi ^T{\mathbf {x}}\mid ^2+\mid \xi ^T{\mathbf {x}}\mid ^2} \right) \le \mathcal {C}^2_{{\mathcal {K}}} (2\pi )^n. \end{aligned}$$

Case (ii): \(\xi \in \varXi _2({\mathbf {x}}).\) Proceeding similarly, we obtain that

$$\begin{aligned} \left| F({\mathbf {x}},\xi )\right| ^2&\le \mathcal {C}^2_{{\mathcal {K}}} \left( (2 \pi )^{{n}} e^{{-\Vert \xi \Vert _{{\mathcal {K}}}^2}} \right) \le \mathcal {C}^2_{{\mathcal {K}}} (2\pi )^n. \end{aligned}$$

Consequently, \(\left| F({\mathbf {x}},\xi )\right| ^2 \le \mathcal {C}^2_{{\mathcal {K}}} (2\pi )^n\) for every \(\xi \in {\mathbb {R}}^n\).
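Both cases reduce to the elementary bound \(e^{-2\max \{a,b\}+b} \le 1\) for \(a, b \ge 0\), with \(a = \mid \xi ^T{\mathbf {x}}\mid ^2\) and \(b = \Vert \xi \Vert _{{\mathcal {K}}}^2\); a sweep over random nonnegative pairs confirms it:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(0.0, 10.0, 10_000)   # plays the role of |xi^T x|^2
b = rng.uniform(0.0, 10.0, 10_000)   # plays the role of ||xi||_K^2

# |F|^2 / (C_K^2 (2 pi)^n) = exp(-2 max(a, b) + b) <= exp(-max(a, b)) <= 1
ratio = np.exp(-2.0 * np.maximum(a, b) + b)
assert np.all(ratio <= 1.0)
```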

(b) We observe that \(\partial F({\mathbf {x}},\xi )\) is defined as follows.

$$\begin{aligned} \partial F({\mathbf {x}},\xi ) = {\left\{ \begin{array}{ll} \left( \mathcal {C}_{{\mathcal {K}}}(2\pi )^{n/2} (-2\xi \xi ^T{\mathbf {x}}) e^{-|\xi ^{T} {\mathbf {x}}|^{2}+\frac{\Vert \xi \Vert _{{\mathcal {K}}}^2}{2}}\right) , &{} \xi \in \varXi _1({\mathbf {x}}) \triangleq \left\{ \xi \mid |\xi ^{T} {\mathbf {x}}|^{{2}} > \Vert \xi \Vert ^{2}_{\mathcal {K}}\right\} \\ \left( -\mathcal {C}_{{\mathcal {K}}} (2\pi )^{n/2} e^{-\max \{|\xi ^{T} {\mathbf {x}}|^{2}, {\Vert \xi \Vert ^{2}_{\mathcal {K}}}\}+\frac{\Vert \xi \Vert _{{\mathcal {K}}}^2}{2}}\right) \left[ 0,2\xi (\xi ^T{\mathbf {x}})\right] , &{} \xi \in \varXi _0({\mathbf {x}}) \triangleq \left\{ \xi \mid |\xi ^{T} {\mathbf {x}}|^{2} = \Vert \xi \Vert ^{2}_{\mathcal {K}}\right\} \\ \mathbf{0}. &{} \xi \in \varXi _2({\mathbf {x}}) \triangleq \left\{ \xi \mid |\xi ^{T} {\mathbf {x}}|^{2} < \Vert \xi \Vert ^{2}_{\mathcal {K}}\right\} \end{array}\right. } \end{aligned}$$

Consequently, it follows that \({\mathbb {E}}_{{\tilde{p}}} \left[ \, \Vert G({\mathbf {x}},\xi )\Vert ^2 \, \right] \) is bounded as follows.

$$\begin{aligned}&{\mathbb {E}}\left[ \, \Vert G({\mathbf {x}},\xi )\Vert ^2\, \right] = \int _{\varXi } \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&= \int _{\varXi _1({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi + \int _{\varXi _2({\mathbf {x}})} \Vert \underbrace{G({\mathbf {x}},\xi )}_{ \ = \ \mathbf{0}}\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&\quad + \int _{\varXi _0({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&= \int _{\varXi _1({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi , \end{aligned}$$
(33)

where the last equality follows from observing that \(G({\mathbf {x}},\xi ) = 0\) for \(\xi \in \varXi _2({\mathbf {x}})\) and the integral over \(\varXi _0({\mathbf {x}})\) in (33) is zero because \(\varXi _0({\mathbf {x}})\) is a set of measure zero. It follows that \({\mathbb {E}}\left[ \, \Vert G({\mathbf {x}},\xi )\Vert ^2\, \right] \) can be bounded as follows:

$$\begin{aligned}&{\mathbb {E}}\left[ \, \Vert G({\mathbf {x}},\xi )\Vert ^2\right] \nonumber \\&\quad = \int _{\varXi _1({\mathbf {x}})}\left( \mathcal {C}^2_{{\mathcal {K}}}(2\pi )^{n} (4\Vert \xi \Vert _2^2 (\xi ^T{\mathbf {x}})^2) e^{-2(\xi ^T{\mathbf {x}})^2 +{\Vert \xi \Vert ^2_{{\mathcal {K}}}}} \right) \frac{1}{D_{{\mathcal {K}}}}(2\pi )^{-\frac{n}{2}} e^{-\frac{{\Vert \xi \Vert ^2_{{\mathcal {K}}}}}{2}} d \xi \end{aligned}$$
(34)
$$\begin{aligned}&\quad \le \int _{\varXi _1({\mathbf {x}})}\left( \mathcal {C}^2_{{\mathcal {K}}}(2\pi )^{n} (4\Vert \xi \Vert _2^2 (\xi ^T{\mathbf {x}})^2) e^{-(\xi ^T{\mathbf {x}})^2} \right) \frac{1}{D_{{\mathcal {K}}}}(2\pi )^{-\frac{n}{2}} e^{-\frac{{\Vert \xi \Vert ^2_{{\mathcal {K}}}}}{2}} d\xi , \end{aligned}$$
(35)

where the inequality follows from \(\xi \in \varXi _1({\mathbf {x}})\). Next, we consider the expression \((\xi ^T{\mathbf {x}})^2 e^{-(\xi ^T{\mathbf {x}})^2}\), i.e., \(ue^{-u}\) with \(u \triangleq (\xi ^T{\mathbf {x}})^2\). We note that by Lemma 3, \(ue^{-u}\) is a unimodal function and \(u^* = 1\) is a maximizer with value \(e^{-1}\). Consequently, we have that

$$\begin{aligned} \max _{\{(\xi ^T{\mathbf {x}}) \mid \xi \in \varXi _1({\mathbf {x}})\}} (\xi ^T{\mathbf {x}})^2 e^{-(\xi ^T{\mathbf {x}})^2} \le \max _{u \in {\mathbb {R}}_+} u e^{-u}\overset{\tiny \mathrm {Lemma}~3}{\le } \tfrac{1}{e}, \end{aligned}$$

implying that

$$\begin{aligned}&{\mathbb {E}}[\Vert G({\mathbf {x}},\xi )\Vert ^2] \le \int _{\varXi _1({\mathbf {x}})} \left( \mathcal {C}_{{\mathcal {K}}}^2(2\pi )^{n} (4\Vert \xi \Vert _2^2 (\xi ^T{\mathbf {x}})^2) e^{-(\xi ^T{\mathbf {x}})^2} \right) \tfrac{1}{D_{{\mathcal {K}}}}(2\pi )^{-\tfrac{n}{2}} e^{-\tfrac{{\Vert \xi \Vert ^2_{{\mathcal {K}}}}}{2}} d\xi \\&\quad \le 4e^{-1} \mathcal {C}^2_{{\mathcal {K}}}(2\pi )^{n}\int _{\varXi _1({\mathbf {x}})} \Vert \xi \Vert _2^2 \tfrac{1}{D_{{\mathcal {K}}}}(2\pi )^{-\tfrac{n}{2}} e^{-\tfrac{{\Vert \xi \Vert ^2_{{\mathcal {K}}}}}{2}} d\xi \\&\quad \le 4e^{-1} \mathcal {C}^2_{{\mathcal {K}}}(2\pi )^{n}\int _{{\mathbb {R}}^n} \Vert \xi \Vert _2^2 \tfrac{1}{D_{{\mathcal {K}}}}(2\pi )^{-\tfrac{n}{2}} e^{-\tfrac{{\Vert \xi \Vert ^2_{{\mathcal {K}}}}}{2}} d\xi = 4e^{-1} \mathcal {C}^2_{{\mathcal {K}}}(2\pi )^{n}{\mathbb {E}}_{{{\tilde{p}}}} \left[ \Vert \xi \Vert _2^2\right] . \end{aligned}$$

\(\square \)

Proof of Proposition 3:

(a) Since \(\Vert \xi \Vert _{{\mathcal {K}}}^2 = \Vert \xi \Vert _p^2\), it follows from Theorem 1 that

$$\begin{aligned} f({\mathbf {x}})&= \int _{{\mathbb {R}}^n} \left( \mathcal {C}(2\pi \sigma ^2)^{\tfrac{n}{2}} e^{-\max \{ \mid \xi ^T {\mathbf {x}}\mid ^2, {\Vert \xi \Vert _{p}^2}\} } \right) d\xi \\&= \int _{{\mathbb {R}}^n} \underbrace{\left( \mathcal {C}(2\pi \sigma ^2)^{\tfrac{n}{2}} e^{-\max \{ \mid \xi ^T {\mathbf {x}}\mid ^2, {\Vert \xi \Vert _{p}^2} \}+\tfrac{{\Vert \xi \Vert _{2}^2}}{2\sigma ^2} } \right) }_{\triangleq F({\mathbf {x}},\xi )} \underbrace{(2\pi \sigma ^2)^{-\tfrac{n}{2}} e^{-\tfrac{{\Vert \xi \Vert _{2}^2}}{2\sigma ^2}}}_{\triangleq {\tilde{p}}(\xi )} d\xi . \end{aligned}$$

(b) Omitted, as the proof is similar to that of Proposition 2(a).

(c) Next, we derive a bound on the second moment of \(\Vert G({\mathbf {x}},\xi )\Vert \) akin to Prop. 2(b). We observe that \( \partial F({\mathbf {x}},\xi ) \) is defined as

$$\begin{aligned} \partial F({\mathbf {x}},\xi ) = {\left\{ \begin{array}{ll} \left( \mathcal {C}(2\pi \sigma ^2)^{n/2} (-2\xi \xi ^T{\mathbf {x}}) e^{-|\xi ^{T} {\mathbf {x}}|^{{2}}+\frac{{\Vert \xi \Vert _{2}^2}}{2\sigma ^2}}\right) , &{} \xi \in \varXi _1({\mathbf {x}}) \triangleq \left\{ \xi \mid |\xi ^{T} {\mathbf {x}}|^{{2}} > \Vert \xi \Vert ^{2}_{p} \right\} \\ \left( -\mathcal {C} (2\pi \sigma ^2)^{n/2} e^{-\max \{|\xi ^{T} {\mathbf {x}}|^{{2}}, {\Vert \xi \Vert ^{{2}}_{p}}\}+\frac{{\Vert \xi \Vert _{2}^2}}{2\sigma ^2}}\right) \left[ 0,2\xi (\xi ^T{\mathbf {x}})\right] , &{} \xi \in \varXi _0({\mathbf {x}}) \triangleq \left\{ \xi \mid |\xi ^{T} {\mathbf {x}}|^{{2}} = \Vert \xi \Vert ^{2}_{p}\right\} \\ \mathbf{0}. &{} \xi \in \varXi _2({\mathbf {x}}) \triangleq \left\{ \xi \mid |\xi ^{T} {\mathbf {x}}|^{{2}} < \Vert \xi \Vert ^{2}_{p}\right\} \end{array}\right. } \end{aligned}$$

Consequently, \({\mathbb {E}}\left[ \Vert G({\mathbf {x}},\xi )\Vert ^2\right] \) can be bounded as follows.

$$\begin{aligned}&{\mathbb {E}}\left[ \Vert G({\mathbf {x}},\xi )\Vert ^2\right] = \int _{\varXi } \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&= \int _{\varXi _1({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi + \int _{\varXi _2({\mathbf {x}})} \Vert \underbrace{G({\mathbf {x}},\xi )}_{ \ = \ \mathbf{0}}\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&\quad + \int _{\varXi _0({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&= \int _{\varXi _1({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi , \end{aligned}$$
(36)

where the last equality follows from observing that \(G({\mathbf {x}},\xi ) = 0\) for \(\xi \in \varXi _2({\mathbf {x}})\) and the integral over \(\varXi _0({\mathbf {x}})\) in (36) is zero because \(\varXi _0({\mathbf {x}})\) is a set of measure zero. It follows that

$$\begin{aligned}&\ {\mathbb {E}}[\Vert G({\mathbf {x}},\xi )\Vert ^2] = \int _{\varXi _1({\mathbf {x}})}\left( \mathcal {C}^2(2\pi \sigma ^2)^{n} (4\Vert \xi \Vert _2^2 (\xi ^T{\mathbf {x}})^2) e^{-2(\xi ^T{\mathbf {x}})^2 +{\tfrac{\Vert \xi \Vert ^2_{2}}{\sigma ^2}}} \right) (2\pi \sigma ^2)^{-\tfrac{n}{2}} e^{\tfrac{{-\Vert \xi \Vert ^2_{2}}}{2\sigma ^2}} d\xi \nonumber \\&\le \int _{\varXi _1({\mathbf {x}})}\left( \mathcal {C}^2(2\pi \sigma ^2)^{n} (4\Vert \xi \Vert _2^2 (\xi ^T{\mathbf {x}})^2) e^{-2(\xi ^T{\mathbf {x}})^2 +{\Vert \xi \Vert ^2_{p}}} \right) (2\pi \sigma ^2)^{-\tfrac{n}{2}} e^{\tfrac{{-\Vert \xi \Vert ^2_{2}}}{2\sigma ^2}} d\xi \end{aligned}$$
(37)
$$\begin{aligned}&\le \int _{\varXi _1({\mathbf {x}})}\left( \mathcal {C}^2(2\pi \sigma ^2)^{n} (4\Vert \xi \Vert _2^2 (\xi ^T{\mathbf {x}})^2) e^{-(\xi ^T{\mathbf {x}})^2} \right) (2\pi \sigma ^2)^{-\tfrac{n}{2}} e^{\tfrac{{-\Vert \xi \Vert ^2_{2}}}{2\sigma ^2}} d\xi , \end{aligned}$$
(38)

where (38) follows from \(\xi \in \varXi _1({\mathbf {x}})\) and (37) follows from

$$\begin{aligned} \frac{\Vert \xi \Vert _2^2}{\sigma ^2} \le \Vert \xi \Vert _p^2, \text{ where } \sigma ^2 = {\left\{ \begin{array}{ll} n^{1-2/p}, &{} p \ge 2 \\ 1, &{} 1 \le p < 2. \end{array}\right. } \end{aligned}$$

We may then conclude that

$$\begin{aligned}&\ {\mathbb {E}}[\Vert G({\mathbf {x}},\xi )\Vert ^2] \le \int _{\varXi _1({\mathbf {x}})} \left( \mathcal {C}^2(2\pi \sigma ^2)^{n} (4\Vert \xi \Vert _2^2 (\xi ^T{\mathbf {x}})^2) e^{-(\xi ^T{\mathbf {x}})^2} \right) (2\pi \sigma ^2)^{-\tfrac{n}{2}} e^{-\frac{{\Vert \xi \Vert ^2_{2}}}{2\sigma ^2}} d\xi \nonumber \\&\le 4e^{-1} \mathcal {C}^2(2\pi \sigma ^2)^{n}\int _{\varXi _1({\mathbf {x}})} \left( \Vert \xi \Vert _2^2 \right) (2\pi \sigma ^2)^{-\frac{n}{2}} e^{-\tfrac{{\Vert \xi \Vert ^2_{2}}}{2\sigma ^2}} d\xi \nonumber \\&\le 4e^{-1} \mathcal {C}^2(2\pi \sigma ^2)^{n}\int _{{\mathbb {R}}^n} \left( \Vert \xi \Vert _2^2\right) (2\pi \sigma ^2)^{-\frac{n}{2}} e^{-\tfrac{{\Vert \xi \Vert ^2_{2}}}{2\sigma ^2}} d\xi = 4e^{-1} \mathcal {C}^2(2\pi \sigma ^2)^{n}{\mathbb {E}}\left[ \Vert \xi \Vert _2^2\right] . \end{aligned}$$
(39)

where (39) follows from Lemma 3. \(\square \)
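The two elementary facts driving the final bound can be checked numerically. The sketch below assumes that Lemma 3 supplies the bound \(\sup _{u \ge 0} u e^{-u} = e^{-1}\) (our reading of the step yielding (39)) and uses the standard second moment \({\mathbb {E}}[\Vert \xi \Vert _2^2] = n\sigma ^2\) of a centered Gaussian with covariance \(\sigma ^2 I\):

```python
import numpy as np

# Numerical sanity check (our reading of Lemma 3's content): the map
# u -> u * exp(-u) attains its maximum 1/e at u = 1, the bound used to
# pull (xi^T x)^2 * exp(-(xi^T x)^2) out of the integrand.
u = np.linspace(0.0, 50.0, 2_000_001)
vals = u * np.exp(-u)
assert vals.max() <= np.exp(-1.0) + 1e-12
assert abs(vals.max() - np.exp(-1.0)) < 1e-6  # maximum attained near u = 1

# Second moment of a centered Gaussian with covariance sigma^2 I:
# E[||xi||_2^2] = n * sigma^2, the quantity in the final bound.
rng = np.random.default_rng(0)
n, sigma = 5, 0.7
xi = rng.normal(0.0, sigma, size=(200_000, n))
emp = (xi**2).sum(axis=1).mean()
assert abs(emp - n * sigma**2) < 0.05
```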

Proof of Lemma 6:

Suppose \(({\mathbf {x}},{\mathbf {y}})\) is feasible with respect to (PM\(_{A,\mathrm{ext}}^{2}\)). Then \({\mathbf {x}}\in \mathcal {X}\), and \({\mathbf {x}}\) is therefore feasible with respect to (PM\(_{A}^{{\mathcal {E}}}\)). In addition,

$$\begin{aligned} f({\mathbf {x}})&\triangleq {\mathbb {P}}\left\{ \zeta \in {\mathbb {R}}^n \, \left| \, \zeta \in {\mathcal {K}}_{{\mathcal {E}}}, \left| \zeta ^{T} {\mathbf {x}}\right| \ \le 1 \right. \, \right\} ={\mathbb {P}} \left\{ \zeta \, \left| \, \zeta ^{T} U^{T} {\varSigma ^{-1}} U \zeta \le 1, \left| \zeta ^{T} {\mathbf {x}}\right| \le 1 \right. \right\} \\&= {\mathbb {P}}\left\{ \zeta \in {\mathbb {R}}^n \, \left| \, \Vert {\varSigma ^{-1/2}} U\zeta \Vert ^2_2 \le 1, \left| \zeta ^{T} {\mathbf {x}}\right| \ \le 1 \right. \right\} \\&= {\mathbb {P}} \left\{ U^{T} {\varSigma ^{1/2}} {\eta } \in {\mathbb {R}}^n\, \left| \, \Vert {\eta }\Vert ^2_2 \le 1, \left| (U^{T} {\varSigma ^{1/2} \eta })^{T} {\mathbf {x}}\right| \ \le 1 \right. \right\} \\&= {\mathbb {P}} \left\{ U^{T} {\varSigma ^{1/2} \eta } \in {\mathbb {R}}^n\, \left| \, \Vert {\eta }\Vert ^2_2 \le 1, \left| {\eta ^{T} \varSigma ^{1/2}} U {\mathbf {x}}\right| \ \le 1 \right. \right\} \\&= {\mathbb {P}} \left\{ {\eta } \in {\mathbb {R}}^n \, \left| \, {\eta } \in {\mathcal {K}}_{2}, \ \left| {\eta }^{T} \varSigma ^{1/2} U {\mathbf {x}}\right| \ \le 1 \right. \right\} \triangleq g({\mathbf {x}}). \end{aligned}$$

\(\square \)

Proof of Proposition 6:

(a) The result follows by a transformation argument. Define a new variable \({ {\tilde{\zeta }}\triangleq \zeta -\mu }\) lying in \({ {\tilde{{\mathcal {K}}}} \triangleq \{ {\tilde{\zeta }}: \Vert {\tilde{\zeta }} \Vert _p \le \alpha \}}\). The set \({\tilde{{\mathbf {K}}}}({\mathbf {x}})\) can then be defined as follows.

$$\begin{aligned} {\tilde{{\mathbf {K}}}}({\mathbf {x}}) =\left\{ {\tilde{\zeta }} \, \left| \, {\tilde{\zeta }} \in {\tilde{{\mathcal {K}}}} \right. \right\} \, \bigcap \, \left\{ {\tilde{\zeta }} \, \left| \, {\tilde{\zeta }} \le T{\mathbf {x}}-\mu \right. \right\} . \end{aligned}$$

We first show that \(\zeta \in {\mathbf {K}}({\mathbf {x}})\) if and only if \({\tilde{\zeta }} \in {\tilde{{\mathbf {K}}}}({\mathbf {x}})\). Suppose \(\zeta \in {\mathbf {K}}({\mathbf {x}})\). Then \(\zeta \in \mathcal {K}\) and \(c({\mathbf {x}},\zeta ) = T{\mathbf {x}}-\zeta \ge 0\). Since \(\zeta \in \mathcal {K}\), we have \(\Vert \zeta -\mu \Vert _p \le \alpha \) or, equivalently, \(\Vert {\tilde{\zeta }}\Vert _p \le \alpha \) where \({\tilde{\zeta }} = \zeta - \mu \). Furthermore, \(T{\mathbf {x}}\ge \zeta \) can be rewritten as \(T{\mathbf {x}}- \mu \ge \zeta - \mu \) or \(T{\mathbf {x}}- \mu \ge {\tilde{\zeta }}\). It follows that

$$\begin{aligned} {\tilde{\zeta }} \in {\tilde{{\mathbf {K}}}}({\mathbf {x}}) \, = \, \left\{ {\tilde{\zeta }} \, \left| \, {\tilde{\zeta }} \in {\tilde{{\mathcal {K}}}} \right. \right\} \, \bigcap \, \left\{ {\tilde{\zeta }} \, \left| \, T{\mathbf {x}}- \mu \ge {\tilde{\zeta }}\right. \right\} . \end{aligned}$$

The reverse direction follows similarly. Consequently, \({\mathbb {P}}\left\{ \zeta \, \left| \, \zeta \in {\mathbf {K}}({\mathbf {x}}) \right. \right\} = {\mathbb {P}}\left\{ {\tilde{\zeta }} \, \left| \, {\tilde{\zeta }} \in {\tilde{{\mathbf {K}}}}({\mathbf {x}}) \right. \right\} .\) We now analyze the latter probability. It may be observed that the Minkowski functional associated with \( {\tilde{{\mathcal {K}}}}\) is given by \( \Vert {\tilde{\zeta }}\Vert _{{\tilde{{\mathcal {K}}}}} = \tfrac{1}{\alpha }\Vert {\tilde{\zeta }}\Vert _p\). Since \( {T_{i,\bullet } {\mathbf {x}}-\mu _i \ge \delta >0 }\) for \( i=1,\ldots ,d \), it follows that

$$\begin{aligned} {\tilde{{\mathbf {K}}}}({\mathbf {x}})&= \Bigg \{{\tilde{\zeta }} \, \left| \, \tfrac{1}{\alpha } \Vert {\tilde{\zeta }}\Vert _p \le 1 \right. \Bigg \} \, \bigcap \, \left\{ {\tilde{\zeta }} \, \left| \, \bigcap _{i=1}^d \tfrac{\max \{{\tilde{\zeta }}_i,0\}}{T_{i,\bullet } {\mathbf {x}}-\mu _i}\le 1 \right. \right\} \\&= \Bigg \{{\tilde{\zeta }} \, \left| \, \tfrac{1}{\alpha ^2}\Vert {\tilde{\zeta }} \Vert ^{2}_p \le 1 \right. \Bigg \} \, \bigcap \, \left\{ {\tilde{\zeta }} \, \left| \, \bigcap _{i=1}^d \left( \tfrac{\max \{{\tilde{\zeta }}_i,0\}}{T_{i,\bullet } {\mathbf {x}}-\mu _i}\right) ^2 \le 1 \right. \right\} \\&= \left\{ {\tilde{\zeta }} \, \left| \, \max \left\{ \tfrac{1}{\alpha ^2}\Vert {\tilde{\zeta }}\Vert _{p}^2,\left( \tfrac{\max \{{\tilde{\zeta }}_1,0\}}{T_{1,\bullet }{\mathbf {x}}-\mu _1}\right) ^2,\cdots , \left( \tfrac{\max \{{\tilde{\zeta }}_d,0\}}{T_{d,\bullet }{\mathbf {x}}-\mu _d}\right) ^2 \right\} \le 1 \right. \right\} . \end{aligned}$$

Since \( g_i({\mathbf {x}},{\tilde{\zeta }}) \ \triangleq \ \left( \frac{\max \{{\tilde{\zeta }}_i,0\}}{T_{i,\bullet } {\mathbf {x}}-\mu _i}\right) ^2 \) for \( i=1,\ldots , d \) and \( g_{d+1}({\mathbf {x}}, {\tilde{\zeta }}) \triangleq \tfrac{1}{\alpha ^2}\Vert {\tilde{\zeta }}\Vert ^{2}_{p} \) are PHFs of degree 2, the function \( g({{\mathbf {x}}}, {\tilde{\zeta }}) \triangleq \max \{ g_1({\mathbf {x}}, {\tilde{\zeta }}),\ldots ,g_{d+1}({\mathbf {x}},{\tilde{\zeta }}) \} \) is also positively homogeneous of degree 2. By selecting \(h(\zeta ) = 1\) and \({\varLambda } = {\tilde{{\mathbf {K}}}}({\mathbf {x}})\), we may invoke Lemma 2, leading to the following equality.

$$\begin{aligned} f({\mathbf {x}}) \, = \, \frac{1}{\mathrm {Vol}({\mathcal {K}})}\int _{{\tilde{{\mathbf {K}}}}({\mathbf {x}})} 1 \ d {\tilde{\zeta }} = \frac{1}{\mathrm {Vol}({\mathcal {K}})} \frac{1}{\varGamma (1+d/2)} \int _{{\mathbb {R}}^d} e^{-g({\mathbf {x}},\xi )} \ d\xi . \end{aligned}$$
(40)
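The identity invoked from Lemma 2 can be sanity-checked numerically on a toy instance (our own choice of \(g\), not one arising in the paper): for \(g\) positively homogeneous of degree 2 on \({\mathbb {R}}^d\), the volume of \(\{g \le 1\}\) equals \(\tfrac{1}{\varGamma (1+d/2)}\int _{{\mathbb {R}}^d} e^{-g(\xi )}\, d\xi \).

```python
import math
import numpy as np

# Toy check of the identity behind (40): take d = 2 and the degree-2 PHF
# g(xi) = max(xi_1^2, 4*xi_2^2).  Then {g <= 1} is the box
# [-1, 1] x [-1/2, 1/2], with volume 2, and Gamma(1 + d/2) = Gamma(2) = 1.
step = 0.01
grid = np.arange(-6.0 + step / 2, 6.0, step)   # midpoint rule; tails beyond 6 are negligible
X, Y = np.meshgrid(grid, grid)
integral = np.exp(-np.maximum(X**2, 4 * Y**2)).sum() * step**2
vol = integral / math.gamma(1 + 2 / 2)
assert abs(vol - 2.0) < 5e-3
```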

Equation (40) can be rewritten as

$$\begin{aligned} f({\mathbf {x}})&= \int _{{\mathbb {R}}^d} \underbrace{\left( \mathcal {C} (2\pi \sigma ^2)^{d/2} e^{-g({\mathbf {x}},\xi ) +\frac{\Vert \xi \Vert ^2_{2}}{2\sigma ^2}}\right) }_{\triangleq F({\mathbf {x}},\xi )}\underbrace{\left( \tfrac{1}{(2\pi \sigma ^2)^{d/2}} {e^{-\tfrac{\Vert \xi \Vert ^2_{2}}{2\sigma ^2}}} \right) }_{\triangleq {{\tilde{p}}}(\xi )} \ d\xi \\&= \int _{{\mathbb {R}}^{d}} F({\mathbf {x}},\xi ) \ {{\tilde{p}}}(\xi ) \ d\xi = {\mathbb {E}}_{{{\tilde{p}}}(\xi )}[F({\mathbf {x}},\xi )], \text{ where } \mathcal {C} \triangleq \tfrac{1}{\text {Vol}({\mathcal {K}})} \ {\tfrac{1}{\varGamma (1+d/2)}}. \end{aligned}$$
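This rewriting is an importance-sampling identity, and it is easy to verify numerically in one dimension with a toy \(g\) (our choice): for \(g(\xi ) = \xi ^2\), \(\int _{{\mathbb {R}}} e^{-g(\xi )}\, d\xi = \sqrt{\pi }\).

```python
import math
import numpy as np

# One-dimensional sketch of the change of measure above (toy example, not
# the paper's g): rewriting exp(-g(xi)) against a Gaussian density with
# variance sigma^2 gives F(xi) = sqrt(2*pi*sigma^2)*exp(-g(xi) + xi^2/(2*sigma^2)),
# and the Monte Carlo average of F recovers the integral sqrt(pi).
rng = np.random.default_rng(1)
sigma = 1.0
xi = rng.normal(0.0, sigma, size=1_000_000)
F = math.sqrt(2 * math.pi * sigma**2) * np.exp(-xi**2 + xi**2 / (2 * sigma**2))
estimate = F.mean()
assert abs(estimate - math.sqrt(math.pi)) < 0.01
```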

(b) Omitted (similar to proof of Lemma 8 (a)).

(c) When \( {\mathcal {K}}\) satisfies Assumption 2, the proof of Lemma 8(b) requires slight modification. Suppose \( F({\mathbf {x}},\xi ) \) and \( p(\xi ) \) are defined as in (a). Then we may define \(\partial F({\mathbf {x}},\xi )\) as

$$\begin{aligned} \partial F({\mathbf {x}},\xi ) = {\left\{ \begin{array}{ll} \left( {\mathcal {C}}(2\pi \sigma ^2)^{{d}/2} \frac{2(\max \{\xi _i,0\})^2 T_{i,\bullet }^T}{(T_{i,\bullet } {\mathbf {x}}-\mu _i)^3} e^{-g_i({\mathbf {x}},\xi )+\frac{{\Vert \xi \Vert _{2}^2}}{2\sigma ^2}}\right) , &{} \xi \in \varXi _i({\mathbf {x}}), i = 1, \cdots , d \\ \left( -{\mathcal {C}}(2\pi \sigma ^2)^{{d}/2} e^{-g({\mathbf {x}},\xi ) +\frac{{\Vert \xi \Vert _{2}^2}}{2\sigma ^2}}\right) H({\mathbf {x}},\xi ), &{} \xi \in \varXi _0({\mathbf {x}})\\ \mathbf{0}, &{} \xi \in \varXi _{d+1}({\mathbf {x}}), \end{array}\right. } \end{aligned}$$

where \(H({\mathbf {x}},\xi )\) denotes the Clarke generalized gradient of \(g({\mathbf {x}},\xi )\), defined as in (17). Consequently, it follows that \({\mathbb {E}} \left[ \Vert G({\mathbf {x}},\xi )\Vert ^2\right] \) is bounded as follows.

$$\begin{aligned}&\quad {\mathbb {E}}\left[ \Vert G({\mathbf {x}},\xi )\Vert ^2\right] = \int _{{\mathbb {R}}^{d}} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&= \sum _{i=1}^{d} \int _{\varXi _i({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi + \int _{\varXi _{d+1}({\mathbf {x}})} \Vert \underbrace{G({\mathbf {x}},\xi )}_{ \ = \ \mathbf{0}}\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&\quad + \int _{\varXi _0({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&= \sum _{i=1}^{d}\int _{\varXi _i({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi , \end{aligned}$$
(41)

where the last equality follows by observing that \(G({\mathbf {x}},\xi ) = \mathbf{0}\) for \(\xi \in \varXi _{d+1}({\mathbf {x}})\) and that the integral over \(\varXi _0({\mathbf {x}})\) in (41) vanishes because \(\varXi _0({\mathbf {x}})\) is a set of measure zero. It follows that

$$\begin{aligned} {\mathbb {E}}&\left[ \Vert G(x,\xi )\Vert ^2 \right] \\&=\sum _{i=1}^{d}\int _{\varXi _i(x)} 4\mathcal {C}^2(2\pi \sigma ^2)^d \frac{\Vert T_{i,\bullet }\Vert ^2}{(T_{i,\bullet } {\mathbf {x}}-\mu _i)^2} \left( \frac{\xi _i}{{T_{i,\bullet } {\mathbf {x}}-\mu _i}}\right) ^4 \\&\qquad e^{-\frac{2(\xi _i)^2}{(T_{i,\bullet } {\mathbf {x}}-\mu _i)^2}+\frac{\Vert \xi \Vert _{2}^2}{\sigma ^2}}\left( \tfrac{1}{(2\pi \sigma ^2)^{d/2}} e^{-\tfrac{\Vert \xi \Vert _{2}^2}{2\sigma ^2}}\right) d\xi \\&\le \sum _{i=1}^{d}\int _{\varXi _i(x)} 4\mathcal {C}^2(2\pi \sigma ^2)^d \frac{\Vert T_{i,\bullet }\Vert ^2}{\delta ^2}\left( \frac{\xi _i}{{T_{i,\bullet } {\mathbf {x}}-\mu _i}} \right) ^4 e^{-\frac{2(\xi _i)^2}{(T_{i,\bullet } {\mathbf {x}}-\mu _i)^2}+\frac{\Vert \xi \Vert _{2}^2}{\sigma ^2}}\left( \tfrac{1}{(2\pi \sigma ^2)^{d/2}} e^{-\tfrac{\Vert \xi \Vert _{2}^2}{2 \sigma ^2}}\right) d\xi \\&\le \sum _{i=1}^{d}\int _{\varXi _i(x)} 4\mathcal {C}^2(2\pi \sigma ^2)^d \frac{\Vert T_{i,\bullet }\Vert ^2}{\delta ^2}\left( \frac{\xi _i}{{T_{i,\bullet } {\mathbf {x}}-\mu _i}}\right) ^4\\&\qquad e^{-\left( 2-\tfrac{\alpha ^2}{\sigma ^2}\right) \frac{(\xi _i)^2}{(T_{i,\bullet } {\mathbf {x}}-\mu _i)^2}}\left( \tfrac{1}{(2\pi \sigma ^2)^{d/2}} e^{-\tfrac{\Vert \xi \Vert _{2}^2}{2\sigma ^2}}\right) d\xi , \end{aligned}$$

where the first inequality follows from \( {T_{i,\bullet } {\mathbf {x}}-\mu _i \ge \delta >0 }\) for all i, and the second inequality follows from \( \xi \in \varXi _i({\mathbf {x}})\). By Lemma 3, given any \( \alpha \), choosing the variance \( \sigma ^2\) of the random variable \( \xi \) so that \(\sigma ^2 = \alpha ^2 \) leads to the bound \( {{\mathbb {E}} \left[ \Vert G({\mathbf {x}},\xi )\Vert ^2 \right] \le 16\mathcal {C}^2(2\pi \sigma ^2)^d {\sum _{i=1}^d} \frac{\Vert T_{i,\bullet }\Vert ^2}{\delta ^2 e^2}}. \) \(\square \)
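Under our reading of Lemma 3, the constant \(16/e^2\) arises from \(\sup _{t \ge 0} t^4 e^{-t^2} = 4e^{-2}\), attained at \(t = \sqrt{2}\), combined with the factor 4 already present in the integrand; a quick numerical confirmation:

```python
import numpy as np

# Sanity check of the extremal value used in the final bound:
# t^4 * exp(-t^2) is maximized at t = sqrt(2) with value 4 * exp(-2).
t = np.linspace(0.0, 20.0, 2_000_001)
vals = t**4 * np.exp(-t**2)
assert vals.max() <= 4 * np.exp(-2.0) + 1e-12
assert abs(vals.max() - 4 * np.exp(-2.0)) < 1e-9
```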

Proof of Lemma 10:

If \({{\tilde{G}}}({\mathbf {x}}_k,\xi ) \triangleq G({\mathbf {x}}_k,\xi ) - {\mathbb {E}}[G({\mathbf {x}}_k,\xi )]\), by the conditional independence of \({\tilde{G}}({\mathbf {x}}_k,\xi _j)\) and \({\tilde{G}}({\mathbf {x}}_k,\xi _{\ell })\) for \(j \ne \ell \), we have

$$\begin{aligned}&{\mathbb {E}}\left[ \Vert {\bar{w}}_{G,k}\Vert ^2 \mid \mathcal {F}_k \right] = \frac{1}{N^2_k}{\mathbb {E}}\left[ \left\| \sum _{j=1}^{N_k} {\tilde{G}} ({\mathbf {x}}_k,\xi _j)\right\| ^2 \Bigg | \, \mathcal {F}_k \right] \nonumber \\&\quad = \frac{1}{N_k^2} {\mathbb {E}}\left[ \left[ \sum _{j=1}^{N_k} \Vert {\tilde{G}}({\mathbf {x}}_k,\xi _j)\Vert ^2 + \sum _{\ell \ne j} 2{\tilde{G}}({\mathbf {x}}_k,\xi _{\ell })^T {\tilde{G}}({\mathbf {x}}_k,\xi _j) \right] \, \Bigg | \, \mathcal {F}_k \right] \nonumber \\&\quad = \frac{1}{N_k}\left( {\mathbb {E}}\left[ \Vert {G}({\mathbf {x}}_k,\xi )\Vert ^2\, \left| \, \mathcal {F}_k \right. \right] \right. \nonumber \\&\qquad \left. + \Vert {\mathbb {E}}\left[ G({\mathbf {x}}_k,\xi ) \, \left| \, \mathcal {F}_k \right. \right] \Vert ^2 - 2{\mathbb {E}}\left[ G({\mathbf {x}}_k,\xi )\, \left| \, \mathcal {F}_k \right. \right] ^T{\mathbb {E}}\left[ G({\mathbf {x}}_k,\xi ) \, \left| \, \mathcal {F}_k \right. \right] \right) \nonumber \\&\quad = \frac{1}{N_k}\left( {\mathbb {E}}\left[ \Vert {G}({\mathbf {x}}_k,\xi ) \Vert ^2\, \left| \, \mathcal {F}_k \right. \right] - \Vert {\mathbb {E}} \left[ G({\mathbf {x}}_k,\xi ) \, \left| \, \mathcal {F}_k \right. \right] \Vert ^2 \right) \nonumber \\&\quad \le \frac{1}{N_k} {\mathbb {E}} \left[ \Vert {G}({\mathbf {x}}_k,\xi )\Vert ^2 \mid \mathcal {F}_k\right] . \end{aligned}$$
(42)

By (42) and Prop. 2, \({\mathbb {E}}\left[ \Vert {\bar{w}}_{G,k}\Vert ^2 \, \left| \, \mathcal {F}_k \right. \right] \ \le \ \frac{\mathcal {C}^2_{{\mathcal {K}}}(2\pi )^n}{eN_k} {\mathbb {E}}_{{{\tilde{p}}}}\left[ \Vert \xi \Vert ^2\right] \) for Setting A. Similarly, for Setting B, by Lemma 8,

$$\begin{aligned} {{\mathbb {E}}[\Vert {\bar{w}}_{G,k}\Vert ^2 \mid \mathcal {F}_k ] \le 16\mathcal {C}^2(2\pi \sigma ^2)^d {\sum _{i=1}^d} \frac{\Vert T_{i,\bullet }\Vert ^2}{\delta ^2 e^2 N_k} }. \end{aligned}$$

In addition, for Setting A, \( {\mathbb {E}}\left[ \Vert {\bar{w}}_{f,k}\Vert ^2 \, \left| \, \mathcal {F}_k \right. \right] \le \frac{2(\mathcal {C}_{{\mathcal {K}}}^2(2\pi )^n+1)}{N_k}\) while for Setting B, we obtain that \({\mathbb {E}}\left[ \, \Vert {\bar{w}}_{f,k}\Vert ^2 \, \left| \, \mathcal {F}_k \right. \right] \le \frac{\mathcal {C}^2(2\pi \sigma ^2)^d}{N_k}.\) \(\square \)
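The \(1/N_k\) scaling in (42) is the familiar variance reduction of a sample average; a Monte Carlo sketch (toy distribution for \(G\), our choice) confirms the bound \({\mathbb {E}}[\Vert {\bar{w}}_{G,k}\Vert ^2] \le {\mathbb {E}}[\Vert G\Vert ^2]/N_k\):

```python
import numpy as np

# Empirical check of (42): the mean-squared error of a sample average of
# N i.i.d. copies of G equals (E[||G||^2] - ||E[G]||^2)/N, and is therefore
# bounded by E[||G||^2]/N.
rng = np.random.default_rng(2)
dim, N, trials = 3, 64, 20_000
G = rng.normal(0.5, 1.0, size=(trials, N, dim))   # i.i.d. samples of G
w_bar = G.mean(axis=1) - 0.5                      # \bar w_{G,k} per trial
mse = (w_bar**2).sum(axis=1).mean()               # E[||\bar w_{G,k}||^2]
second_moment = dim * (1.0 + 0.25)                # E[||G||^2] = d*(sigma^2 + mu^2)
assert mse <= second_moment / N
exact = dim * 1.0 / N                             # variance term alone
assert abs(mse - exact) < 0.002
```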

Proof of Lemma 11:

(Setting A) Consider \({\bar{w}}_k \triangleq \frac{-(G_k+{\bar{w}}_{G,k})}{(f({\mathbf {x}}_k) + {\bar{w}}_{f,k})^{2}+\epsilon _k}- \frac{-G_k}{(f({\mathbf {x}}_k))^{2}}.\) We have that

$$\begin{aligned} \Vert {\bar{w}}_k\Vert ^2&= \left\| \frac{-(G_k+{\bar{w}}_{G,k})}{(f({\mathbf {x}}_k) + {\bar{w}}_{f,k})^{2}+\epsilon _k}- \frac{-G_k}{(f({\mathbf {x}}_k))^{2}} \right\| ^2 \\&= \left\| \frac{-(G_k+{\bar{w}}_{G,k})}{(f({\mathbf {x}}_k) + {\bar{w}}_{f,k})^{2} +\epsilon _k}-\frac{{-}(G_k+{\bar{w}}_{G,k})}{(f({\mathbf {x}}_k))^{2}+\epsilon _k} +\frac{{-}(G_k+{\bar{w}}_{G,k})}{(f({\mathbf {x}}_k))^{2}+\epsilon _k}\right. \\&\quad \left. - \frac{{-}G_k}{(f({\mathbf {x}}_k))^{2}+\epsilon _k} + \frac{{-}G_k}{(f({\mathbf {x}}_k))^{2}+\epsilon _k}- \frac{{-}G_k}{(f({\mathbf {x}}_k))^{2}} \right\| ^2 \\&\le 3\left\| {\bar{w}}_{G,k}\right\| ^2 \frac{1}{((f({\mathbf {x}}_k))^{2} + \epsilon _k)^2}\\&\quad +3\left\| G_k+{\bar{w}}_{G,k}\right\| ^2 \left\| \frac{1}{(f({\mathbf {x}}_k))^{2}+\epsilon _k} -\frac{1}{(f({\mathbf {x}}_k) + {\bar{w}}_{f,k})^{2}+\epsilon _k}\right\| ^2 \\&\quad + 3\left\| G_k\right\| ^2 \left\| \frac{1}{(f({\mathbf {x}}_k))^{2}} -\frac{1}{(f({\mathbf {x}}_k))^{2} +\epsilon _k}\right\| ^2 \\&\le 3\left\| {\bar{w}}_{G,k}\right\| ^2 \frac{1}{((f({\mathbf {x}}_k))^{2}+\epsilon _k)^2}\\&\quad +3\left\| G_k+{\bar{w}}_{G,k}\right\| ^2 \left\| \frac{(2f({\mathbf {x}}_k)+{\bar{w}}_{f,k}) {\bar{w}}_{f,k}}{((f({\mathbf {x}}_k))^{2}+\epsilon _k)((f({\mathbf {x}}_k) + {\bar{w}}_{f,k})^{2} +\epsilon _k)}\right\| ^2 \\&\quad + 3\left\| G_k\right\| ^2 \left\| \frac{\epsilon _k}{(f({\mathbf {x}}_k))^{2}((f({\mathbf {x}}_k))^{2} +\epsilon _k)}\right\| ^2 \\&\le 3\left\| {\bar{w}}_{G,k}\right\| ^2 \frac{1}{\epsilon _f^{4}}+3\left\| G_k+{\bar{w}}_{G,k}\right\| ^2 \left\| \frac{(2f({\mathbf {x}}_k)+{\bar{w}}_{f,k})}{\epsilon _f^{2} \epsilon _k}\right\| ^2\Vert {\bar{w}}_{f,k}\Vert ^2 + 3\left\| G_k\right\| ^2 \left( \frac{\epsilon _k^2}{\epsilon _f^{8}}\right) \\&\le 3\left\| {\bar{w}}_{G,k}\right\| ^2 \frac{1}{\epsilon _f^4}+3 \left\| G_k+{\bar{w}}_{G,k}\right\| ^2 \frac{(8f^2({\mathbf {x}}_k)\Vert {\bar{w}}_{f,k}\Vert ^2+2\Vert {\bar{w}}_{f,k}\Vert ^4)}{\epsilon _f^{4} \epsilon ^2_k} + 3\left\| G_k\right\| ^2 \left( \frac{\epsilon _k^2}{\epsilon _f^{8}}\right) , \end{aligned}$$

where \(f({\mathbf {x}}_k) \ge \epsilon _f\) for every \({\mathbf {x}}_k \in \mathcal {X}.\) Taking conditional expectations and recalling the independence of \({\bar{w}}_{f,k}\) and \({\bar{w}}_{G,k}\) conditional on \(\mathcal {F}_k\), the following bound emerges.

$$\begin{aligned} {\mathbb {E}}&\left[ \Vert {\bar{w}}_k\Vert ^2 \, \bigg | \, \mathcal {F}_k \, \right] \le 3{\mathbb {E}}\left[ \left\| {\bar{w}}_{G,k}\right\| ^2 \, \bigg | \, \mathcal {F}_k\, \right] \frac{1}{\epsilon _f^4} \\&+3{\mathbb {E}}\left[ \left\| G_k+{\bar{w}}_{G,k}\right\| ^2 \frac{(8f^2({\mathbf {x}}_k)\Vert {\bar{w}}_{f,k}\Vert ^2+2\Vert {\bar{w}}_{f,k}\Vert ^4)}{\epsilon _f^{4} \epsilon _k^2}\, \bigg | \, \mathcal {F}_k \right] + 3{\mathbb {E}}\left[ \left\| G_k\right\| ^2 \left( \frac{\epsilon _k^2}{\epsilon _f^{8}}\right) \, \bigg | \, \mathcal {F}_k \right] \\&\le \frac{3\nu _G^2}{\epsilon _f^4N_k} +3{\mathbb {E}}\left[ \left\| G_k+{\bar{w}}_{G,k}\right\| ^2 \, \bigg | \, \mathcal {F}_k \right] {\mathbb {E}}\left[ \frac{(8f^2({\mathbf {x}}_k)\Vert {\bar{w}}_{f,k}\Vert ^2+2\Vert {\bar{w}}_{f,k}\Vert ^4)}{\epsilon _f^{4} \epsilon _k^2}\, \bigg | \, \mathcal {F}_k \right] \\&\quad + \left( \frac{3\epsilon _k^2M_G^2}{\epsilon _f^{8}}\right) \\&\le \frac{3\nu _G^2}{\epsilon _f^4N_k}+3 M_G^2 \frac{8f^2({\mathbf {x}}_k)\nu _f^2}{\epsilon _f^{4} \epsilon _k^2N_k}+3M_G^2{\mathbb {E}}\left[ \frac{\Vert {\bar{w}}_{f,k}\Vert ^4}{\epsilon _f^{4} \epsilon _k^2}\, \bigg | \, \mathcal {F}_k \right] + 3\left( \frac{\epsilon _k^2M_G^2}{\epsilon _f^{8}}\right) , \end{aligned}$$

where \(\Vert G_k\Vert ^2 = \Vert {\mathbb {E}}\left[ G({\mathbf {x}}_k,\xi ) \, \left| \, \mathcal {F}_k \right. \right] \Vert ^2 \le {\mathbb {E}}\left[ \Vert G({\mathbf {x}}_k,\xi )\Vert ^2 \, \left| \, \right. \mathcal {F}_k\right] \le M_G^2\) by Jensen’s inequality. From Prop. 2(b,c), \(| F({\mathbf {x}},\xi )| \le {M_F}\) for any \({\mathbf {x}}, \xi \), implying that

$$\begin{aligned} \Vert {\bar{w}}_{f,k}\Vert ^2 = \left\| \frac{\sum _{j=1}^{N_k} F({\mathbf {x}}_k,\xi _j)}{N_k} - f({\mathbf {x}}_k)\right\| ^2&\le 2 \left\| \frac{\sum _{j=1}^{N_k} F({\mathbf {x}}_k,\xi _j)}{N_k}\right\| ^2 + 2f^2({\mathbf {x}}_k) \\&\le 2({M_F}^2+1). \end{aligned}$$

Consequently, by recalling that \(\epsilon _k = 1/N_k^{1/4}\), the following holds a.s.

$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\bar{w}}_k\Vert ^2 \, \bigg | \, \mathcal {F}_k \right]&\le \frac{3\nu _G^2}{\epsilon _f^4N_k}+24M_G^2 \frac{f^2({\mathbf {x}}_k)\nu _f^2}{\epsilon _f^{4} \epsilon _k^2N_k} +3M_G^2{\mathbb {E}}\left[ \frac{\Vert {\bar{w}}_{f,k}\Vert ^4}{\epsilon _f^4 \epsilon _k^2}\, \bigg | \, \mathcal {F}_k \right] + \left( \frac{3\epsilon _k^2M_G^2}{\epsilon _f^8}\right) \\&\le \frac{3\nu _G^2}{\epsilon _f^4N_k}+24M_G^2 \frac{f^2({\mathbf {x}}_k)\nu _f^2}{\epsilon _f^{4} \epsilon _k^2N_k}+\frac{6({M_F}^2+1)M_G^2\nu _f^2}{\epsilon _f^4 \epsilon _k^2 N_k} + \left( \frac{3\epsilon _k^2M_G^2}{\epsilon _f^8}\right) \\&\le \frac{3\nu _G^2}{\epsilon _f^4\sqrt{N_k}}+M_G^2 \frac{24f^2({\mathbf {x}}_k)\nu _f^2}{\epsilon _f^{4} \sqrt{N_k}}+\frac{6(M_F^2+1)M_G^2\nu _f^2}{\epsilon _f^{4} \sqrt{N_k}} + \left( \frac{3M_G^2}{\epsilon _f^{8}\sqrt{N_k}}\right) \\&\triangleq \frac{\nu ^2}{\sqrt{N_k}}, \text{ where } \nu ^2 \triangleq \frac{3\nu _G^2}{\epsilon _f^4}+M_G^2 \frac{24\nu _f^2}{\epsilon _f^{4} }+\frac{6({M_F}^2+1)M_G^2\nu _f^2}{\epsilon _f^{4}} + \left( \frac{3M_G^2}{\epsilon _f^{8}}\right) . \end{aligned}$$
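The splitting used at the start of the proof relies on \(\Vert a+b+c\Vert ^2 \le 3(\Vert a\Vert ^2+\Vert b\Vert ^2+\Vert c\Vert ^2)\), a consequence of the convexity of \(\Vert \cdot \Vert ^2\); a randomized check:

```python
import numpy as np

# The decompositions above repeatedly use
# ||a + b + c||^2 <= 3*(||a||^2 + ||b||^2 + ||c||^2).
rng = np.random.default_rng(3)
for _ in range(1000):
    a, b, c = rng.normal(size=(3, 7))
    lhs = np.linalg.norm(a + b + c) ** 2
    rhs = 3 * (np.linalg.norm(a)**2 + np.linalg.norm(b)**2 + np.linalg.norm(c)**2)
    assert lhs <= rhs + 1e-9
```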

(Setting B) Since \( {\bar{w}}_k \triangleq {\frac{{-}(G_{k}+{\bar{w}}_{G,k})}{(f({\mathbf {x}}_k) + {\bar{w}}_{f,k})+\epsilon _k}+ \frac{G_{k}}{f({\mathbf {x}}_k)}}\) and

$$\begin{aligned} \Vert {\bar{w}}_{k}\Vert ^2&= \left\| \frac{{-}(G_{k}+{\bar{w}}_{G,{k}})}{(f({\mathbf {x}}_k) + {\bar{w}}_{f,{k}})+\epsilon _k}- \frac{{-}G_k}{f({\mathbf {x}}_k)} \right\| ^2 \\&= \left\| \frac{{-}(G_{k}+{\bar{w}}_{G,{k}})}{(f({\mathbf {x}}_k) + {\bar{w}}_{f,k}) +\epsilon _k}-\frac{{-}(G_{k}+{\bar{w}}_{G,{k}})}{f({\mathbf {x}}_k)+\epsilon _k} +\frac{{-}(G_{k}+{\bar{w}}_{G,k})}{f({\mathbf {x}}_k)+\epsilon _k}\right. \\&\quad \left. - \frac{{-}G_{k}}{f({\mathbf {x}}_k)+\epsilon _k} + \frac{{-}G_{k}}{f({\mathbf {x}}_k)+\epsilon _k}- \frac{{-}G_{k}}{f({\mathbf {x}}_k)} \right\| ^2 \\&\le 3\left\| {\bar{w}}_{G,{k}}\right\| ^2 \frac{1}{{(f({\mathbf {x}}_k)+\epsilon _k)^2}} \\&\quad +3\left\| G_{k}+{\bar{w}}_{G,{k}}\right\| ^2 \left\| \frac{1}{f({\mathbf {x}}_k)+\epsilon _k} -\frac{1}{(f({\mathbf {x}}_k) + {\bar{w}}_{f,{k}})+\epsilon _k}\right\| ^2 \\&\quad + 3\left\| G_{k}\right\| ^2 \left\| \frac{1}{f({\mathbf {x}}_k)} -\frac{1}{f({\mathbf {x}}_k)+\epsilon _k} \right\| ^2 \\&\le 3\left\| {\bar{w}}_{G,{k}}\right\| ^2 \frac{1}{{(f({\mathbf {x}}_k)+\epsilon _k)^2}} +3\left\| G_{k}+ {\bar{w}}_{G,{k}}\right\| ^2 \\&\qquad \left\| \frac{{\bar{w}}_{f,k}}{(f({\mathbf {x}}_k) +\epsilon _k)(\underbrace{(f({\mathbf {x}}_k) + {\bar{w}}_{f,{k}})}_{{\ge 0, F({\mathbf {x}}_k,\xi ) \ge 0}} +\epsilon _k)}\right\| ^2 \\&\quad + 3\left\| G_{k}\right\| ^2 \left\| \frac{\epsilon _k}{f({\mathbf {x}}_k)(f({\mathbf {x}}_k)+\epsilon _k)} \right\| ^2 \\&\le 3\left\| {\bar{w}}_{G,{k}}\right\| ^2 \frac{1}{{\epsilon _f^2}} +3\left\| G_k+{\bar{w}}_{G,{k}}\right\| ^2 \left\| \frac{1}{\epsilon _f \epsilon _k}\right\| ^2\Vert {\bar{w}}_{f,{k}}\Vert ^2 + 3\left\| G_{k}\right\| ^2 \left( \frac{\epsilon _k^2}{\epsilon _f^{4}}\right) \\&\le 3\left\| {\bar{w}}_{G,{k}}\right\| ^2 \frac{1}{\epsilon _f^2}+3\left\| G_{k}+{\bar{w}}_{G,{k}}\right\| ^2 \frac{\Vert {\bar{w}}_{f,{k}}\Vert ^2}{\epsilon _f^{2} \epsilon ^2_k} + 3\left\| G_{k}\right\| ^2 \left( \frac{\epsilon _k^2}{\epsilon _f^{4}}\right) , \end{aligned}$$

where \(f({\mathbf {x}}_k) \ge \epsilon _f\) for every \({\mathbf {x}}_k \in \mathcal {X}.\) Taking expectations conditioned on \(\mathcal {F}_k\) and recalling the independence of \({\bar{w}}_{f,k}\) and \({\bar{w}}_{G,k}\) conditional on \(\mathcal {F}_k\), we have the following bound.

$$\begin{aligned}&{\mathbb {E}}\left[ \Vert {\bar{w}}_{k}\Vert ^2 \, \bigg | \, \mathcal {F}_k\right] \\&\quad \le \left( 3{\mathbb {E}}\left[ \left\| {\bar{w}}_{G,{k}}\right\| ^2 \,\bigg | \,\mathcal {F}_k\right] \frac{1}{\epsilon _f^2}\right. \\&\qquad \left. +3{\mathbb {E}}\left[ \left\| G_{k}+{\bar{w}}_{G,{k}}\right\| ^2 \frac{\Vert {\bar{w}}_{f,{k}}\Vert ^2}{\epsilon _f^{2} \epsilon _k^2}\bigg | \mathcal {F}_k\right] + 3{\mathbb {E}}\left[ \left\| G_{k}\right\| ^2 \left( \frac{\epsilon _k^2}{\epsilon _f^{4}}\right) \,\bigg | \,\mathcal {F}_k\right] \right) \\&\quad \le \left( 3\frac{\nu _{G}^2}{\epsilon _f^2N_k} + 3{\mathbb {E}}\left[ \left\| G_{k}+{\bar{w}}_{G,{k}}\right\| ^2 \,\bigg | \,\mathcal {F}_k\right] {\mathbb {E}}\left[ \frac{\Vert {\bar{w}}_{f,{k}}\Vert ^2}{\epsilon _f^{2} \epsilon _k^2} \,\bigg | \,\mathcal {F}_k\right] + 3\left( \frac{\epsilon _k^2M_{G}^2}{\epsilon _f^{4}}\right) \right) \\&\quad \le \left( 3\frac{\nu _{G}^2}{\epsilon _f^2N_k}+3M_G^2 \frac{\nu _{f}^2}{\epsilon _f^{2} \epsilon _k^2N_k}+ 3\left( \frac{\epsilon _k^2M_{G}^2}{\epsilon _f^{4}}\right) \right) . \end{aligned}$$

By selecting \(\epsilon _k = 1/N_k^{1/4}\), we have that

$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\bar{w}}_{k}\Vert ^2 \, \bigg | \, \mathcal {F}_k\right]&\le \frac{\nu ^2}{\sqrt{N_k}}, \text{ where } \nu ^2 \triangleq \left( 3\frac{\nu _{G}^2}{\epsilon _f^2}+3M_G^2 \frac{\nu _{f}^2}{\epsilon _f^{2} }+ 3\left( \frac{M_{G}^2}{\epsilon _f^{4}}\right) \right) . \end{aligned}$$

\(\square \)

Proof of Proposition 7:

(i) Using the update rule of \( {\mathbf {x}}_{k+1}\) and the fact that \({\mathbf {x}}^*=\varPi _{{\mathcal {X}}} [{\mathbf {x}}^*]\), for any \(d_k + {\bar{w}}_{k}\) where \(d_k \in \partial h({\mathbf {x}}_k)\) and \(k \ge 1\),

$$\begin{aligned} {1\over 2}\Vert {\mathbf {x}}_{k+1}-{\mathbf {x}}^*\Vert ^2&{ = } {1\over 2}\Vert \varPi _{{\mathcal {X}}} ({\mathbf {x}}_k-\gamma _k {(d_k + {\bar{w}}_k)})-\varPi _{\mathcal {X}}({\mathbf {x}}^*)\Vert ^2\\&\le {1\over 2}\Vert {\mathbf {x}}_k-\gamma _k {(d_k+{\bar{w}}_k)}-{\mathbf {x}}^*\Vert ^2\\&={1\over 2}\Vert {\mathbf {x}}_k-{\mathbf {x}}^*\Vert ^2+{1\over 2}\gamma _k^2\Vert {d_k + {\bar{w}}_k}\Vert ^2-\gamma _k({\mathbf {x}}_k-{\mathbf {x}}^*)^T({d_k}+{{\bar{w}}}_{k}), \end{aligned}$$

where the inequality employs the non-expansivity of the projection operator. Now, by using the convexity of h, we obtain:

$$\begin{aligned} 2\gamma _k (h({\mathbf {x}}_k)-h({\mathbf {x}}^*))&\le \left( \Vert {\mathbf {x}}_k-{\mathbf {x}}^*\Vert ^2-\Vert {\mathbf {x}}_{k+1}-{\mathbf {x}}^*\Vert ^2\right) +{\Vert {d_k+{\bar{w}}_k}\Vert ^2\gamma _k^2}\\&\quad -{2}\gamma _k{{\bar{w}}}_{k}^T({\mathbf {x}}_k-{\mathbf {x}}^*)\\&\le \left( \Vert {\mathbf {x}}_k-{\mathbf {x}}^*\Vert ^2-\Vert {\mathbf {x}}_{k+1}-{\mathbf {x}}^*\Vert ^2\right) +{\Vert {d_k+{\bar{w}}_k} \Vert ^2\gamma _k^2}\\&\quad + \gamma _k^2 \Vert {\mathbf {x}}_k-{\mathbf {x}}^*\Vert ^2 + \Vert {\bar{w}}_k\Vert ^2, \end{aligned}$$

where we use \(a^Tb\le {1\over 2}\Vert a\Vert ^2+{1\over 2}\Vert b\Vert ^2\). Now by summing from \(k = {\widehat{K}}\) to \(K-1\), where \({\widehat{K}}\) is an integer satisfying \(0 \le {\widehat{K}} < K-1\), we obtain the next inequality.

$$\begin{aligned} \sum _{k={\widehat{K}}}^{K-1} 2\gamma _k (h({\mathbf {x}}_k)-h({\mathbf {x}}^*))&\le {\Vert {\mathbf {x}}_{{{\hat{K}}}}-{\mathbf {x}}^*\Vert ^2}+\sum _{k={\widehat{K}}}^{K-1} \gamma _k^2({\Vert {d_k+{\bar{w}}_k}\Vert ^2}+ \Vert {\mathbf {x}}_k-{\mathbf {x}}^*\Vert ^2) +\sum _{k={\widehat{K}}}^{K-1}\Vert {\bar{w}}_k\Vert ^2. \end{aligned}$$

Dividing both sides by \(2\sum _{k={\widehat{K}}}^{K-1} \gamma _k\), taking expectations on both sides, and invoking Lemma 11, which yields \({\mathbb {E}}[\Vert {{\bar{w}}}_{k}\Vert ^2 \mid {\mathcal {F}}_k]\le {\nu ^2\over \sqrt{N_k}}\), together with the bound on the subgradient, i.e., \({\mathbb {E}}[\Vert d_k + {\bar{w}}_k\Vert ^2]\le M_G^2\), we obtain the following bound.

$$\begin{aligned} {\mathbb {E}}&\left[ \frac{\sum _{k={\widehat{K}}}^{K-1} 2\gamma _k (h({\mathbf {x}}_k)-h({\mathbf {x}}^*))}{\sum _{k={\widehat{K}}}^{K-1} 2\gamma _k}\right] \nonumber \\&\le {\mathbb {E}}\left[ \frac{{\Vert {\mathbf {x}}_{{{\hat{K}}}}-{\mathbf {x}}^*\Vert ^2} +\sum _{k={\widehat{K}}}^{K-1} \gamma _k^2{\Vert {d_k+{\bar{w}}_k}\Vert ^2} + \sum _{k={\widehat{K}}}^{K-1} \gamma _k^2 \Vert {\mathbf {x}}_k-{\mathbf {x}}^*\Vert ^2 + \sum _{k={\widehat{K}}}^{K-1} \Vert {\bar{w}}_k\Vert ^2}{\sum _{k={\widehat{K}}}^{K-1}2 \gamma _k}\right] \end{aligned}$$
(43)
$$\begin{aligned}&\le \frac{{\mathbb {E}}[\Vert x_{{\widehat{K}}}-x^*\Vert ^2]}{\sum _{k={\widehat{K}}}^{K-1}2\gamma _k } + \frac{\sum _{k={\widehat{K}}}^{K-1}\gamma _k^2 (M_G^2 + B^2)}{\sum _{k={\widehat{K}}}^{K-1}2\gamma _k }+ \frac{\sum _{k={\widehat{K}}}^{K-1} \tfrac{\nu ^2}{\sqrt{N_k}}}{\sum _{k={\widehat{K}}}^{K-1}2\gamma _k}. \end{aligned}$$
(44)

By utilizing Jensen’s inequality, we obtain that

$$\begin{aligned} {\mathbb {E}}\left[ h({\bar{x}}_{{\widehat{K}},K})-h({\mathbf {x}}^*)\right] \le {\mathbb {E}}\left[ \frac{\sum _{k={\widehat{K}}}^{K-1} 2\gamma _k (h({\mathbf {x}}_k)-h({\mathbf {x}}^*))}{\sum _{k={\widehat{K}}}^{K-1} 2\gamma _k}\right] , \end{aligned}$$

where \({\bar{x}}_{{\widehat{K}},K} \triangleq \tfrac{\sum _{k={\widehat{K}}}^{K-1} \gamma _k x_k}{\sum _{k={\widehat{K}}}^{K-1} \gamma _k},\) which when combined with (44) leads to (30). \(\square \)
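For concreteness, the projected stochastic subgradient recursion analyzed above, \({\mathbf {x}}_{k+1} = \varPi _{{\mathcal {X}}}({\mathbf {x}}_k - \gamma _k(d_k + {\bar{w}}_k))\) followed by \(\gamma \)-weighted averaging, can be sketched on a toy problem (our own choice of \(h\), \({\mathcal {X}}\), noise, and step sizes, not the paper's):

```python
import numpy as np

# Projected stochastic subgradient sketch: minimize h(x) = ||x - z||_1
# over the unit box X = [0, 1]^n, with noisy subgradients and diminishing
# steps, then form the gamma-weighted average \bar x of the iterates.
rng = np.random.default_rng(4)
n, K = 4, 20_000
z = np.array([0.2, 0.9, 0.5, 0.3])            # optimum x* = z, h(x*) = 0
x = np.full(n, 0.5)
num = np.zeros(n)
den = 0.0
for k in range(1, K + 1):
    gamma = 0.5 / np.sqrt(k)                  # diminishing step size
    d = np.sign(x - z)                        # subgradient of ||x - z||_1
    w = rng.normal(0.0, 0.1, size=n)          # zero-mean subgradient noise
    x = np.clip(x - gamma * (d + w), 0.0, 1.0)  # Euclidean projection onto X
    num += gamma * x
    den += gamma
x_bar = num / den                             # weighted average \bar x
assert np.abs(x_bar - z).sum() < 0.1          # near-optimal in h
```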

Cite this article

Bardakci, I.E., Jalilzadeh, A., Lagoa, C. et al. Probability maximization via Minkowski functionals: convex representations and tractable resolution. Math. Program. 199, 595–637 (2023). https://doi.org/10.1007/s10107-022-01859-8
