Probability maximization via Minkowski functionals: convex representations and tractable resolution

Bardakci, I. E.; Jalilzadeh, A.; Lagoa, C.; Shanbhag, U. V.

doi:10.1007/s10107-022-01859-8

Full Length Paper
Series A
Published: 08 September 2022

Volume 199, pages 595–637, (2023)
Mathematical Programming

Abstract

In this paper, we consider maximizing the probability ${\mathbb {P}}\left\{ \, \zeta \, \mid \, \zeta \, \in \, {\mathbf {K}}({\mathbf {x}}) \, \right\} $ over a closed and convex set ${\mathcal {X}}$, a special case of the chance-constrained optimization problem. Suppose ${\mathbf {K}}({\mathbf {x}}) \, \triangleq \, \left\{ \, \zeta \, \in \, {\mathcal {K}}\, \mid \, c({\mathbf {x}},\zeta ) \, \ge \, 0 \right\} $, where $\zeta $ is uniformly distributed on a convex and compact set ${\mathcal {K}}$ and $c({\mathbf {x}},\zeta )$ is defined as either $c({\mathbf {x}},\zeta )\, \triangleq \, 1-\left| \zeta ^T{\mathbf {x}}\right| ^m$ where $m\ge 0$ (Setting A) or $c({\mathbf {x}},\zeta ) \, \triangleq \, T{\mathbf {x}}\, - \, \zeta $ (Setting B). We show that in either setting, by leveraging recent findings in the context of non-Gaussian integrals of positively homogeneous functions, ${\mathbb {P}}\left\{ \,\zeta \, \mid \, \zeta \, \in \, {\mathbf {K}}({\mathbf {x}}) \, \right\} $ can be expressed as the expectation of a suitably defined continuous function $F(\bullet ,\xi )$ with respect to an appropriately defined Gaussian density (or its variant), i.e., ${\mathbb {E}}_{{{\tilde{p}}}} \left[ \, F({\mathbf {x}},\xi )\, \right] $. Aided by a recent observation in convex analysis, we then develop a convex representation of the original problem requiring the minimization of $g\left( {\mathbb {E}}\left[ \, F(\bullet ,\xi )\, \right] \right) $ over ${\mathcal {X}}$, where g is an appropriately defined smooth convex function. Traditional stochastic approximation schemes cannot contend with the minimization of $g\left( {\mathbb {E}}\left[ F(\bullet ,\xi )\right] \right) $ over $\mathcal X$, since conditionally unbiased sampled gradients are unavailable. We then develop a regularized variance-reduced stochastic approximation (r-VRSA) scheme that obviates the need for such unbiasedness by combining iterative regularization with variance reduction.
Notably, (r-VRSA) is characterized by almost-sure convergence guarantees, a convergence rate of $\mathcal {O}(1/k^{1/2-a})$ in expected sub-optimality where $a > 0$, and a sample complexity of $\mathcal {O}(1/\epsilon ^{6+\delta })$ where $\delta > 0$. To the best of our knowledge, this may be the first such scheme for probability maximization problems with convergence and rate guarantees. Preliminary numerics on a portfolio selection problem (Setting A) and a set-covering problem (Setting B) suggest that the scheme competes well with naive mini-batch SA schemes as well as integer programming approximation methods.

References

van Ackooij, W.: Eventual convexity of chance constrained feasible sets. Optimization 64(5), 1263–1284 (2015)
van Ackooij, W.: A discussion of probability functions and constraints from a variational perspective. Set-Valued Var. Anal. 28(4), 585–609 (2020). https://doi.org/10.1007/s11228-020-00552-2
van Ackooij, W., Aleksovska, I., Munoz-Zuniga, M.: (Sub-)differentiability of probability functions with elliptical distributions. Set-Valued Var. Anal. 26(4), 887–910 (2018). https://doi.org/10.1007/s11228-017-0454-3
van Ackooij, W., Berge, V., de Oliveira, W., Sagastizábal, C.: Probabilistic optimization via approximate $p$-efficient points and bundle methods. Comput. Oper. Res. 77, 177–193 (2017). https://doi.org/10.1016/j.cor.2016.08.002
van Ackooij, W., Demassey, S., Javal, P., Morais, H., de Oliveira, W., Swaminathan, B.: A bundle method for nonsmooth DC programming with application to chance-constrained problems. Comput. Optim. Appl. 78(2), 451–490 (2021). https://doi.org/10.1007/s10589-020-00241-8
van Ackooij, W., Henrion, R.: Gradient formulae for nonlinear probabilistic constraints with Gaussian and Gaussian-like distributions. SIAM J. Optim. 24(4), 1864–1889 (2014). https://doi.org/10.1137/130922689
van Ackooij, W., Henrion, R.: (Sub-)gradient formulae for probability functions of random inequality systems under Gaussian distribution. SIAM/ASA J. Uncertain. Quantif. 5(1), 63–87 (2017). https://doi.org/10.1137/16M1061308
van Ackooij, W., Henrion, R., Möller, A., Zorgati, R.: Joint chance constrained programming for hydro reservoir management. Optim. Eng. 15(2), 509–531 (2014). https://doi.org/10.1007/s11081-013-9236-4
van Ackooij, W., Pérez-Aros, P.: Gradient formulae for nonlinear probabilistic constraints with non-convex quadratic forms. J. Optim. Theory Appl. 185(1), 239–269 (2020). https://doi.org/10.1007/s10957-020-01634-9
van Ackooij, W., Sagastizábal, C.: Constrained bundle methods for upper inexact oracles with application to joint chance constrained energy problems. SIAM J. Optim. 24(2), 733–765 (2014). https://doi.org/10.1137/120903099
Ahmed, S., Luedtke, J., Song, Y., Xie, W.: Nonanticipative duality, relaxations, and formulations for chance-constrained stochastic programs. Math. Program. 162(1–2, Ser. A), 51–81 (2017)
Balasubramanian, K., Ghadimi, S., Nguyen, A.: Stochastic multi-level composition optimization algorithms with level-independent convergence rates. arXiv preprint arXiv:2008.10526 (2020)
Bardakci, I., Lagoa, C.M.: Distributionally robust portfolio optimization. In: 2019 IEEE 58th Conference on Decision and Control (CDC), pp. 1526–1531. IEEE (2019)
Bardakci, I.E., Lagoa, C., Shanbhag, U.V.: Probability maximization with random linear inequalities: Alternative formulations and stochastic approximation schemes. In: 2018 Annual American Control Conference, ACC 2018, Milwaukee, WI, USA, June 27-29, 2018, pp. 1396–1401. IEEE (2018)
Bienstock, D., Chertkov, M., Harnett, S.: Chance-constrained optimal power flow: Risk-aware network control under uncertainty. SIAM Rev. 56(3), 461–495 (2014)
Bobkov, S.G.: Convex bodies and norms associated to convex measures. Probab. Theory Relat. Fields 147(1–2), 303–332 (2010)
Brascamp, H.J., Lieb, E.H.: On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. J. Functional Analysis 22(4), 366–389 (1976). https://doi.org/10.1016/0022-1236(76)90004-5
Burke, J.V., Chen, X., Sun, H.: The subdifferential of measurable composite max integrands and smoothing approximation. Math. Program. 181(2, Ser. B), 229–264 (2020). https://doi.org/10.1007/s10107-019-01441-9
Byrd, R.H., Chin, G.M., Nocedal, J., Wu, Y.: Sample size selection in optimization methods for machine learning. Math. Program. 134(1), 127–155 (2012)
Campi, M.C., Garatti, S.: A sampling-and-discarding approach to chance-constrained optimization: feasibility and optimality. J. Optim. Theory Appl. 148(2), 257–280 (2011)
Charnes, A., Cooper, W.W.: Chance-constrained programming. Management Sci. 6, 73–79 (1959/1960)
Charnes, A., Cooper, W.W., Symonds, G.H.: Cost horizons and certainty equivalents: An approach to stochastic programming of heating oil. Management Science 4(3), 235–263 (1958). https://EconPapers.repec.org/RePEc:inm:ormnsc:v:4:y:1958:i:3:p:235-263
Chen, L.: An approximation-based approach for chance-constrained vehicle routing and air traffic control problems. In: Large scale optimization in supply chains and smart manufacturing, Springer Optim. Appl., vol. 149, pp. 183–239. Springer, Cham (2019)
Chen, T., Sun, Y., Yin, W.: Solving stochastic compositional optimization is nearly as easy as solving stochastic optimization. IEEE Trans. Signal Process. 69, 4937–4948 (2021)
Chen, W., Sim, M., Sun, J., Teo, C.P.: From cvar to uncertainty set: Implications in joint chance-constrained optimization. Oper. Res. 58(2), 470–485 (2010)
Cheng, J., Chen, R.L.Y., Najm, H.N., Pinar, A., Safta, C., Watson, J.P.: Chance-constrained economic dispatch with renewable energy and storage. Comput. Optim. Appl. 70(2), 479–502 (2018). https://doi.org/10.1007/s10589-018-0006-2
Clarke, F.H.: Optimization and nonsmooth analysis, Classics in Applied Mathematics, vol. 5, second edn. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (1990). https://doi.org/10.1137/1.9781611971309
Cui, Y., Liu, J., Pang, J.S.: Nonconvex and nonsmooth approaches for affine chance constrained stochastic programs. Set-Valued Variat. Anal. 30, 1149–1211 (2022)
Curtis, F.E., Wächter, A., Zavala, V.M.: A sequential algorithm for solving nonlinear optimization problems with chance constraints. SIAM J. Optim. 28(1), 930–958 (2018)
Ermoliev, Y.: Methods of Stochastic Programming. Monographs in Optimization and OR, Nauka, Moscow (1976)
Fiacco, A.V., McCormick, G.P.: The sequential maximization technique $({{\rm SUMT}})$ without parameters. Operations Res. 15, 820–827 (1967). https://doi.org/10.1287/opre.15.5.820
Fiacco, A.V., McCormick, G.P.: Nonlinear programming: Sequential unconstrained minimization techniques. John Wiley and Sons Inc, New York-London-Sydney (1968)
Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156(1–2), 59–99 (2016)
Ghadimi, S., Ruszczynski, A., Wang, M.: A single timescale stochastic approximation method for nested stochastic optimization. SIAM J. Optim. 30(1), 960–979 (2020)
Gicquel, C., Cheng, J.: A joint chance-constrained programming approach for the single-item capacitated lot-sizing problem with stochastic demand. Ann. Oper. Res. 264(1–2), 123–155 (2018). https://doi.org/10.1007/s10479-017-2662-5
Göttlich, S., Kolb, O., Lux, K.: Chance-constrained optimal inflow control in hyperbolic supply systems with uncertain demand. Optimal Control Appl. Methods 42(2), 566–589 (2021). https://doi.org/10.1002/oca.2689
Guo, G., Zephyr, L., Morillo, J., Wang, Z., Anderson, C.L.: Chance constrained unit commitment approximation under stochastic wind energy. Comput. Oper. Res. 134, Paper No. 105398, 13 (2021). https://doi.org/10.1016/j.cor.2021.105398
Guo, S., Xu, H., Zhang, L.: Convergence analysis for mathematical programs with distributionally robust chance constraint. SIAM J. Optim. 27(2), 784–816 (2017). https://doi.org/10.1137/15M1036592
Gurobi Optimization, LLC.: Gurobi Optimizer Reference Manual (2022). https://www.gurobi.com
Henrion, R.: Optimierungsprobleme mit wahrscheinlichkeitsrestriktionen: Modelle, struktur, numerik. Lecture notes p. 43 (2010)
Hong, L.J., Yang, Y., Zhang, L.: Sequential convex approximations to joint chance constrained programs: A monte carlo approach. Oper. Res. 59(3), 617–630 (2011)
Jalilzadeh, A., Shanbhag, U.V., Blanchet, J.H., Glynn, P.W.: Optimal smoothed variable sample-size accelerated proximal methods for structured nonsmooth stochastic convex programs. arXiv preprint arXiv:1803.00718 (2018)
Lagoa, C.M., Li, X., Sznaier, M.: Probabilistically constrained linear programs and risk-adjusted controller design. SIAM J. Optim. 15(3), 938–951 (2005)
Lasserre, J.B.: Level sets and nongaussian integrals of positively homogeneous functions. IGTR 17(1), 1540001 (2015)
Lei, J., Shanbhag, U.V.: Asynchronous variance-reduced block schemes for composite non-convex stochastic optimization: block-specific steplengths and adapted batch-sizes. Optimization Methods and Software 0(0), 1–31 (2020)
Lian, X., Wang, M., Liu, J.: Finite-sum composition optimization via variance reduced gradient descent. In: Artificial Intelligence and Statistics, pp. 1159–1167. PMLR (2017)
Lieb, E., Loss, M.: Analysis. Crm Proceedings & Lecture Notes. American Mathematical Society (2001). https://books.google.com/books?id=Eb_7oRorXJgC
Luedtke, J., Ahmed, S.: A sample approximation approach for optimization with probabilistic constraints. SIAM J. Optim. 19(2), 674–699 (2008)
Markowitz, H.: Portfolio selection. J. Financ. 7(1), 77–91 (1952)
Miller, B.L., Wagner, H.M.: Chance constrained programming with joint constraints. Oper. Res. 13(6), 930–945 (1965)
Morozov, A., Shakirov, S.: Introduction to integral discriminants. J. High Energy Phys. 2009(12), 002 (2009)
Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)
Nemirovski, A., Shapiro, A.: Convex approximations of chance constrained programs. SIAM J. Optim. 17(4), 969–996 (2006)
Norkin, V.I.: The analysis and optimization of probability functions (1993)
Pagnoncelli, B.K., Ahmed, S., Shapiro, A.: Sample average approximation method for chance constrained programming: theory and applications. J. Optim. Theory Appl. 142(2), 399–416 (2009). https://doi.org/10.1007/s10957-009-9523-6
Pagnoncelli, B.K., Ahmed, S., Shapiro, A.: Sample average approximation method for chance constrained programming: theory and applications. J. Optim. Theory Appl. 142(2), 399–416 (2009)
Peña-Ordieres, A., Luedtke, J.R., Wächter, A.: Solving chance-constrained problems via a smooth sample-based nonlinear approximation. arXiv:1905.07377 (2019)
Pflug, G.C., Weisshaupt, H.: Probability gradient estimation by set-valued calculus and applications in network design. SIAM J. Optim. 15(3), 898–914 (2005). https://doi.org/10.1137/S1052623403431639
Polyak, B.T.: New stochastic approximation type procedures. Automat. i Telemekh 7(98–107), 2 (1990)
Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM J. Control. Optim. 30(4), 838–855 (1992)
Prékopa, A.: A class of stochastic programming decision problems. Math. Operationsforsch. Statist. 3(5), 349–354 (1972). https://doi.org/10.1080/02331937208842107
Prékopa, A.: On logarithmic concave measures and functions. Acta Scientiarum Mathematicarum 34, 335–343 (1973)
Prékopa, A.: Probabilistic programming. In: Stochastic programming, Handbooks Oper. Res. Management Sci., vol. 10, pp. 267–351. Elsevier Sci. B. V., Amsterdam, Netherlands (2003). https://doi.org/10.1016/S0927-0507(03)10005-9
Prékopa, A.: Stochastic programming, vol. 324. Springer Science & Business Media (2013)
Prékopa, A., Szántai, T.: Flood control reservoir system design using stochastic programming. In: Mathematical programming in use, pp. 138–151. Springer (1978)
Robbins, H., Monro, S.: A stochastic approximation method. The annals of mathematical statistics pp. 400–407 (1951)
Royset, J.O., Polak, E.: Extensions of stochastic optimization results to problems with system failure probability functions. J. Optim. Theory Appl. 133(1), 1–18 (2007). https://doi.org/10.1007/s10957-007-9178-0
Scholtes, S.: Introduction to piecewise differentiable equations. Springer Science & Business Media, New York (2012)
Shanbhag, U.V., Blanchet, J.H.: Budget-constrained stochastic approximation. In: Proceedings of the 2015 Winter Simulation Conference, Huntington Beach, CA, USA, December 6-9, 2015, pp. 368–379 (2015)
Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on stochastic programming: modeling and theory. SIAM (2009)
Sun, Y., Aw, G., Loxton, R., Teo, K.L.: Chance-constrained optimization for pension fund portfolios in the presence of default risk. European J. Oper. Res. 256(1), 205–214 (2017). https://doi.org/10.1016/j.ejor.2016.06.019
Uryasev, S.: Derivatives of probability functions and integrals over sets given by inequalities. pp. 197–223 (1994). https://doi.org/10.1016/0377-0427(94)90388-3. Stochastic programming: stability, numerical methods and applications (Gosen, 1992)
Uryasev, S.: Derivatives of probability functions and some applications. pp. 287–311 (1995). https://doi.org/10.1007/BF02031712. Stochastic programming (Udine, 1992)
Wang, M., Fang, E.X., Liu, H.: Stochastic compositional gradient descent: Algorithms for minimizing compositions of expected-value functions. Math. Program. 161(1–2), 419–449 (2017)
Wang, M., Liu, J., Fang, E.X.: Accelerating stochastic composition optimization. The Journal of Machine Learning Research 18(1), 3721–3743 (2017)
Xie, Y., Shanbhag, U.V.: SI-ADMM: A stochastic inexact ADMM framework for stochastic convex programs. IEEE Trans. Autom. Control 65(6), 2355–2370 (2020)
Yadollahi, E., Aghezzaf, E.H., Raa, B.: Managing inventory and service levels in a safety stock-based inventory routing system with stochastic retailer demands. Appl. Stoch. Models Bus. Ind. 33(4), 369–381 (2017). https://doi.org/10.1002/asmb.2241
Yang, S., Wang, M., Fang, E.X.: Multilevel stochastic gradient methods for nested composition optimization. SIAM J. Optim. 29(1), 616–659 (2019)

Acknowledgements

The authors would like to acknowledge support from NSF CMMI-1538605, EPCN-1808266, DOE ARPA-E award DE-AR0001076, NIH R01-HL142732, and the Gary and Sheila Bello chair funds. Preliminary efforts at studying Setting A were carried out in [14].

Author information

Authors and Affiliations

Bartin University, Bartin, Turkey
I. E. Bardakci
University of Arizona, Tucson, Arizona, United States
A. Jalilzadeh
Pennsylvania State University, University Park, United States
C. Lagoa & U. V. Shanbhag

Corresponding author

Correspondence to U. V. Shanbhag.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Theorem 2:

(a)
When considering uniform distributions over a compact and convex set $\mathcal {K}$, the density is constant in this set and zero outside the set. It can then be concluded that $\zeta $ has a log-concave density. Furthermore, $\zeta $ has a symmetric density about the origin since $\mathcal {K}$ is a symmetric set about the origin. Hence by Lemma 6.2 in [16], h is convex where $h({\mathbf {x}})\triangleq 1/f({\mathbf {x}})$. $\square $
(b)
Since (11) is a convex program, any solution ${\mathbf {x}}^*$ satisfies ${ h({\mathbf {x}}^*) \le h({\mathbf {x}}), \ \forall \mathbf {x} \in \mathcal {X}.}$ From the positivity of f over ${\mathcal {X}}$, $ \frac{1}{f({\mathbf {x}}^*)} \le \frac{1}{f({\mathbf {x}})}$ for every ${\mathbf {x}}\in \mathcal {X}$ implying that $f({\mathbf {x}}^*) \ge f({\mathbf {x}})$ for every ${\mathbf {x}}\in \mathcal {X}.$ Consequently, ${\mathbf {x}}^*$ is a global maximizer of (11). $\square $
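
As a numerical illustration of parts (a) and (b), the following sketch uses a hypothetical instance that is not taken from the paper: $\mathcal {K}$ is the unit disk in ${\mathbb {R}}^2$ and $c({\mathbf {x}},\zeta ) = 1-|\zeta ^T{\mathbf {x}}|$ (Setting A with $m = 1$). For this assumed instance, $f$ has a closed form depending only on $r = \Vert {\mathbf {x}}\Vert _2$, so the convexity of $h = 1/f$ along a line through the origin can be checked through second differences. The function names `f_disk` and `h_disk` are illustrative.

```python
import math

def f_disk(r):
    """P{|zeta^T x| <= 1} for zeta uniform on the unit disk and ||x||_2 = |r|.

    Assumed instance (not from the paper): the event is the strip
    |zeta_1| <= 1/|r| intersected with the disk, whose area is
    2 * (t * sqrt(1 - t^2) + asin(t)) with t = 1/|r|.
    """
    r = abs(r)
    if r <= 1.0:
        return 1.0  # the strip contains the whole disk
    t = 1.0 / r
    return (2.0 / math.pi) * (t * math.sqrt(1.0 - t * t) + math.asin(t))

def h_disk(r):
    """h = 1/f, the function shown to be convex in part (a)."""
    return 1.0 / f_disk(r)

# Second differences of h along the line x = r * e_1 for r in [-5, 5].
step = 0.01
grid = [-5.0 + step * i for i in range(1001)]
second_diffs = [
    h_disk(grid[i - 1]) - 2.0 * h_disk(grid[i]) + h_disk(grid[i + 1])
    for i in range(1, len(grid) - 1)
]
min_second_diff = min(second_diffs)  # nonnegative up to rounding if h is convex

# Part (b): the smallest value of h equals the reciprocal of the largest value of f.
best_h = min(h_disk(r) for r in grid)
best_f = max(f_disk(r) for r in grid)
```

On this grid the second differences stay nonnegative, which is consistent with part (a), and the minimum of $h$ coincides with the reciprocal of the maximum of $f$, as in part (b).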

Proof of Lemma 3:

We prove this result by showing the unimodality of f on ${\mathbb {R}}_+$, where $f(u) = u^c e^{-u}$. We have $f'(u) = cu^{c-1}e^{-u} - u^ce^{-u} = (c-u)u^{c-1}e^{-u}$, which vanishes at $u = c.$ Furthermore, $f'(u) > 0$ when $u < c$ and $f'(u) < 0$ when $u > c$. Finally, $f(0) = 0$. It follows that $u^* = c$ is a maximizer of $u^ce^{-u}$ on $[0,\infty )$ where $f(c) = \frac{c^c}{e^{c}}$. $\square $
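
The claim of Lemma 3 can be checked numerically. The sketch below is only an illustration: the helper names `g` and `grid_argmax`, the grid resolution, and the chosen exponents are assumptions made for the example.

```python
import math

def g(u, c):
    """g(u) = u^c * exp(-u) for u >= 0, the function called f in Lemma 3."""
    return (u ** c) * math.exp(-u)

def grid_argmax(c, upper=20.0, n=20001):
    """Grid search for a maximizer of g(., c) on [0, upper] with step upper/(n-1)."""
    grid = [upper * i / (n - 1) for i in range(n)]
    return max(grid, key=lambda u: g(u, c))

# For each exponent c, record the grid maximizer, the grid maximum,
# and the closed-form value c^c / e^c = (c/e)^c.
results = {}
for c in (0.5, 1.0, 2.0, 3.5):
    u_star = grid_argmax(c)
    results[c] = (u_star, g(u_star, c), (c / math.e) ** c)
```

For each tested $c$, the grid maximizer agrees with $u^* = c$ up to the grid spacing, and the grid maximum agrees with $c^c/e^c$.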

Proof of Proposition 2:

Recall the definition of $F({\mathbf {x}},\xi )$ from the statement of Lemma 4. We prove (a) by considering two cases. Case (i): $\xi \in \varXi _1({\mathbf {x}}) \cup \varXi _0({\mathbf {x}}).$ It follows that

$$\begin{aligned} \left| F({\mathbf {x}},\xi )\right| ^2 = \mathcal {C}^2_{{\mathcal {K}}} \left( (2\pi )^{{n}} e^{-2\mid \xi ^T{\mathbf {x}}\mid ^2+\Vert \xi \Vert _{{\mathcal {K}}}^2} \right)&\le \mathcal {C}^2_{{\mathcal {K}}} \left( (2\pi )^{{n}} e^{-2\mid \xi ^T{\mathbf {x}}\mid ^2+\mid \xi ^T{\mathbf {x}}\mid ^2} \right) \le \mathcal {C}^2_{{\mathcal {K}}} (2\pi )^n. \end{aligned}$$

Case (ii): $\xi \in \varXi _2({\mathbf {x}}).$ Proceeding similarly, we obtain that

$$\begin{aligned} \left| F({\mathbf {x}},\xi )\right| ^2&\le \mathcal {C}^2_{{\mathcal {K}}} \left( (2 \pi )^{{n}} e^{{-\Vert \xi \Vert _{{\mathcal {K}}}^2}} \right) \le \mathcal {C}^2_{{\mathcal {K}}} (2\pi )^n. \end{aligned}$$

Consequently, $\left| F({\mathbf {x}},\xi )\right| ^2 \le \mathcal {C}^2_{{\mathcal {K}}} (2\pi )^n$ for every $\xi \in {\mathbb {R}}^n$.

(b) We observe that $\partial F({\mathbf {x}},\xi )$ is defined as follows.

$$\begin{aligned} \partial F({\mathbf {x}},\xi ) = {\left\{ \begin{array}{ll} \left( \mathcal {C}_{{\mathcal {K}}}(2\pi )^{n/2} (-2\xi \xi ^T{\mathbf {x}}) e^{-|\xi ^{T} {\mathbf {x}}|^{2}+\frac{\Vert \xi \Vert _{{\mathcal {K}}}^2}{2}}\right) , &{} \xi \in \varXi _1({\mathbf {x}}) \triangleq \left\{ \xi \mid |\xi ^{T} {\mathbf {x}}|^{{2}} > \Vert \xi \Vert ^{2}_{\mathcal {K}}\right\} \\ \left( -\mathcal {C}_{{\mathcal {K}}} (2\pi )^{n/2} e^{-\max \{|\xi ^{T} {\mathbf {x}}|^{2}, {\Vert \xi \Vert ^{2}_{\mathcal {K}}}\}+\frac{\Vert \xi \Vert _{{\mathcal {K}}}^2}{2}}\right) \left[ 0,2\xi (\xi ^T{\mathbf {x}})\right] , &{} \xi \in \varXi _0({\mathbf {x}}) \triangleq \left\{ \xi \mid |\xi ^{T} {\mathbf {x}}|^{2} = \Vert \xi \Vert ^{2}_{\mathcal {K}}\right\} \\ \mathbf{0}. &{} \xi \in \varXi _2({\mathbf {x}}) \triangleq \left\{ \xi \mid |\xi ^{T} {\mathbf {x}}|^{2} < \Vert \xi \Vert ^{2}_{\mathcal {K}}\right\} \end{array}\right. } \end{aligned}$$

Consequently, it follows that ${\mathbb {E}}_{{\tilde{p}}} \left[ \, \Vert G({\mathbf {x}},\xi )\Vert ^2 \, \right] $ is bounded as follows.

$$\begin{aligned}&{\mathbb {E}}\left[ \, \Vert G({\mathbf {x}},\xi )\Vert ^2\, \right] = \int _{\varXi } \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&= \int _{\varXi _1({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi + \int _{\varXi _2({\mathbf {x}})} \Vert \underbrace{G({\mathbf {x}},\xi )}_{ \ = \ \mathbf{0}}\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&\quad + \int _{\varXi _0({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&= \int _{\varXi _1({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi , \end{aligned}$$

(33)

where the last equality follows from observing that $G({\mathbf {x}},\xi ) = \mathbf{0}$ for $\xi \in \varXi _2({\mathbf {x}})$ and that the integral over $\varXi _0({\mathbf {x}})$ in (33) is zero because $\varXi _0({\mathbf {x}})$ is a set of measure zero. It follows that ${\mathbb {E}}\left[ \, \Vert G({\mathbf {x}},\xi )\Vert ^2\, \right] $ can be bounded as follows:

$$\begin{aligned}&{\mathbb {E}}\left[ \, \Vert G({\mathbf {x}},\xi )\Vert ^2\right] \nonumber \\&\quad = \int _{\varXi _1({\mathbf {x}})}\left( \mathcal {C}^2_{{\mathcal {K}}}(2\pi )^{n} (4\Vert \xi \Vert _2^2 (\xi ^T{\mathbf {x}})^2) e^{-2(\xi ^T{\mathbf {x}})^2 +{\Vert \xi \Vert ^2_{{\mathcal {K}}}}} \right) \frac{1}{D_{{\mathcal {K}}}}(2\pi )^{-\frac{n}{2}} e^{-\frac{{\Vert \xi \Vert ^2_{{\mathcal {K}}}}}{2}} d \xi \end{aligned}$$

(34)

$$\begin{aligned}&\quad \le \int _{\varXi _1({\mathbf {x}})}\left( \mathcal {C}^2_{{\mathcal {K}}}(2\pi )^{n} (4\Vert \xi \Vert _2^2 (\xi ^T{\mathbf {x}})^2) e^{-(\xi ^T{\mathbf {x}})^2} \right) \frac{1}{D_{{\mathcal {K}}}}(2\pi )^{-\frac{n}{2}} e^{-\frac{{\Vert \xi \Vert ^2_{{\mathcal {K}}}}}{2}} d\xi , \end{aligned}$$

(35)

where the inequality follows from $\xi \in \varXi _1({\mathbf {x}})$, i.e., $\Vert \xi \Vert ^2_{{\mathcal {K}}} < (\xi ^T{\mathbf {x}})^2$. Next, we consider the expression $(\xi ^T{\mathbf {x}})^2 e^{-(\xi ^T{\mathbf {x}})^2}$, which has the form $ue^{-u}$ with $u = (\xi ^T{\mathbf {x}})^2$. By Lemma 3 with $c = 1$, $ue^{-u}$ is a unimodal function and $u^* = 1$ is a maximizer with value $e^{-1}$. Consequently, we have that

$$\begin{aligned} \max _{\{(\xi ^T{\mathbf {x}}) \mid \xi \in \varXi ({\mathbf {x}})\}} (\xi ^T{\mathbf {x}})^2 e^{-(\xi ^T{\mathbf {x}})^2} \le \max _{u \in {\mathbb {R}}_+} u e^{-u}\overset{\tiny \mathrm {Lemma}~3}{\le } \tfrac{1}{e}, \end{aligned}$$

implying that

$$\begin{aligned}&{\mathbb {E}}[\Vert G({\mathbf {x}},\xi )\Vert ^2] \le \int _{\varXi _1({\mathbf {x}})} \left( \mathcal {C}_{{\mathcal {K}}}^2(2\pi )^{n} (4\Vert \xi \Vert _2^2 (\xi ^T{\mathbf {x}})^2) e^{-(\xi ^T{\mathbf {x}})^2} \right) \tfrac{1}{D_{{\mathcal {K}}}}(2\pi )^{-\tfrac{n}{2}} e^{-\tfrac{{\Vert \xi \Vert ^2_{{\mathcal {K}}}}}{2}} d\xi \\&\quad \le 4e^{-1} \mathcal {C}^2_{{\mathcal {K}}}(2\pi )^{n}\int _{\varXi _1({\mathbf {x}})} \Vert \xi \Vert _2^2 \tfrac{1}{D_{{\mathcal {K}}}}(2\pi )^{-\tfrac{n}{2}} e^{-\tfrac{{\Vert \xi \Vert ^2_{{\mathcal {K}}}}}{2}} d\xi \\&\quad \le 4e^{-1} \mathcal {C}^2_{{\mathcal {K}}}(2\pi )^{n}\int _{{\mathbb {R}}^n} \Vert \xi \Vert _2^2 \tfrac{1}{D_{{\mathcal {K}}}}(2\pi )^{-\tfrac{n}{2}} e^{-\tfrac{{\Vert \xi \Vert ^2_{{\mathcal {K}}}}}{2}} d\xi = 4e^{-1} \mathcal {C}^2_{{\mathcal {K}}}(2\pi )^{n}{\mathbb {E}}_{{{\tilde{p}}}} \left[ \Vert \xi \Vert _2^2\right] . \end{aligned}$$

$\square $
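
The pointwise inequalities behind this proof can be checked on samples. The sketch below is a minimal illustration under stated assumptions: $\mathcal {K}$ is taken to be the Euclidean unit ball, so that $\Vert \xi \Vert _{{\mathcal {K}}} = \Vert \xi \Vert _2$; both $F$ and $G$ are divided by the constant $\mathcal {C}_{{\mathcal {K}}}(2\pi )^{n/2}$, whose value is not needed; and the zero element is selected from $\partial F$ on $\varXi _0({\mathbf {x}})$. The vector `x`, the sample size, and the helper name are hypothetical. It checks that the normalized $F$ never exceeds 1 and that the normalized $\Vert G\Vert ^2$ never exceeds $4e^{-1}\Vert \xi \Vert _2^2$, which is what results from applying Lemma 3 to the integrand of (35).

```python
import math
import random

def normalized_F_and_G(x, xi):
    """Return (F, G, u, b) with F and G divided by C_K * (2*pi)^(n/2).

    Assumption for illustration: K is the Euclidean unit ball, so the
    Minkowski functional ||xi||_K equals ||xi||_2.
    """
    inner = sum(a * c for a, c in zip(xi, x))   # xi^T x
    u = inner * inner                           # |xi^T x|^2
    b = sum(a * a for a in xi)                  # ||xi||_K^2
    F = math.exp(-max(u, b) + 0.5 * b)
    if u > b:    # xi in Xi_1(x): F is differentiable in x
        scale = -2.0 * inner * math.exp(-u + 0.5 * b)
        G = [scale * a for a in xi]
    else:        # xi in Xi_2(x), or the null set Xi_0(x) where zero is selected
        G = [0.0] * len(xi)
    return F, G, u, b

rng = random.Random(0)
x = [3.0, 0.0, 0.0]      # ||x||_2 > 1, so Xi_1(x) has positive measure
max_F = 0.0              # largest normalized F observed
worst_gap = -math.inf    # largest value of ||G||^2 - (4/e) * ||xi||_2^2
hits_xi1 = 0             # number of samples that fell in Xi_1(x)
for _ in range(20000):
    xi = [rng.gauss(0.0, 1.0) for _ in range(len(x))]
    F, G, u, b = normalized_F_and_G(x, xi)
    max_F = max(max_F, F)
    g2 = sum(a * a for a in G)
    worst_gap = max(worst_gap, g2 - (4.0 / math.e) * b)
    hits_xi1 += int(u > b)
```

Working with normalized quantities keeps the check independent of $\mathcal {C}_{{\mathcal {K}}}$ and $D_{{\mathcal {K}}}$, which are defined elsewhere in the paper.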

Proof of Proposition 3:

(a) Since $\Vert \xi \Vert _{{\mathcal {K}}}^2 = \Vert \xi \Vert _p^2$, it follows from Theorem 1 that

$$\begin{aligned} f({\mathbf {x}})&= \int _{{\mathbb {R}}^n} \left( \mathcal {C}(2\pi \sigma ^2)^{\tfrac{n}{2}} e^{-\max \{ \mid \xi ^T {\mathbf {x}}\mid ^2, {\Vert \xi \Vert _{p}^2}\} } \right) d\xi \\&= \int _{{\mathbb {R}}^n} \underbrace{\left( \mathcal {C}(2\pi \sigma ^2)^{\tfrac{n}{2}} e^{-\max \{ \mid \xi ^T {\mathbf {x}}\mid ^2, {\Vert \xi \Vert _{p}^2} \}+\tfrac{{\Vert \xi \Vert _{2}^2}}{2\sigma ^2} } \right) }_{\triangleq F({\mathbf {x}},\xi )} \underbrace{(2\pi \sigma ^2)^{-\tfrac{n}{2}} e^{-\tfrac{{\Vert \xi \Vert _{2}^2}}{2\sigma ^2}}}_{\triangleq {\tilde{p}}(\xi )} d\xi . \end{aligned}$$

(b) Omitted (similar to the proof of Proposition 2(a)).

(c) Next, we derive a bound on the second moment of $\Vert G({\mathbf {x}},\xi )\Vert $ akin to Prop. 2(b). We observe that $ \partial F({\mathbf {x}},\xi ) $ is defined as

$$\begin{aligned} \partial F({\mathbf {x}},\xi ) = {\left\{ \begin{array}{ll} \left( \mathcal {C}(2\pi \sigma ^2)^{n/2} (-2\xi \xi ^T{\mathbf {x}}) e^{-|\xi ^{T} {\mathbf {x}}|^{{2}}+\frac{{\Vert \xi \Vert _{2}^2}}{2\sigma ^2}}\right) , &{} \xi \in \varXi _1({\mathbf {x}}) \triangleq \left\{ \xi \mid |\xi ^{T} {\mathbf {x}}|^{{2}} > \Vert \xi \Vert ^{2}_{p} \right\} \\ \left( -\mathcal {C} (2\pi \sigma ^2)^{n/2} e^{-\max \{|\xi ^{T} {\mathbf {x}}|^{{2}}, {\Vert \xi \Vert ^{{2}}_{p}}\}+\frac{{\Vert \xi \Vert _{2}^2}}{2\sigma ^2}}\right) \left[ 0,2\xi (\xi ^T{\mathbf {x}})\right] , &{} \xi \in \varXi _0({\mathbf {x}}) \triangleq \left\{ \xi \mid |\xi ^{T} {\mathbf {x}}|^{{2}} = \Vert \xi \Vert ^{2}_{p}\right\} \\ \mathbf{0}. &{} \xi \in \varXi _2({\mathbf {x}}) \triangleq \left\{ \xi \mid |\xi ^{T} {\mathbf {x}}|^{{2}} < \Vert \xi \Vert ^{2}_{p}\right\} \end{array}\right. } \end{aligned}$$

Consequently, ${\mathbb {E}}\left[ \Vert G({\mathbf {x}},\xi )\Vert ^2\right] $ can be bounded as follows.

$$\begin{aligned}&{\mathbb {E}}\left[ \Vert G({\mathbf {x}},\xi )\Vert ^2\right] = \int _{\varXi } \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&= \int _{\varXi _1({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi + \int _{\varXi _2({\mathbf {x}})} \Vert \underbrace{G({\mathbf {x}},\xi )}_{ \ = \ \mathbf{0}}\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&\quad + \int _{\varXi _0({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&= \int _{\varXi _1({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi , \end{aligned}$$

(36)

where the last equality follows from observing that $G({\mathbf {x}},\xi ) = \mathbf{0}$ for $\xi \in \varXi _2({\mathbf {x}})$ and that the integral over $\varXi _0({\mathbf {x}})$ in (36) is zero because $\varXi _0({\mathbf {x}})$ is a set of measure zero. It follows that

$$\begin{aligned}&\ {\mathbb {E}}[\Vert G({\mathbf {x}},\xi )\Vert ^2] = \int _{\varXi _1({\mathbf {x}})}\left( \mathcal {C}^2(2\pi \sigma ^2)^{n} (4\Vert \xi \Vert _2^2 (\xi ^T{\mathbf {x}})^2) e^{-2(\xi ^T{\mathbf {x}})^2 +{\tfrac{\Vert \xi \Vert ^2_{2}}{\sigma ^2}}} \right) (2\pi \sigma ^2)^{-\tfrac{n}{2}} e^{\tfrac{{-\Vert \xi \Vert ^2_{2}}}{2\sigma ^2}} d\xi \nonumber \\&\le \int _{\varXi _1({\mathbf {x}})}\left( \mathcal {C}^2(2\pi \sigma ^2)^{n} (4\Vert \xi \Vert _2^2 (\xi ^T{\mathbf {x}})^2) e^{-2(\xi ^T{\mathbf {x}})^2 +{\Vert \xi \Vert ^2_{p}}} \right) (2\pi \sigma ^2)^{-\tfrac{n}{2}} e^{\tfrac{{-\Vert \xi \Vert ^2_{2}}}{2\sigma ^2}} d\xi \end{aligned}$$

(37)

$$\begin{aligned}&\le \int _{\varXi _1({\mathbf {x}})}\left( \mathcal {C}^2(2\pi \sigma ^2)^{n} (4\Vert \xi \Vert _2^2 (\xi ^T{\mathbf {x}})^2) e^{-(\xi ^T{\mathbf {x}})^2} \right) (2\pi \sigma ^2)^{-\tfrac{n}{2}} e^{\tfrac{{-\Vert \xi \Vert ^2_{2}}}{2\sigma ^2}} d\xi , \end{aligned}$$

(38)

where (38) follows from $\xi \in \varXi _1({\mathbf {x}})$ and (37) follows from

$$\begin{aligned} \frac{\Vert \xi \Vert _2^2}{\sigma ^2} \le \Vert \xi \Vert _p^2, \text{ where } \sigma ^2 = {\left\{ \begin{array}{ll} n^{1-2/p}, &{} p \ge 2 \\ 1. &{} 1 \le p < 2 \end{array}\right. } \end{aligned}$$

We may then conclude that

$$\begin{aligned}&\ {\mathbb {E}}[\Vert G({\mathbf {x}},\xi )\Vert ^2] \le \int _{\varXi _1({\mathbf {x}})} \left( \mathcal {C}^2(2\pi \sigma ^2)^{n} (4\Vert \xi \Vert _2^2 (\xi ^T{\mathbf {x}})^2) e^{-(\xi ^T{\mathbf {x}})^2} \right) (2\pi \sigma ^2)^{-\tfrac{n}{2}} e^{-\frac{{\Vert \xi \Vert ^2_{2}}}{2\sigma ^2}} d\xi \nonumber \\&\le 4e^{-1} \mathcal {C}^2(2\pi \sigma ^2)^{n}\int _{\varXi _1({\mathbf {x}})} \left( \Vert \xi \Vert _2^2 \right) (2\pi \sigma ^2)^{-\frac{n}{2}} e^{-\tfrac{{\Vert \xi \Vert ^2_{2}}}{2\sigma ^2}} d\xi \nonumber \\&\le 4e^{-1} \mathcal {C}^2(2\pi \sigma ^2)^{n}\int _{{\mathbb {R}}^n} \left( \Vert \xi \Vert _2^2\right) (2\pi \sigma ^2)^{-\frac{n}{2}} e^{-\tfrac{{\Vert \xi \Vert ^2_{2}}}{2\sigma ^2}} d\xi = 4e^{-1} \mathcal {C}^2(2\pi \sigma ^2)^{n}{\mathbb {E}}\left[ \Vert \xi \Vert _2^2\right] . \end{aligned}$$

(39)

where (39) follows from Lemma 3. $\square $
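The passage from the first to the second line of (39) can be read off from an elementary one-variable bound. The following short derivation is a sketch stated independently of Lemma 3 (whose statement is not reproduced in this appendix); the symbols $u$, $m$ and $\phi _m$ are introduced here only for illustration.

$$\begin{aligned} \phi _m(u)&\triangleq u^m e^{-u}, \quad u \ge 0, \ m > 0, \qquad \phi _m'(u) = (m-u)\, u^{m-1} e^{-u} = 0 \iff u = m, \\ \sup _{u \ge 0} u^m e^{-u}&= \left( \tfrac{m}{e}\right) ^m, \quad \text{and in particular} \quad \sup _{u \ge 0} u e^{-u} = e^{-1}. \end{aligned}$$

Taking $m = 1$ and $u = (\xi ^T{\mathbf {x}})^2$ gives $\Vert \xi \Vert _2^2 (\xi ^T{\mathbf {x}})^2 e^{-(\xi ^T{\mathbf {x}})^2} \le e^{-1}\Vert \xi \Vert _2^2$ pointwise, which is the integrand comparison used in (39). The case $m = 2$ gives the constant $4e^{-2}$.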

Proof of Lemma 6:

Suppose $({\mathbf {x}},{\mathbf {y}})$ is feasible with respect to (PM$_{A,\mathrm{ext}}^{2}$). Then ${\mathbf {x}}\in \mathcal {X}$, and ${\mathbf {x}}$ is therefore feasible with respect to (PM$_{A}^{{\mathcal {E}}}$). In addition,

$$\begin{aligned} f({\mathbf {x}})&\triangleq {\mathbb {P}}\left\{ \zeta \in {\mathbb {R}}^n \, \left| \, \zeta \in {\mathcal {K}}_{{\mathcal {E}}}, \left| \zeta ^{T} {\mathbf {x}}\right| \ \le 1 \right. \, \right\} ={\mathbb {P}} \left\{ \zeta \, \left| \, \zeta ^{T} U^{T} {\varSigma ^{-1}} U \zeta \le 1, \left| \zeta ^{T} {\mathbf {x}}\right| \le 1 \right. \right\} \\&= {\mathbb {P}}\left\{ \zeta \in {\mathbb {R}}^n \, \left| \, \Vert {\varSigma ^{-1/2}} U\zeta \Vert ^2_2 \le 1, \left| \zeta ^{T} {\mathbf {x}}\right| \ \le 1 \right. \right\} \\&= {\mathbb {P}} \left\{ U^{T} {\varSigma ^{1/2}} {\eta } \in {\mathbb {R}}^n\, \left| \, \Vert {\eta }\Vert ^2_2 \le 1, \left| (U^{T} {\varSigma ^{1/2} \eta })^{T} {\mathbf {x}}\right| \ \le 1 \right. \right\} \\&= {\mathbb {P}} \left\{ U^{T} {\varSigma ^{1/2} \eta } \in {\mathbb {R}}^n\, \left| \, \Vert {\eta }\Vert ^2_2 \le 1, \left| {\eta ^{T} \varSigma ^{1/2}} U {\mathbf {x}}\right| \ \le 1 \right. \right\} \\&= {\mathbb {P}} \left\{ {\eta } \in {\mathbb {R}}^n \, \left| \, {\eta } \in {\mathcal {K}}_{2}, \, \left| \, {\eta }^{T} \varSigma ^{1/2} U {\mathbf {x}}\right| \ \le 1 \right. \right\} \triangleq g({\mathbf {x}}). \end{aligned}$$

$\square $
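The change of variables $\zeta = U^{T}\varSigma ^{1/2}\eta $ used above can be checked numerically. The snippet below is a minimal sketch, assuming an orthogonal $U$ and a diagonal positive definite $\varSigma $; the specific matrices, the vector `x`, and the helper `sample_unit_ball` are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Hypothetical ellipsoid data: U orthogonal, Sigma diagonal positive definite.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma_diag = np.array([0.5, 2.0, 4.0])
Sigma_half = np.diag(np.sqrt(sigma_diag))
Sigma_inv = np.diag(1.0 / sigma_diag)
x = np.array([0.4, -0.3, 0.6])


def sample_unit_ball(num, dim, rng):
    """Draw points uniformly from the Euclidean unit ball."""
    g = rng.standard_normal((num, dim))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    radii = rng.uniform(size=(num, 1)) ** (1.0 / dim)
    return directions * radii


eta = sample_unit_ball(50_000, n, rng)
# Each row of zeta is (U^T Sigma^{1/2} eta)^T = eta^T Sigma^{1/2} U.
zeta = eta @ Sigma_half @ U

# Ellipsoid quadratic form zeta^T U^T Sigma^{-1} U zeta, row by row.
quad = np.sum((zeta @ U.T @ Sigma_inv @ U) * zeta, axis=1)
lhs = np.abs(zeta @ x)                    # |zeta^T x|
rhs = np.abs(eta @ (Sigma_half @ U @ x))  # |eta^T Sigma^{1/2} U x|

f_est = np.mean(lhs <= 1.0)  # Monte Carlo estimate of f(x)
g_est = np.mean(rhs <= 1.0)  # Monte Carlo estimate of g(x)
print(f_est, g_est)
```

Since the map $\eta \mapsto U^{T}\varSigma ^{1/2}\eta $ is linear and invertible, it carries the uniform distribution on the unit ball to the uniform distribution on the ellipsoid, so the two estimates are computed from the same events.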

Proof of Proposition 6:

(a) The result follows by a transformation argument. We define a new variable ${ {\tilde{\zeta }} \in {\tilde{{\mathcal {K}}}}}$ such that ${ {\tilde{\zeta }}\triangleq \zeta -\mu }$ where ${ {\tilde{{\mathcal {K}}}} \triangleq \{ {\tilde{\zeta }}: \Vert {\tilde{\zeta }} \Vert _p \le \alpha \}}$. The set ${\tilde{{\mathbf {K}}}}({\mathbf {x}})$ can be defined as follows.

$$\begin{aligned} {\tilde{{\mathbf {K}}}}({\mathbf {x}}) =\left\{ {\tilde{\zeta }} \, \left| \, {\tilde{\zeta }} \in {\tilde{{\mathcal {K}}}} \right. \right\} \, \bigcap \, \left\{ {\tilde{\zeta }} \, \left| \, {\tilde{\zeta }} \le T{\mathbf {x}}-\mu \right. \right\} . \end{aligned}$$

We first show that $\zeta \in {\mathbf {K}}({\mathbf {x}})$ if and only if ${\tilde{\zeta }} \in {\tilde{{\mathbf {K}}}}({\mathbf {x}})$. Suppose $\zeta \in {\mathbf {K}}({\mathbf {x}})$. Then $\zeta \in \mathcal {K}$ and $c({\mathbf {x}},\zeta ) = T{\mathbf {x}}-\zeta \ge 0$. If $\zeta \in \mathcal {K}$, then $\Vert \zeta -\mu \Vert _p \le \alpha $ or $\Vert {\tilde{\zeta }}\Vert _p \le \alpha $ where ${\tilde{\zeta }} = \zeta - \mu $. Furthermore, $T{\mathbf {x}}\ge \zeta $ can be rewritten as $T{\mathbf {x}}- \mu \ge \zeta - \mu $ or $T{\mathbf {x}}- \mu \ge {\tilde{\zeta }}$. It follows that

$$\begin{aligned} {\tilde{\zeta }} \in {\tilde{{\mathbf {K}}}}({\mathbf {x}}) \, = \, \left\{ {\tilde{\zeta }} \, \left| \, {\tilde{\zeta }} \in {\tilde{{\mathcal {K}}}} \right. \right\} \, \bigcap \, \left\{ {\tilde{\zeta }} \, \left| \, T{\mathbf {x}}- \mu \ge {\tilde{\zeta }}\right. \right\} . \end{aligned}$$

The reverse direction follows similarly. Consequently, ${\mathbb {P}}\left\{ \zeta \, \left| \, \zeta \in {\mathbf {K}}({\mathbf {x}}) \right. \right\} = {\mathbb {P}}\left\{ {\tilde{\zeta }} \, \left| \, {\tilde{\zeta }} \in {\tilde{{\mathbf {K}}}}({\mathbf {x}}) \right. \right\} .$ We now analyze the latter probability. It may be observed that the Minkowski functional associated with $ {\tilde{{\mathcal {K}}}}$ is given by $ \Vert {\tilde{\zeta }}\Vert _{{\tilde{{\mathcal {K}}}}} = \tfrac{1}{\alpha }\Vert {\tilde{\zeta }}\Vert _p$. Since $ {T_{i,\bullet } {\mathbf {x}}-\mu _i \ge \delta >0 }$ for $ i=1,\ldots ,d $, it follows that

$$\begin{aligned} {\tilde{{\mathbf {K}}}}({\mathbf {x}})&= \Bigg \{{\tilde{\zeta }} \, \left| \, \tfrac{1}{\alpha } \Vert {\tilde{\zeta }}\Vert _p \le 1 \right. \Bigg \} \, \bigcap \, \left\{ {\tilde{\zeta }} \, \left| \, \bigcap _{i=1}^d \tfrac{\max \{{\tilde{\zeta }}_i,0\}}{T_{i,\bullet } {\mathbf {x}}-\mu _i}\le 1 \right. \right\} \\&= \Bigg \{{\tilde{\zeta }} \, \left| \, \tfrac{1}{\alpha ^2}\Vert {\tilde{\zeta }} \Vert ^{2}_p \le 1 \right. \Bigg \} \, \bigcap \, \left\{ {\tilde{\zeta }} \, \left| \, \bigcap _{i=1}^d \left( \tfrac{\max \{{\tilde{\zeta }}_i,0\}}{T_{i,\bullet } {\mathbf {x}}-\mu _i}\right) ^2 \le 1 \right. \right\} \\&= \left\{ {\tilde{\zeta }} \, \left| \, \max \left\{ \tfrac{1}{\alpha ^2}\Vert {\tilde{\zeta }}\Vert _{p}^2,\left( \tfrac{\max \{{\tilde{\zeta }}_1,0\}}{T_{1,\bullet }{\mathbf {x}}-\mu _1}\right) ^2,\cdots , \left( \tfrac{\max \{{\tilde{\zeta }}_d,0\}}{T_{d,\bullet }{\mathbf {x}}-\mu _d}\right) ^2 \right\} \le 1 \right. \right\} . \end{aligned}$$

Since $ g_i({\mathbf {x}},{\tilde{\zeta }}) \ \triangleq \ \left( \frac{\max \{{\tilde{\zeta }}_i,0\}}{T_{i,\bullet } {\mathbf {x}}-\mu _i}\right) ^2 $ for $ i=1,\ldots , d $ and $ g_{d+1}({\mathbf {x}}, {\tilde{\zeta }}) \triangleq \tfrac{1}{\alpha ^2}\Vert {\tilde{\zeta }}\Vert ^{2}_{p} $ are PHFs of degree 2 in ${\tilde{\zeta }}$, their pointwise maximum $ g({{\mathbf {x}}}, {\tilde{\zeta }}) \triangleq \max \{ g_1({\mathbf {x}}, {\tilde{\zeta }}),\ldots ,g_{d+1}({\mathbf {x}},{\tilde{\zeta }}) \} $ is also positively homogeneous of degree 2. By selecting $h(\zeta ) = 1$ and ${\varLambda } = {\tilde{{\mathbf {K}}}}({\mathbf {x}})$, we may invoke Lemma 2, leading to the following equality.

$$\begin{aligned} f({\mathbf {x}}) \, = \, \frac{1}{\mathrm {Vol}({\mathcal {K}})} \int _{{\tilde{{\mathbf {K}}}}({\mathbf {x}})} 1 \ d {\tilde{\zeta }} = \frac{1}{\mathrm {Vol}({\mathcal {K}})} \frac{1}{\varGamma (1+d/2)} \int _{{\mathbb {R}}^d} e^{-g({\mathbf {x}},\xi )} \ d\xi . \end{aligned}$$

(40)

Equation (40) can be rewritten as

$$\begin{aligned} f({\mathbf {x}})&= \int _{{\mathbb {R}}^d} \underbrace{\left( \mathcal {C} (2\pi \sigma ^2)^{d/2} e^{-g({\mathbf {x}},\xi ) +\frac{\Vert \xi \Vert ^2_{2}}{2\sigma ^2}}\right) }_{\triangleq F({\mathbf {x}},\xi )}\underbrace{\left( \tfrac{1}{(2\pi \sigma ^2)^{d/2}} {e^{-\tfrac{\Vert \xi \Vert ^2_{2}}{2\sigma ^2}}} \right) }_{\triangleq {{\tilde{p}}}(\xi )} \ d\xi \\&= \int _{{\mathbb {R}}^{d}} F({\mathbf {x}},\xi ) \ {{\tilde{p}}}(\xi ) \ d\xi = {\mathbb {E}}_{{{\tilde{p}}}(\xi )}[F({\mathbf {x}},\xi )], \text{ where } \mathcal {C} \triangleq \tfrac{1}{\text {Vol}({\mathcal {K}})} \ {\tfrac{1}{\varGamma (1+d/2)}}. \end{aligned}$$

(b) Omitted (similar to proof of Lemma 8 (a)).

(c) When $ {\mathcal {K}}$ satisfies Assumption 2, the proof of Lemma 8(b) requires a slight modification. Suppose $ F({\mathbf {x}},\xi ) $ and $ {\tilde{p}}(\xi ) $ are defined as in (a). Then we may define $\partial F({\mathbf {x}},\xi )$ as

$$\begin{aligned} \partial F({\mathbf {x}},\xi ) = {\left\{ \begin{array}{ll} \left( {\mathcal {C}}(2\pi \sigma ^2)^{{d}/2} \frac{2(\max \{\xi _i,0\})^2 T_{i,\bullet }^T}{(T_{i,\bullet } {\mathbf {x}}-\mu _i)^3} e^{-g_i({\mathbf {x}},\xi )+\frac{{\Vert \xi \Vert _{2}^2}}{2\sigma ^2}}\right) , &{} \xi \in \varXi _i({\mathbf {x}}), i = 1, \cdots , d \\ \left( -{\mathcal {C}}(2\pi \sigma ^2)^{{d}/2} e^{-g({\mathbf {x}},\xi ) +\frac{{\Vert \xi \Vert _{2}^2}}{2\sigma ^2}}\right) H({\mathbf {x}},\xi ), &{} \xi \in \varXi _0({\mathbf {x}})\\ \mathbf{0}. &{} \xi \in \varXi _{d+1}({\mathbf {x}}), \end{array}\right. } \end{aligned}$$

where $H({\mathbf {x}},\xi )$ denotes the Clarke generalized gradient of $g({\mathbf {x}},\xi )$, defined as in (17). Consequently, it follows that ${\mathbb {E}} \left[ \Vert G({\mathbf {x}},\xi )\Vert ^2\right] $ is bounded as follows.

$$\begin{aligned}&\quad {\mathbb {E}}\left[ \Vert G({\mathbf {x}},\xi )\Vert ^2\right] = \int _{{\mathbb {R}}^{d}} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&= \sum _{i=1}^{d} \int _{\varXi _i({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi + \int _{\varXi _{d+1}({\mathbf {x}})} \Vert \underbrace{G({\mathbf {x}},\xi )}_{ \ = \ \mathbf{0}}\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&\quad + \int _{\varXi _0({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi \nonumber \\&= \sum _{i=1}^{d}\int _{\varXi _i({\mathbf {x}})} \Vert G({\mathbf {x}},\xi )\Vert ^2 {\tilde{p}}(\xi ) d\xi , \end{aligned}$$

(41)

where the last equality follows from observing that $G({\mathbf {x}},\xi ) = 0$ for $\xi \in \varXi _{d+1}({\mathbf {x}})$ and that the integral over $\varXi _0({\mathbf {x}})$ in (41) is zero because $\varXi _0({\mathbf {x}})$ is a set of measure zero. It follows that

$$\begin{aligned} {\mathbb {E}}&\left[ \Vert G({\mathbf {x}},\xi )\Vert ^2 \right] \\&=\sum _{i=1}^{d}\int _{\varXi _i({\mathbf {x}})} 4\mathcal {C}^2(2\pi \sigma ^2)^d \frac{\Vert T_{i,\bullet }\Vert ^2}{(T_{i,\bullet } {\mathbf {x}}-\mu _i)^2} \left( \frac{\xi _i}{{T_{i,\bullet } {\mathbf {x}}-\mu _i}}\right) ^4 \\&\qquad e^{-\frac{2(\xi _i)^2}{(T_{i,\bullet } {\mathbf {x}}-\mu _i)^2}+\frac{\Vert \xi \Vert _{2}^2}{\sigma ^2}}\left( \tfrac{1}{(2\pi \sigma ^2)^{d/2}} e^{-\tfrac{\Vert \xi \Vert _{2}^2}{2\sigma ^2}}\right) d\xi \\&\le \sum _{i=1}^{d}\int _{\varXi _i({\mathbf {x}})} 4\mathcal {C}^2(2\pi \sigma ^2)^d \frac{\Vert T_{i,\bullet }\Vert ^2}{\delta ^2}\left( \frac{\xi _i}{{T_{i,\bullet } {\mathbf {x}}-\mu _i}} \right) ^4 e^{-\frac{2(\xi _i)^2}{(T_{i,\bullet } {\mathbf {x}}-\mu _i)^2}+\frac{\Vert \xi \Vert _{2}^2}{\sigma ^2}}\left( \tfrac{1}{(2\pi \sigma ^2)^{d/2}} e^{-\tfrac{\Vert \xi \Vert _{2}^2}{2 \sigma ^2}}\right) d\xi \\&\le \sum _{i=1}^{d}\int _{\varXi _i({\mathbf {x}})} 4\mathcal {C}^2(2\pi \sigma ^2)^d \frac{\Vert T_{i,\bullet }\Vert ^2}{\delta ^2}\left( \frac{\xi _i}{{T_{i,\bullet } {\mathbf {x}}-\mu _i}}\right) ^4\\&\qquad e^{-\left( 2-\tfrac{\alpha ^2}{\sigma ^2}\right) \frac{(\xi _i)^2}{(T_{i,\bullet } {\mathbf {x}}-\mu _i)^2}}\left( \tfrac{1}{(2\pi \sigma ^2)^{d/2}} e^{-\tfrac{\Vert \xi \Vert _{2}^2}{2\sigma ^2}}\right) d\xi , \end{aligned}$$

where the first inequality follows from $ {T_{i,\bullet } {\mathbf {x}}-\mu _i \ge \delta >0 }$ for all $i$, and the second inequality follows from $ \xi \in \varXi _i({\mathbf {x}})$. It follows from Lemma 3 that, given any $ \alpha $, choosing the variance $ \sigma ^2$ of the random variable $ \xi $ such that $\sigma ^2 = \alpha ^2 $ leads to the bound $ {{\mathbb {E}} \left[ \Vert G({\mathbf {x}},\xi )\Vert ^2 \right] \le 16\mathcal {C}^2(2\pi \sigma ^2)^d {\sum _{i=1}^d} \frac{\Vert T_{i,\bullet }\Vert ^2}{\delta ^2 e^2}}. $ $\square $
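The Gaussian representation in part (a) lends itself to a quick numerical sanity check. The snippet below is a minimal sketch under stated assumptions: $d = 2$, $p = 2$, $\alpha = 1$, $\mu = 0$, $\sigma ^2 = \alpha ^2$, and a hypothetical vector `b` standing in for $T{\mathbf {x}}-\mu $. It compares a Monte Carlo estimate of ${\mathbb {E}}_{{\tilde{p}}}[F({\mathbf {x}},\xi )]$ with a direct estimate of the probability obtained by uniform sampling of the ball.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
d, alpha = 2, 1.0
sigma2 = alpha ** 2           # variance choice sigma^2 = alpha^2
b = np.array([0.5, 0.8])      # hypothetical stand-in for T x - mu (componentwise > 0)

vol_K = math.pi * alpha ** 2  # volume of the 2-norm ball of radius alpha in R^2
C = 1.0 / (vol_K * math.gamma(1.0 + d / 2.0))


def g(xi):
    """Maximum of the degree-2 positively homogeneous pieces g_1, ..., g_{d+1}."""
    ball = np.sum(xi ** 2, axis=1) / alpha ** 2
    halfspaces = (np.maximum(xi, 0.0) / b) ** 2
    return np.maximum(ball, halfspaces.max(axis=1))


def F(xi):
    """F(x, xi) = C (2 pi sigma^2)^{d/2} exp(-g(x, xi) + ||xi||^2 / (2 sigma^2))."""
    sq = np.sum(xi ** 2, axis=1)
    scale = C * (2.0 * math.pi * sigma2) ** (d / 2.0)
    return scale * np.exp(-g(xi) + sq / (2.0 * sigma2))


N = 200_000
xi = rng.normal(scale=math.sqrt(sigma2), size=(N, d))
f_gauss = F(xi).mean()        # estimate of E_{p~}[F(x, xi)]

# Direct estimate: uniform points in the ball (rejection from the enclosing
# square), then the fraction that also satisfies zeta <= b componentwise.
pts = rng.uniform(-alpha, alpha, size=(N, d))
pts = pts[np.sum(pts ** 2, axis=1) <= alpha ** 2]
f_direct = np.mean(np.all(pts <= b, axis=1))
print(f_gauss, f_direct)
```

With $\sigma ^2 = \alpha ^2$ the exponent $-g + \Vert \xi \Vert _2^2/(2\sigma ^2)$ is nonpositive in this example, so $F$ is bounded and the sample average has finite variance.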

Proof of Lemma 10:

If ${{\tilde{G}}}({\mathbf {x}}_k,\xi ) \triangleq G({\mathbf {x}}_k,\xi ) - {\mathbb {E}}[G({\mathbf {x}}_k,\xi )]$, by the conditional independence of ${\tilde{G}}({\mathbf {x}}_k,\xi _j)$ and ${\tilde{G}}({\mathbf {x}}_k,\xi _{\ell })$ for $j \ne \ell $, we have

$$\begin{aligned}&{\mathbb {E}}\left[ \Vert {\bar{w}}_{G,k}\Vert ^2 \mid \mathcal {F}_k \right] = \frac{1}{N^2_k}{\mathbb {E}}\left[ \left\| \sum _{j=1}^{N_k} {\tilde{G}} ({\mathbf {x}}_k,\xi _j)\right\| ^2 \Bigg | \, \mathcal {F}_k \right] \nonumber \\&\quad = \frac{1}{N_k^2} {\mathbb {E}}\left[ \left[ \sum _{j=1}^{N_k} \Vert {\tilde{G}}({\mathbf {x}}_k,\xi _j)\Vert ^2 + \sum _{\ell \ne j} 2{\tilde{G}}({\mathbf {x}}_k,\xi _{\ell })^T {\tilde{G}}({\mathbf {x}}_k,\xi _j) \right] \, \Bigg | \, \mathcal {F}_k \right] \nonumber \\&\quad = \frac{1}{N_k}\left( {\mathbb {E}}\left[ \Vert {G}({\mathbf {x}}_k,\xi )\Vert ^2\, \left| \, \mathcal {F}_k \right. \right] \right. \nonumber \\&\qquad \left. + \Vert {\mathbb {E}}\left[ G({\mathbf {x}}_k,\xi ) \, \left| \, \mathcal {F}_k \right. \right] \Vert ^2 - 2{\mathbb {E}}\left[ G({\mathbf {x}}_k,\xi )\, \left| \, \mathcal {F}_k \right. \right] ^T{\mathbb {E}}\left[ G({\mathbf {x}}_k,\xi ) \, \left| \, \mathcal {F}_k \right. \right] \right) \nonumber \\&\quad = \frac{1}{N_k}\left( {\mathbb {E}}\left[ \Vert {G}({\mathbf {x}}_k,\xi ) \Vert ^2\, \left| \, \mathcal {F}_k \right. \right] - \Vert {\mathbb {E}} \left[ G({\mathbf {x}}_k,\xi ) \, \left| \, \mathcal {F}_k \right. \right] \Vert ^2 \right) \nonumber \\&\quad \le \frac{1}{N_k} {\mathbb {E}} \left[ \Vert {G}({\mathbf {x}}_k,\xi )\Vert ^2 \mid \mathcal {F}_k\right] . \end{aligned}$$

(42)

By (42) and Prop. 2, ${\mathbb {E}}\left[ \Vert {\bar{w}}_{G,k}\Vert ^2 \, \left| \, \mathcal {F}_k \right. \right] \ \le \ \frac{\mathcal {C}^2_{{\mathcal {K}}}(2\pi )^n}{eN_k} {\mathbb {E}}_{{{\tilde{p}}}}\left[ \Vert \xi \Vert ^2\right] $ for Setting A. Similarly, for Setting B, by Lemma 8,

$$\begin{aligned} {{\mathbb {E}}[\Vert {\bar{w}}_{G,k}\Vert ^2 \mid \mathcal {F}_k ] \le 16\mathcal {C}^2(2\pi \sigma ^2)^d {\sum _{i=1}^d} \frac{\Vert T_{i,\bullet }\Vert ^2}{\delta ^2 e^2 N_k} }. \end{aligned}$$

In addition, for Setting A, $ {\mathbb {E}}\left[ \Vert {\bar{w}}_{f,k}\Vert ^2 \, \left| \, \mathcal {F}_k \right. \right] \le \frac{2(\mathcal {C}_{{\mathcal {K}}}^2(2\pi )^n+1)}{N_k}$ while for Setting B, we obtain that ${\mathbb {E}}\left[ \, \Vert {\bar{w}}_{f,k}\Vert ^2 \, \left| \, \mathcal {F}_k \right. \right] \le \frac{\mathcal {C}^2(2\pi \sigma ^2)^d}{N_k}.$ $\square $
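The $1/N_k$ scaling in (42) can be illustrated on a toy model. The snippet below is a sketch that replaces $G({\mathbf {x}}_k,\xi )$ by a Gaussian vector with identity covariance and a hypothetical mean `mean_G`; it is not the paper's estimator, only an instance of the same averaging identity.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, N_k, reps = 3, 10, 20_000
mean_G = np.array([1.0, -2.0, 0.5])  # hypothetical E[G(x_k, xi)]

# reps independent mini-batches, each holding N_k i.i.d. samples of G.
G = mean_G + rng.standard_normal((reps, N_k, dim))
w_bar = G.mean(axis=1) - mean_G      # averaged noise, one row per mini-batch

# Empirical estimate of E||w_bar||^2 across mini-batches.
lhs = np.mean(np.sum(w_bar ** 2, axis=1))

# For this Gaussian model, E||G||^2 = dim + ||E[G]||^2.
second_moment = dim + np.sum(mean_G ** 2)
exact = (second_moment - np.sum(mean_G ** 2)) / N_k  # (E||G||^2 - ||E G||^2) / N_k
upper_bound = second_moment / N_k                    # final bound in (42)
print(lhs, exact, upper_bound)
```

The empirical value tracks the exact variance term and sits below the cruder bound ${\mathbb {E}}[\Vert G\Vert ^2]/N_k$, which is the one carried forward in the analysis.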

Proof of Lemma 11:

(Setting A) Consider ${\bar{w}}_k$, where ${\bar{w}}_k$ is defined as $ {\bar{w}}_k \triangleq \frac{-(G_k+{\bar{w}}_{G,k})}{(f({\mathbf {x}}_k) + {\bar{w}}_{f,k})^{2}+\epsilon _k}- \frac{-G_k}{(f({\mathbf {x}}_k))^{2}}.$ We have that

$$\begin{aligned} \Vert {\bar{w}}_k\Vert ^2&= \left\| \frac{-(G_k+{\bar{w}}_{G,k})}{(f({\mathbf {x}}_k) + {\bar{w}}_{f,k})^{2}+\epsilon _k}- \frac{-G_k}{(f({\mathbf {x}}_k))^{2}} \right\| ^2 \\&= \left\| \frac{-(G_k+{\bar{w}}_{G,k})}{(f({\mathbf {x}}_k) + {\bar{w}}_{f,k})^{2} +\epsilon _k}-\frac{{-}(G_k+{\bar{w}}_{G,k})}{(f({\mathbf {x}}_k))^{2}+\epsilon _k} +\frac{{-}(G_k+{\bar{w}}_{G,k})}{(f({\mathbf {x}}_k))^{2}+\epsilon _k}\right. \\&\quad \left. - \frac{{-}G_k}{(f({\mathbf {x}}_k))^{2}+\epsilon _k} + \frac{{-}G_k}{(f({\mathbf {x}}_k))^{2}+\epsilon _k}- \frac{{-}G_k}{(f({\mathbf {x}}_k))^{2}} \right\| ^2 \\&\le 3\left\| (G_k+{\bar{w}}_{G,k}) - G_k\right\| ^2 \frac{1}{((f({\mathbf {x}}_k))^{2} +\epsilon _k)^2}\\&\quad +3\left\| G_k+{\bar{w}}_{G,k}\right\| ^2 \left\| \frac{1}{(f({\mathbf {x}}_k))^{2}+\epsilon _k} -\frac{1}{(f({\mathbf {x}}_k) + {\bar{w}}_{f,k})^{2}+\epsilon _k}\right\| ^2 \\&\quad + 3\left\| G_k\right\| ^2 \left\| \frac{1}{(f({\mathbf {x}}_k))^{2}} -\frac{1}{(f({\mathbf {x}}_k))^{2} +\epsilon _k}\right\| ^2 \\&\le 3\left\| {\bar{w}}_{G,k}\right\| ^2 \frac{1}{((f({\mathbf {x}}_k))^{2}+\epsilon _k)^2}\\&\quad +3\left\| G_k+{\bar{w}}_{G,k}\right\| ^2 \left\| \frac{(2f({\mathbf {x}}_k)+{\bar{w}}_{f,k}) {\bar{w}}_{f,k}}{((f({\mathbf {x}}_k))^{2}+\epsilon _k)((f({\mathbf {x}}_k) + {\bar{w}}_{f,k})^{2} +\epsilon _k)}\right\| ^2 \\&\quad + 3\left\| G_k\right\| ^2 \left\| \frac{\epsilon _k}{(f({\mathbf {x}}_k))^{2}((f({\mathbf {x}}_k))^{2} +\epsilon _k)}\right\| ^2 \\&\le 3\left\| {\bar{w}}_{G,k}\right\| ^2 \frac{1}{\epsilon _f^{{4}}}+{3}\left\| G_k+{\bar{w}}_{G,k}\right\| ^2 \left\| \frac{(2f({\mathbf {x}}_k)+{\bar{w}}_{f,k})}{\epsilon _f^{2} \epsilon _k}\right\| ^2\Vert {\bar{w}}_{f,k}\Vert ^2 + 3\left\| G_k\right\| ^2 \left( \frac{\epsilon _k^2}{\epsilon _f^{{8}}}\right) \\&\le 3\left\| {\bar{w}}_{G,k}\right\| ^2 \frac{1}{\epsilon _f^4}+{3} \left\| G_k+{\bar{w}}_{G,k}\right\| ^2 \frac{(8f^2({\mathbf {x}}_k)\Vert {\bar{w}}_{f,k}\Vert ^2+2\Vert {\bar{w}}_{f,k}\Vert ^4)}{\epsilon _f^{4} \epsilon ^2_k} + 3\left\| G_k\right\| ^2 \left( \frac{\epsilon _k^2}{\epsilon _f^{8}}\right) , \end{aligned}$$

where $f({\mathbf {x}}_k) \ge \epsilon _f$ for every ${\mathbf {x}}_k \in \mathcal {X}.$ Taking conditional expectations and recalling the independence of ${\bar{w}}_{f,k}$ and ${\bar{w}}_{G,k}$ conditional on $\mathcal {F}_k$, the following bound emerges.

$$\begin{aligned} {\mathbb {E}}&\left[ \Vert {\bar{w}}_k\Vert ^2 \, \bigg | \, \mathcal {F}_k \, \right] \le 3{\mathbb {E}}\left[ \left\| {\bar{w}}_{G,k}\right\| ^2 \, \bigg | \, \mathcal {F}_k\, \right] \frac{1}{\epsilon _f^2} \\&+{3}{\mathbb {E}}\left[ \left\| G_k+{\bar{w}}_{G,k}\right\| ^2 \frac{(8f^2({\mathbf {x}}_k)\Vert {\bar{w}}_{f,k}\Vert ^2+2\Vert {\bar{w}}_{f,k}\Vert ^4)}{\epsilon _f^{{4}} \epsilon _k^2}\, \bigg | \, \mathcal {F}_k \right] + 3{\mathbb {E}}\left[ \left\| G_k\right\| ^2 \left( \frac{\epsilon _k^2}{\epsilon _f^{{8}}}\right) \, \bigg | \, \mathcal {F}_k \right] \\&\le 3\frac{\nu _G^2}{\epsilon _f^2N_k} +{3}{\mathbb {E}}\left[ \left\| G_k+{\bar{w}}_{G,k}\right\| ^2 \, \bigg | \, \mathcal {F}_k \right] {\mathbb {E}}\left[ \frac{(8f^2({\mathbf {x}}_k)\Vert {\bar{w}}_{f,k}\Vert ^2+2\Vert {\bar{w}}_{f,k}\Vert ^4)}{\epsilon _f^{{4}} \epsilon _k^2}\, \bigg | \, \mathcal {F}_k \right] \\&\quad + \left( \frac{3\epsilon _k^2M_G^2}{\epsilon _f^{{8}}}\right) \\&\le 3\frac{\nu _G^2}{\epsilon _f^2N_k}+{3} M_G^2 \frac{8f^2({\mathbf {x}}_k)\nu _f^2}{\epsilon _f^{4} \epsilon _k^2N_k}+{3}M_G^2{\mathbb {E}}\left[ \frac{\Vert {\bar{w}}_{f,k}\Vert ^4}{\epsilon _f^{4} \epsilon _k^2}\, \bigg | \, \mathcal {F}_k \right] + 3\left( \frac{\epsilon _k^2M_G^2}{\epsilon _f^{{8}}}\right) , \end{aligned}$$

where $\Vert G_k\Vert ^2 = \Vert {\mathbb {E}}\left[ G({\mathbf {x}}_k,\xi ) \, \left| \, \mathcal {F}_k \right. \right] \Vert ^2 \le {\mathbb {E}}\left[ \Vert G({\mathbf {x}}_k,\xi )\Vert ^2 \, \left| \, \right. \mathcal {F}_k\right] \le M_G^2$ by Jensen’s inequality. From Prop. 2(b,c), $| F({\mathbf {x}},\xi )| \le {M_F}$ for any ${\mathbf {x}}, \xi $, implying that

$$\begin{aligned} \Vert {\bar{w}}_{f,k}\Vert ^2 = \left\| \frac{\sum _{j=1}^{N_k} F({\mathbf {x}}_k,\xi _j)}{N_k} - f({\mathbf {x}}_k)\right\| ^2&\le 2 \left\| \frac{\sum _{j=1}^{N_k} F({\mathbf {x}}_k,\xi _j)}{N_k}\right\| ^2 + 2f^2({\mathbf {x}}_k) \\&\le 2({M_F}^2+1). \end{aligned}$$

Consequently, by recalling that $\epsilon _k = 1/N_k^{1/4}$, the following holds a.s.

$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\bar{w}}_k\Vert ^2 \, \bigg | \, \mathcal {F}_k \right]&\le \frac{3\nu _G^2}{\epsilon _f^2N_k}+{24}M_G^2 \frac{f^2({\mathbf {x}}_k)\nu _f^2}{\epsilon _f^{{4}} \epsilon _k^2N_k} +{3}{\mathbb {E}}\left[ \frac{\Vert {\bar{w}}_{f,k}\Vert ^4}{\epsilon _f^4 \epsilon _k^2}\, \bigg | \, {\mathbf {x}}_k \right] + \left( \frac{\epsilon _k^2M_G^2}{\epsilon _f^8}\right) \\&\le \frac{\nu _G^2}{\epsilon _f^2N_k}+{24}M_G^2 \frac{f^2({\mathbf {x}}_k)\nu _f^2}{\epsilon _f^{{4}} \epsilon _k^2N_k}+\frac{{6}({M_F}^2+1)M_G^2\nu _f^2}{\epsilon _f^4 \epsilon _k^2 N_k}\\&\le \frac{3\nu _G^2}{\epsilon _f^2\sqrt{N_k}}+M_G^2 \frac{{24}f^2({\mathbf {x}}_k)\nu _f^2}{\epsilon _f^{{4}} \sqrt{N_k}}+\frac{{6}(M_F^2+1)\nu _f^2}{\epsilon _f^{{4}} \sqrt{N_k}} + \left( \frac{3M_G^2}{\epsilon _f^{{8}}\sqrt{N_k}}\right) \\&\triangleq \frac{\nu ^2}{\sqrt{N_k}}, \text{ where } \nu ^2 \triangleq \frac{3\nu _G^2}{\epsilon _f^2}+M_G^2 \frac{{24}\nu _f^2}{\epsilon _f^{{4}} }+\frac{{6}({M_F}^2+1)\nu _f^2}{\epsilon _f^{{4}}} + \left( \frac{3M_G^2}{\epsilon _f^{{8}}}\right) . \end{aligned}$$

(Setting B) In this setting, $ {\bar{w}}_k \triangleq {\frac{{-}(G_{k}+{\bar{w}}_{G,k})}{(f({\mathbf {x}}_k) + {\bar{w}}_{f,k})+\epsilon _k}+ \frac{G_{k}}{f({\mathbf {x}}_k)}}$, so that

$$\begin{aligned} \Vert {\bar{w}}_{k}\Vert ^2&= \left\| \frac{{-}(G_{k}+{\bar{w}}_{G,{k}})}{(f({\mathbf {x}}_k) + {\bar{w}}_{f,{k}})+\epsilon _k}- \frac{{-}G_k}{(f({\mathbf {x}}_k))} \right\| ^2 \\&= \left\| \frac{{-}(G_{k}+{\bar{w}}_{G,{k}})}{(f({\mathbf {x}}_k) + {\bar{w}}_{f,k}) +\epsilon _k}-\frac{{-}(G_{k}+{\bar{w}}_{G,{k}})}{f({\mathbf {x}}_k)+\epsilon _k} +\frac{{-}(G_{k}+{\bar{w}}_{G,k})}{f({\mathbf {x}}_k)+\epsilon _k}\right. \\&\quad \left. - \frac{{-}G_{k}}{f({\mathbf {x}}_k)+\epsilon _k} + \frac{{-}G_{k}}{f({\mathbf {x}}_k)+\epsilon _k}- \frac{{-}G_{k}}{f({\mathbf {x}}_k)} \right\| ^2 \\&\le 3\left\| (G_{k}+{\bar{w}}_{G,{k}}) - G_{k}\right\| ^2 \frac{1}{{(f({\mathbf {x}}_k)+\epsilon _k)^2}} \\&\quad +3\left\| G_{k}+{\bar{w}}_{G,{k}}\right\| ^2 \left\| \frac{1}{f({\mathbf {x}}_k)+\epsilon _k} -\frac{1}{(f({\mathbf {x}}_k) + {\bar{w}}_{f,{k}})+\epsilon _k}\right\| ^2 \\&\quad + 3\left\| G_{k}\right\| ^2 \left\| \frac{1}{f({\mathbf {x}}_k)} -\frac{1}{f({\mathbf {x}}_k)+\epsilon _k} \right\| ^2 \\&\le 3\left\| {\bar{w}}_{G,{k}}\right\| ^2 \frac{1}{{(f({\mathbf {x}}_k)+\epsilon _k)^2}} +3\left\| G_{k}+ {\bar{w}}_{G,{k}}\right\| ^2 \\&\qquad \left\| \frac{{\bar{w}}_{f,k}}{(f({\mathbf {x}}_k) +\epsilon _k)(\underbrace{(f({\mathbf {x}}_k) + {\bar{w}}_{f,{k}})}_{{\ge 0 \text{ since } F({\mathbf {x}}_k,\xi ) \ge 0}} +\epsilon _k)}\right\| ^2 \\&\quad + 3\left\| G_{k}\right\| ^2 \left\| \frac{\epsilon _k}{f({\mathbf {x}}_k)(f({\mathbf {x}}_k)+\epsilon _k)} \right\| ^2 \\&\le 3\left\| {\bar{w}}_{G,{k}}\right\| ^2 \frac{1}{{\epsilon _f^2}} +3\left\| G_k+{\bar{w}}_{G,{k}}\right\| ^2 \left\| \frac{1}{\epsilon _f \epsilon _k}\right\| ^2\Vert {\bar{w}}_{f,{k}}\Vert ^2 + 3\left\| G_{k}\right\| ^2 \left( \frac{\epsilon _k^2}{\epsilon _f^{4}}\right) \\&= 3\left\| {\bar{w}}_{G,{k}}\right\| ^2 \frac{1}{\epsilon _f^2}+3\left\| G_{k}+{\bar{w}}_{G,{k}}\right\| ^2 \frac{\Vert {\bar{w}}_{f,{k}}\Vert ^2}{\epsilon _f^{2} \epsilon ^2_k} + 3\left\| G_{k}\right\| ^2 \left( \frac{\epsilon _k^2}{\epsilon _f^{4}}\right) , \end{aligned}$$

where $f({\mathbf {x}}_k) \ge \epsilon _f$ for every ${\mathbf {x}}_k \in \mathcal {X}.$ Taking expectations conditioned on $\mathcal {F}_k$ and recalling the independence of ${\bar{w}}_{f,k}$ and ${\bar{w}}_{G,k}$ conditional on $\mathcal {F}_k$, we have the following bound.

$$\begin{aligned}&{\mathbb {E}}\left[ \Vert {\bar{w}}_{k}\Vert ^2 \, \bigg | \, \mathcal {F}_k\right] \\&\quad \le \left( 3{\mathbb {E}}\left[ \left\| {\bar{w}}_{G,{k}}\right\| ^2 \,\bigg | \,\mathcal {F}_k\right] \frac{1}{\epsilon _f^2}\right. \\&\qquad \left. +3{\mathbb {E}}\left[ \left\| G_{k}+{\bar{w}}_{G,{k}}\right\| ^2 \frac{\Vert {\bar{w}}_{f,{k}}\Vert ^2}{\epsilon _f^{2} \epsilon _k^2}\bigg | \mathcal {F}_k\right] + 3{\mathbb {E}}\left[ \left\| G_{k}\right\| ^2 \left( \frac{\epsilon _k^2}{\epsilon _f^{4}}\right) \,\bigg | \,\mathcal {F}_k\right] \right) \\&\quad \le \left( 3\frac{\nu _{G}^2}{\epsilon _f^2N_k} + 3{\mathbb {E}}\left[ \left\| G_{k}+{\bar{w}}_{G,{k}}\right\| ^2 \,\bigg | \,\mathcal {F}_k\right] {\mathbb {E}}\left[ \frac{\Vert {\bar{w}}_{f,{k}}\Vert ^2}{\epsilon _f^{2} \epsilon _k^2} \,\bigg | \,\mathcal {F}_k\right] + 3\left( \frac{\epsilon _k^2M_{G}^2}{\epsilon _f^{4}}\right) \right) \\&\quad \le \left( 3\frac{\nu _{G}^2}{\epsilon _f^2N_k}+3M_G^2 \frac{\nu _{f}^2}{\epsilon _f^{2} \epsilon _k^2N_k}+ 3\left( \frac{\epsilon _k^2M_{G}^2}{\epsilon _f^{4}}\right) \right) . \end{aligned}$$

By selecting $\epsilon _k = 1/N_k^{1/4}$, we have that

$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\bar{w}}_{k}\Vert ^2 \, \bigg | \, \mathcal {F}_k\right]&\le \frac{\nu ^2}{\sqrt{N_k}}, \text{ where } \nu ^2 \triangleq \left( 3\frac{\nu _{G}^2}{\epsilon _f^2}+3M_G^2 \frac{\nu _{f}^2}{\epsilon _f^{2} }+ 3\left( \frac{M_{G}^2}{\epsilon _f^{4}}\right) \right) . \end{aligned}$$

$\square $
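The role of the choice $\epsilon _k = 1/N_k^{1/4}$ in Setting B can be made explicit with a short worked substitution. This is a sketch using only the quantities already appearing in the bound above; the auxiliary constants $a$, $b$ and the variable $s$ in the closing remark are introduced here for illustration.

$$\begin{aligned} \epsilon _k = N_k^{-1/4} \ \Longrightarrow \ \epsilon _k^2 = \frac{1}{\sqrt{N_k}}, \qquad \frac{1}{\epsilon _k^2 N_k} = \frac{1}{\sqrt{N_k}}, \qquad \frac{1}{N_k} \le \frac{1}{\sqrt{N_k}} \ \text{ for } N_k \ge 1, \end{aligned}$$

so that

$$\begin{aligned} 3\frac{\nu _{G}^2}{\epsilon _f^2N_k}+3M_G^2 \frac{\nu _{f}^2}{\epsilon _f^{2} \epsilon _k^2N_k}+ 3\frac{\epsilon _k^2M_{G}^2}{\epsilon _f^{4}} \ \le \ \frac{1}{\sqrt{N_k}}\left( 3\frac{\nu _{G}^2}{\epsilon _f^2}+3M_G^2 \frac{\nu _{f}^2}{\epsilon _f^{2}}+ 3\frac{M_{G}^2}{\epsilon _f^{4}}\right) . \end{aligned}$$

One way to read this choice: for constants $a, b > 0$, the function $s \mapsto \tfrac{a}{s N_k} + b s$ of $s = \epsilon _k^2$ is minimized at $s = \sqrt{a/(b N_k)}$, so balancing the sampling error against the regularization bias suggests $\epsilon _k^2$ proportional to $N_k^{-1/2}$.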

Proof of Proposition 7:

(i) Using the update rule of $ {\mathbf {x}}_{k+1}$ and the fact that ${\mathbf {x}}^*=\varPi _{{\mathcal {X}}} [{\mathbf {x}}^*]$, for any $d_k + {\bar{w}}_{k}$ where $d_k \in \partial h({\mathbf {x}}_k)$ and $k \ge 1$,

$$\begin{aligned} {1\over 2}\Vert {\mathbf {x}}_{k+1}-{\mathbf {x}}^*\Vert ^2&{ = } {1\over 2}\Vert \varPi _{{\mathcal {X}}} ({\mathbf {x}}_k-\gamma _k {(d_k + {\bar{w}}_k)})-\varPi _{\mathcal {X}}({\mathbf {x}}^*)\Vert ^2\\&\le {1\over 2}\Vert {\mathbf {x}}_k-\gamma _k {(d_k+{\bar{w}}_k)}-{\mathbf {x}}^*\Vert ^2\\&={1\over 2}\Vert {\mathbf {x}}_k-{\mathbf {x}}^*\Vert ^2+{1\over 2}\gamma _k^2\Vert {d_k + {\bar{w}}_k}\Vert ^2-\gamma _k({\mathbf {x}}_k-{\mathbf {x}}^*)^T({d_k}+{{\bar{w}}}_{k}), \end{aligned}$$

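where the inequality follows from the nonexpansivity of the projection operator. By the convexity of $h$, we have $h({\mathbf {x}}_k)-h({\mathbf {x}}^*) \le d_k^T({\mathbf {x}}_k-{\mathbf {x}}^*)$; substituting this into the preceding relation and rearranging, we obtain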

$$\begin{aligned} 2\gamma _k (h({\mathbf {x}}_k)-h({\mathbf {x}}^*))&\le \left( \Vert {\mathbf {x}}_k-{\mathbf {x}}^*\Vert ^2-\Vert {\mathbf {x}}_{k+1}-{\mathbf {x}}^*\Vert ^2\right) +{\Vert {d_k+{\bar{w}}_k}\Vert ^2\gamma _k^2}\\&\quad -{2}\gamma _k{{\bar{w}}}_{k}^T({\mathbf {x}}_k-{\mathbf {x}}^*)\\&\le \left( \Vert {\mathbf {x}}_k-{\mathbf {x}}^*\Vert ^2-\Vert {\mathbf {x}}_{k+1}-{\mathbf {x}}^*\Vert ^2\right) +{\Vert {d_k+{\bar{w}}_k} \Vert ^2\gamma _k^2}\\&\quad + \gamma _k^2 \Vert {\mathbf {x}}_k-{\mathbf {x}}^*\Vert ^2 + \Vert {\bar{w}}_k\Vert ^2, \end{aligned}$$

where we use $a^Tb\le {1\over 2}\Vert a\Vert ^2+{1\over 2}\Vert b\Vert ^2$. Now by summing from $k = {\widehat{K}}$ to $K-1$, where ${\widehat{K}}$ is an integer satisfying $0 \le {\widehat{K}} < K-1$, we obtain the next inequality.

$$\begin{aligned} \sum _{k={\widehat{K}}}^{K-1} 2\gamma _k (h({\mathbf {x}}_k)-h({\mathbf {x}}^*))&\le {\Vert {\mathbf {x}}_{{\widehat{K}}}-{\mathbf {x}}^*\Vert ^2}+\sum _{k={\widehat{K}}}^{K-1} \left( \gamma _k^2\left( {\Vert {d_k+{\bar{w}}_k}\Vert ^2}+ \Vert {\mathbf {x}}_k-{\mathbf {x}}^*\Vert ^2\right) +\Vert {\bar{w}}_k\Vert ^2\right) . \end{aligned}$$

Dividing both sides by $2\sum _{k={\widehat{K}}}^{K-1} \gamma _k$, taking expectations on both sides, and invoking Lemma 11, which yields ${\mathbb {E}}[\Vert {{\bar{w}}}_{k}\Vert ^2 \mid {\mathcal {F}}_k]\le {\nu ^2\over \sqrt{N_k}}$, together with the bound on the subgradient, i.e., ${\mathbb {E}}[\Vert d_k + {\bar{w}}_k\Vert ^2]\le M_G^2$, we obtain the following bound.

$$\begin{aligned} {\mathbb {E}}&\left[ \frac{\sum _{k={\widehat{K}}}^{K-1} 2\gamma _k (h({\mathbf {x}}_k)-h({\mathbf {x}}^*))}{\sum _{k={\widehat{K}}}^{K-1} 2\gamma _k}\right] \nonumber \\&\le {\mathbb {E}}\left[ \frac{{\Vert {\mathbf {x}}_{{{\hat{K}}}}-{\mathbf {x}}^*\Vert ^2} +\sum _{k={\widehat{K}}}^{K-1} \gamma _k^2{\Vert {d_k+{\bar{w}}_k}\Vert ^2} + \sum _{k={\widehat{K}}}^{K-1} \gamma _k^2 \Vert {\mathbf {x}}_k-{\mathbf {x}}^*\Vert ^2 + \sum _{k={\widehat{K}}}^{K-1} \Vert {\bar{w}}_k\Vert ^2}{\sum _{k={\widehat{K}}}^{K-1}2 \gamma _k}\right] \end{aligned}$$

(43)

$$\begin{aligned}&\le \frac{{\mathbb {E}}[\Vert x_{{\widehat{K}}}-x^*\Vert ^2]}{\sum _{k={\widehat{K}}}^{K-1}2\gamma _k } + \frac{\sum _{k={\widehat{K}}}^{K-1}\gamma _k^2 (M_G^2 + B^2)}{\sum _{k={\widehat{K}}}^{K-1}2\gamma _k }+ \frac{\sum _{k={\widehat{K}}}^{K-1} \tfrac{\nu ^2}{\sqrt{N_k}}}{\sum _{k={\widehat{K}}}^{K-1}2\gamma _k}. \end{aligned}$$

(44)

By utilizing Jensen’s inequality, we obtain that

$$\begin{aligned} {\mathbb {E}}\left[ h({\bar{x}}_{{\widehat{K}},K})-h({\mathbf {x}}^*)\right] \le {\mathbb {E}}\left[ \frac{\sum _{k={\widehat{K}}}^{K-1} 2\gamma _k (h({\mathbf {x}}_k)-h({\mathbf {x}}^*))}{\sum _{k={\widehat{K}}}^{K-1} 2\gamma _k}\right] , \end{aligned}$$

where ${\bar{x}}_{{\widehat{K}},K} \triangleq \tfrac{\sum _{k={\widehat{K}}}^{K-1} \gamma _k x_k}{\sum _{k={\widehat{K}}}^{K-1} \gamma _k},$ which when combined with (44) leads to (30). $\square $
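The projected update and the window-averaged iterate ${\bar{x}}_{{\widehat{K}},K}$ analyzed above can be sketched on a toy problem. The code below is an illustration under loudly stated assumptions, not the r-VRSA scheme itself: the objective `h`, the box constraint, the steplength $\gamma _k = 1/\sqrt{k+1}$, the batch-size schedule $N_k = (k+1)^2$, and the synthetic Gaussian noise with ${\mathbb {E}}\Vert {\bar{w}}_k\Vert ^2 = 2/\sqrt{N_k}$ are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(3)
c = np.array([0.3, -0.2])  # minimizer of the toy objective, interior to X


def h(x):
    """Toy convex objective (an assumption, not the paper's h)."""
    return 0.5 * np.sum((x - c) ** 2)


def project_box(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box X = [lo, hi]^n."""
    return np.clip(x, lo, hi)


K, K_hat = 2000, 1000
x = np.array([1.0, 1.0])
weighted_sum, weight_total = np.zeros(2), 0.0

for k in range(K):
    gamma_k = 1.0 / np.sqrt(k + 1)
    N_k = (k + 1) ** 2                          # hypothetical batch-size schedule
    d_k = x - c                                 # gradient of h at x_k
    w_k = rng.standard_normal(2) / N_k ** 0.25  # synthetic noise, E||w_k||^2 = 2 / sqrt(N_k)
    if k >= K_hat:                              # accumulate gamma_k-weighted window average
        weighted_sum += gamma_k * x
        weight_total += gamma_k
    x = project_box(x - gamma_k * (d_k + w_k))  # x_{k+1} = Pi_X(x_k - gamma_k (d_k + w_k))

x_bar = weighted_sum / weight_total
gap = h(x_bar) - h(c)
print(x_bar, gap)
```

Averaging over the window $\{{\widehat{K}},\ldots ,K-1\}$ with weights $\gamma _k$ mirrors the definition of ${\bar{x}}_{{\widehat{K}},K}$; discarding the early iterates removes the dependence on the poorest initial points, which is the purpose of the window.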

About this article

Cite this article

Bardakci, I.E., Jalilzadeh, A., Lagoa, C. et al. Probability maximization via Minkowski functionals: convex representations and tractable resolution. Math. Program. 199, 595–637 (2023). https://doi.org/10.1007/s10107-022-01859-8

Received: 30 September 2020
Accepted: 29 May 2022
Published: 08 September 2022
Issue Date: May 2023
DOI: https://doi.org/10.1007/s10107-022-01859-8
