
Sample average approximation with heavier tails I: non-asymptotic bounds with weak assumptions and stochastic constraints

  • Full Length Paper
  • Series A

Mathematical Programming

Abstract

We derive new and improved non-asymptotic deviation inequalities for the sample average approximation (SAA) of an optimization problem. Our results give strong error probability bounds that are “sub-Gaussian” even when the randomness of the problem is fairly heavy tailed. Additionally, we obtain good (often optimal) dependence on the sample size and geometrical parameters of the problem. Finally, we allow for random constraints on the SAA and unbounded feasible sets, which also do not seem to have been considered before in the non-asymptotic literature. Our proofs combine different ideas of potential independent interest: an adaptation of Talagrand’s “generic chaining” bound for sub-Gaussian processes; “localization” ideas from the Statistical Learning literature; and the use of standard conditions in Optimization (metric regularity, Slater-type conditions) to control fluctuations of the feasible set.


Notes

  1. Another typical light-tail condition is to assume a sub-exponential tail.

  2. The constants appearing in our Theorem 1 are not the same as in [12], but can be easily obtained via the same method.

  3. When \({\mathcal {I}}\ne \emptyset \), we assume in Assumption 5, without much loss of generality, that \(\epsilon \in (0,\vartheta ^*]\) with \(\vartheta ^*\) as in Assumption 4. For instance, in case f is locally strongly convex on a neighbourhood U of \(X^*\) and Assumption 4 holds, the existence of an \(x\in X_{-\eta _*}^{*,\vartheta _*}\cap U\) is a mild requirement.

  4. Indeed, if \(x_{\delta }\in X\) is the metric projection of \(x^*_\delta \) onto X for some \(\delta >0\), by (16) and Assumption 3, we have \(f^*-f_{\delta }^*\le f(x_\delta )-f(x_\delta ^*)\le \sigma {\mathsf {d}}(x^*_\delta ,X)\le \sigma {\mathfrak {c}}\delta \).

  5. Using \(\sqrt{\epsilon R\delta }\le 2\epsilon +2R\delta \) and \(\epsilon ,\delta \le 1\).

References

  1. Artstein, Z., Wets, R.J.B.: Consistency of minimizers and the SLLN for stochastic programs. J. Convex Anal. 2, 1–17 (1995)

  2. Atlason, J., Epelman, M.A., Henderson, S.G.: Call center staffing with simulation and cutting plane methods. Ann. Oper. Res. 127(1), 333–358 (2004)

  3. Banholzer, D., Fliege, J., Werner, R.: On rates of convergence for sample average approximations in the almost sure sense and in mean. Math. Program. (2019). https://doi.org/10.1007/s10107-019-01400-4

  4. Bartlett, P., Bousquet, O., Mendelson, S.: Local Rademacher complexities. Ann. Stat. 33, 1497–1537 (2005)

  5. Bartlett, P., Mendelson, S.: Empirical minimization. Probab. Theory Relat. Fields 135(3), 311–334 (2006)

  6. Bartlett, P.L., Mendelson, S., Neeman, J.: \(\ell _1\)-regularized linear regression: persistence and oracle inequalities. Probab. Theory Relat. Fields 154, 193–224 (2012)

  7. Bauschke, H.H., Borwein, J.M.: On projection algorithms for solving convex feasibility problems. SIAM Rev. 38(3), 367–426 (1996)

  8. Bickel, P.J., Ritov, Y., Tsybakov, A.B.: Simultaneous analysis of the Lasso and Dantzig selector. Ann. Stat. 37(4), 1705–1732 (2009)

  9. Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford (2013)

  10. Burke, J.V., Deng, S.: Weak sharp minima revisited, part II: application to linear regularity and error bounds. Math. Program. 104, 235–261 (2005)

  11. Burke, J.V., Ferris, M.C.: Weak sharp minima in mathematical programming. SIAM J. Control Optim. 31, 1340–1359 (1993)

  12. Dirksen, S.: Tail bounds via generic chaining. Electron. J. Probab. 20, 1–29 (2015)

  13. Dupačová, J., Wets, R.J.B.: Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems. Ann. Stat. 16(4), 1517–1549 (1988)

  14. Ermoliev, Y.M., Norkin, V.I.: Sample average approximation for compound stochastic optimization problems. SIAM J. Optim. 23(4), 2231–2263 (2013)

  15. Guigues, V., Juditsky, A., Nemirovski, A.: Non-asymptotic confidence bounds for the optimal value of a stochastic program. Optim. Methods Softw. 32(5), 1033–1058 (2017)

  16. Hoffman, A.J.: On approximate solutions of systems of linear inequalities. J. Res. Natl. Bur. Stand. 49, 263–265 (1952)

  17. Homem-de-Mello, T., Bayraksan, G.: Monte Carlo sampling-based methods for stochastic optimization. Surv. Oper. Res. Manag. Sci. 19, 56–85 (2014)

  18. Iusem, A., Jofré, A., Thompson, P.: Incremental constraint projection methods for monotone stochastic variational inequalities. Math. Oper. Res. 44(1), 236–263 (2018)

  19. Kaňková, V., Houda, M.: Thin and heavy tails in stochastic programming. Kybernetika 51(3), 433–456 (2015)

  20. Kaňková, V., Omelchenko, V.: Empirical estimates in stochastic programs with probability and second order stochastic dominance constraints. Acta Math. Univ. Comenianae LXXXIV(2), 267–281 (2015)

  21. Koltchinskii, V., Panchenko, D.: Complexities of convex combinations and bounding the generalization error in classification. Ann. Stat. 33, 1455–1496 (2005)

  22. Koltchinskii, V.: Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Stat. 34(6), 2593–2656 (2006)

  23. Kim, S., Pasupathy, R., Henderson, S.G.: A guide to sample average approximation. In: Fu, M.C. (ed.) Handbook of Simulation Optimization, International Series in Operations Research & Management Science, vol. 216, pp. 207–243. Springer, New York (2015)

  24. King, A.J., Rockafellar, R.T.: Asymptotic theory for solutions in statistical estimation and stochastic programming. Math. Oper. Res. 18, 148–162 (1993)

  25. King, A.J., Wets, R.J.B.: Epi-consistency of convex stochastic programs. Stoch. Stoch. Rep. 34, 83–92 (1991)

  26. Kleywegt, A.J., Shapiro, A., Homem-de-Mello, T.: The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. 12(2), 479–502 (2001)

  27. Łojasiewicz, S.: Sur le problème de la division. Stud. Math. 18, 87–136 (1959)

  28. Mendelson, S.: Learning without concentration. J. ACM 62(3), 1–25 (2015)

  29. Mendelson, S.: Local vs. global parameters—breaking the Gaussian complexity barrier. Ann. Stat. 45(5), 1835–1862 (2017)

  30. Oliveira, R.I., Thompson, P.: Sample average approximation with heavier tails II: localization in stochastic convex optimization and persistence results for the lasso (2020)

  31. Panchenko, D.: Symmetrization approach to concentration inequalities for empirical processes. Ann. Probab. 31, 2068–2081 (2003)

  32. Pang, J.-S.: Error bounds in mathematical programming. Math. Program. Ser. B 79(1), 299–332 (1997)

  33. Pflug, G.C.: Asymptotic stochastic programs. Math. Oper. Res. 20, 769–789 (1995)

  34. Pflug, G.C.: Stochastic programs and statistical data. Ann. Oper. Res. 85, 59–78 (1999)

  35. Pflug, G.C.: Stochastic optimization and statistical inference. In: Ruszczyński, A., Shapiro, A. (eds.) Handbooks in OR & MS, vol. 10, pp. 427–482. Elsevier, Amsterdam (2003)

  36. Robinson, S.M.: An application of error bounds for convex programming in a linear space. SIAM J. Control 13, 271–273 (1975)

  37. Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2(3), 21–41 (2000)

  38. Royset, J.O.: Optimality functions in stochastic programming. Math. Program. Ser. A 135, 293–321 (2012)

  39. Römisch, W.: Stability of stochastic programming problems. In: Ruszczyński, A., Shapiro, A. (eds.) Handbooks in OR & MS, vol. 10, pp. 483–554. Elsevier, Amsterdam (2003)

  40. Shapiro, A.: Asymptotic properties of statistical estimators in stochastic programming. Ann. Stat. 17, 841–858 (1989)

  41. Shapiro, A.: Asymptotic analysis of stochastic programs. Ann. Oper. Res. 30, 169–186 (1991)

  42. Shapiro, A.: Monte Carlo sampling methods. In: Ruszczyński, A., Shapiro, A. (eds.) Handbooks in OR & MS, vol. 10, pp. 353–425. Elsevier, Amsterdam (2003)

  43. Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on Stochastic Programming: Modeling and Theory. MOS-SIAM Series on Optimization. SIAM, Philadelphia (2009)

  44. Shapiro, A., Homem-de-Mello, T.: On the rate of convergence of optimal solutions of Monte Carlo approximations of stochastic programs. SIAM J. Optim. 11(1), 70–86 (2000)

  45. Shapiro, A., Nemirovski, A.: On the complexity of stochastic programming problems. In: Jeyakumar, V., Rubinov, A. (eds.) Continuous Optimization: Current Trends and Modern Applications, vol. 99, pp. 111–146. Springer, Boston (2005)

  46. Talagrand, M.: Upper and Lower Bounds for Stochastic Processes. Springer, Berlin (2014)

  47. Talagrand, M.: Sharper bounds for Gaussian and empirical processes. Ann. Probab. 22, 28–76 (1994)

  48. Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996)

  49. Vogel, S.: Stability results for stochastic programming problems. Optimization 19(2), 269–288 (1988)

  50. Vogel, S.: Confidence sets and convergence of random functions (2008). Preprint at https://www.tu-ilmenau.de/fileadmin/media/orsto/vogel/Publikationen/Vogel-Grecksch-Geb-korr-1.pdf

  51. Vogel, S.: Universal confidence sets for solutions of optimization problems. SIAM J. Optim. 19(3), 1467–1488 (2008)

  52. Wang, W., Ahmed, S.: Sample average approximation of expected value constrained stochastic programs. Oper. Res. Lett. 36, 515–519 (2008)


Author information

Corresponding author

Correspondence to Philip Thompson.

Additional information


Roberto I. Oliveira is supported by CNPq grants 432310/2018-5 (Universal) and 304475/2019-0 (Produtividade em Pesquisa), and FAPERJ grants 202.668/2019 (Cientista do Nosso Estado) and 290.024/2021 (Edital Inteligência Artificial).

Appendix

Proof of Lemma 1

The second statement in the Lemma is a direct consequence of the first. Therefore, we will only prove the first statement.

Assume that \(Z'_1,\dots ,Z'_N\) are independent copies of \(Z_1,\dots ,Z_N\), and let \(Z=(Z_1,\dots ,Z_N)^T\). What we want to prove is that, for any \(t\ge 0\),

$$\begin{aligned} \mathbf{Want:}\;{\mathbb {P}}\left\{ {\mathbb {E}}\left[ \sum _{k=1}^N (Z_k - Z'_k)\mid Z\right] \ge \sqrt{2(1+t)\,\sum _{k=1}^N{\mathbb {E}}[(Z_k-Z'_k)^2\mid Z]} \right\} \le e^{-t}. \end{aligned}$$

By [31, Corollary 1], it suffices to prove that, for any \(t\ge 0\),

$$\begin{aligned} \mathbf{Sufficient:}\;{\mathbb {P}}\left\{ \sum _{k=1}^N (Z_k - Z'_k)\ge \sqrt{2t\,\sum _{k=1}^N(Z_k-Z'_k)^2} \right\} \le e^{-t}. \end{aligned}$$

We will prove that the above inequality holds almost surely conditionally on the values \(|Z_k-Z'_k|=a_k\), \(1\le k\le N\). Notice that, conditionally on these values,

$$\begin{aligned} Z_k - Z'_k = u_k\,a_k \end{aligned}$$

where, by the exchangeability of each pair \((Z_k,Z'_k)\), the \(u_k\) are i.i.d. unbiased random signs. So what we must show is that:

$$\begin{aligned} \forall t\ge 0\,:\, {\mathbb {P}}\left\{ \sum _{k=1}^N u_ka_k\ge \sqrt{2t \sum _{k=1}^Na_k^2}\right\} \le e^{-t}, \end{aligned}$$

for any choice of \(a_k\), \(1\le k\le N\). This follows easily from the standard inequalities:

$$\begin{aligned} \forall \theta >0\,:\,{\mathbb {E}}[e^{\theta \sum _{k=1}^N u_ka_k}] = \prod _{k=1}^N\cosh (\theta a_k)\le e^{\frac{\theta ^2\sum _{k=1}^Na_k^2}{2}}, \end{aligned}$$

(using \(\cosh (x)\le e^{x^2/2}\)) and Bernstein’s trick:

$$\begin{aligned} {\mathbb {P}}\left\{ \sum _{k=1}^N u_ka_k\ge \sqrt{2t \sum _{k=1}^Na_k^2}\right\} \le \inf _{\theta >0}{\mathbb {E}}[e^{\theta \sum _{k=1}^N u_ka_k}]\,e^{-\theta \sqrt{2t \sum _{k=1}^Na_k^2}}\le e^{-t}, \end{aligned}$$

where the last inequality follows by taking \(\theta =\sqrt{2t/\sum _{k=1}^N a_k^2}\).

\(\square \)
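The sub-Gaussian tail for the sign sum above is easy to probe numerically. Below is a minimal Monte Carlo sketch (ours, not part of the paper; the weights \(a_k\) and all numerical choices are arbitrary) comparing the empirical tail of \(\sum _{k}u_ka_k\) with the bound \(e^{-t}\):

```python
# Minimal sanity check of the Rademacher tail bound proved above:
# P{ sum_k u_k a_k >= sqrt(2 t sum_k a_k^2) } <= e^{-t} for all t >= 0.
# The weights a_k are arbitrary; any fixed choice works.
import numpy as np

rng = np.random.default_rng(0)
N, trials = 50, 200_000
a = rng.uniform(0.1, 2.0, size=N)               # fixed weights a_k
u = rng.choice([-1.0, 1.0], size=(trials, N))   # i.i.d. unbiased random signs u_k
s = u @ a                                       # samples of sum_k u_k a_k

for t in [0.5, 1.0, 2.0, 4.0]:
    threshold = np.sqrt(2.0 * t * np.sum(a**2))
    print(f"t={t}: empirical tail {np.mean(s >= threshold):.5f} <= {np.exp(-t):.5f}")
```

The empirical tails sit well below \(e^{-t}\), reflecting the slack that Bernstein’s trick leaves for bounded summands.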

Proof of Proposition 1

We will need the following Lemma.

Lemma 5

There exists a constant \({\mathbf{c}}_{{\mathsf {bdg}}}\) such that, for all \(p\ge 2\) and all i.i.d. random variables \(Z_1,\dots ,Z_N\in L^p\) with \({\mathbb {E}}[Z_i]=0\),

$$\begin{aligned} {\left\| \,\frac{Z_1 + \dots + Z_N}{N}\, \right\| }_{p}\le {\mathbf{c}}_{{\mathsf {bdg}}}\sqrt{\frac{p}{N}}\,{\left\| \,Z_1\, \right\| }_{p}. \end{aligned}$$

Proof of the Lemma

By the Burkholder–Davis–Gundy inequality and the subadditivity of the \(L^{p/2}\) norm:

$$\begin{aligned} {\left\| \,Z_1 + \dots + Z_N\, \right\| }_{p}\le {\mathbf{c}}_{{\mathsf {bdg}}}\sqrt{p}\,{\left\| \,Z^2_1+\dots +Z^2_N\, \right\| }_{p/2}^{1/2}\le {\mathbf{c}}_{{\mathsf {bdg}}}\sqrt{p\sum _{i=1}^N{\left\| \,Z^2_i\, \right\| }_{p/2}} \end{aligned}$$

and the proof finishes when we note that \({\left\| \,Z^2_i\, \right\| }_{p/2} ={\left\| \,Z_1\, \right\| }_{p}^2\) for each index i (the \(Z_i\) being identically distributed), so the right-hand side equals \({\mathbf{c}}_{{\mathsf {bdg}}}\sqrt{pN}\,{\left\| \,Z_1\, \right\| }_{p}\); dividing by N gives the claim.\(\square \)
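For intuition, Lemma 5 can also be probed by simulation. The following rough sketch (ours; the Student-t distribution, \(p=4\), and sample sizes are arbitrary choices, picked so that the fourth moment exists while the tails remain polynomial) estimates both sides of the inequality; the printed ratio should stay bounded as N grows, consistent with the \(\sqrt{p/N}\) rate.

```python
# Rough Monte Carlo illustration of Lemma 5: for centered i.i.d. Z_i in L^p,
# || (Z_1 + ... + Z_N)/N ||_p  <=  c_bdg * sqrt(p/N) * ||Z_1||_p.
# Student-t(9) variables: mean zero, polynomial tails, finite 4th moment.
import numpy as np

rng = np.random.default_rng(1)
p, trials, df = 4, 20_000, 9

z1 = rng.standard_t(df, size=200_000)
norm_z1 = (np.abs(z1) ** p).mean() ** (1 / p)     # estimate of ||Z_1||_p

for N in [10, 100, 400]:
    means = rng.standard_t(df, size=(trials, N)).mean(axis=1)
    lhs = (np.abs(means) ** p).mean() ** (1 / p)  # estimate of ||mean||_p
    rhs = np.sqrt(p / N) * norm_z1                # the bound, modulo c_bdg
    print(f"N={N:4d}: lhs ~ {lhs:.4f}, sqrt(p/N)*||Z_1||_p = {rhs:.4f}, ratio {lhs/rhs:.3f}")
```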

Now note that the random variables

$$\begin{aligned} H_k:=\frac{h(\xi _k) - {\mathbf {E}}h(\cdot )}{\sigma ^2}\,\,(1\le k\le N) \end{aligned}$$

are i.i.d. and satisfy \({\mathbb {E}}[H_k]=0\), \({\left\| \,H_k\, \right\| }_{p}\le \kappa _p\). Markov’s inequality implies:

$$\begin{aligned} {\mathbb {P}}\left\{ {\widehat{{\mathbf {E}}}}h(\cdot )> 2\sigma ^2\right\} \le {\mathbb {P}}\left\{ \frac{1}{N}\sum _{k=1}^NH_k>1\right\} \le {\left\| \,\frac{1}{N}\sum _{k=1}^NH_k\, \right\| }_{p}^p. \end{aligned}$$

Now use Lemma 5 to bound the RHS: since \({\left\| \,H_k\, \right\| }_{p}\le \kappa _p\), it gives \({\left\| \,\frac{1}{N}\sum _{k=1}^NH_k\, \right\| }_{p}^p\le \left( {\mathbf{c}}_{{\mathsf {bdg}}}\kappa _p\sqrt{p/N}\right) ^p\).\(\square \)
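Concretely, chaining the two displays gives \({\mathbb {P}}\{{\widehat{{\mathbf {E}}}}h(\cdot )>2\sigma ^2\}\le ({\mathbf{c}}_{{\mathsf {bdg}}}\kappa _p\sqrt{p/N})^p\), a tail decaying polynomially in N with degree p/2. A small sketch of how this bound scales (the numeric constant below is a placeholder assumption, not the paper’s \({\mathbf{c}}_{{\mathsf {bdg}}}\)):

```python
# Tail bound from Proposition 1's argument: (c_bdg * kappa_p * sqrt(p/N))**p.
# c_bdg = 2.0 is a placeholder value for illustration, not the paper's constant.
def proposition1_bound(p: float, N: int, kappa_p: float, c_bdg: float = 2.0) -> float:
    """Upper bound on P{ E_hat h(.) > 2*sigma^2 } via Markov plus Lemma 5."""
    return (c_bdg * kappa_p * (p / N) ** 0.5) ** p

# With p = 4 finite moments and kappa_4 = 2.0, the bound decays like N**(-2):
for N in [100, 1_000, 10_000]:
    print(f"N={N:6d}: bound <= {proposition1_bound(p=4, N=N, kappa_p=2.0):.3e}")
```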


Cite this article

Oliveira, R.I., Thompson, P. Sample average approximation with heavier tails I: non-asymptotic bounds with weak assumptions and stochastic constraints. Math. Program. 199, 1–48 (2023). https://doi.org/10.1007/s10107-022-01810-x

