Abstract
We derive new and improved non-asymptotic deviation inequalities for the sample average approximation (SAA) of an optimization problem. Our results give strong error probability bounds that are “sub-Gaussian” even when the randomness of the problem is fairly heavy tailed. Additionally, we obtain good (often optimal) dependence on the sample size and geometrical parameters of the problem. Finally, we allow for random constraints on the SAA and unbounded feasible sets, which also do not seem to have been considered before in the non-asymptotic literature. Our proofs combine different ideas of potential independent interest: an adaptation of Talagrand’s “generic chaining” bound for sub-Gaussian processes; “localization” ideas from the Statistical Learning literature; and the use of standard conditions in Optimization (metric regularity, Slater-type conditions) to control fluctuations of the feasible set.
Notes
Another typical light-tail condition is to assume a sub-exponential tail.
When \({\mathcal {I}}\ne \emptyset \), we assume without much loss of generality in Assumption 5 that \(\epsilon \in (0,\vartheta ^*]\) with \(\vartheta ^*\) as in Assumption 4. For instance, in case f is locally strongly convex on a neighbourhood U of \(X^*\) and Assumption 4 holds, the existence of an \(x\in X_{-\eta _*}^{*,\vartheta _*}\cap U\) is a mild requirement.
Using \(\sqrt{\epsilon R\delta }\le 2\epsilon +2R\delta \) and \(\epsilon ,\delta \le 1\).
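For completeness, the first inequality in this note is a consequence of the AM–GM inequality (our one-line verification, applied to the pair \(\epsilon \) and \(R\delta \)):

```latex
\sqrt{\epsilon R\delta }
  \;\le\; \frac{\epsilon + R\delta }{2}
  \;\le\; 2\epsilon + 2R\delta .
```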
References
Artstein, Z., Wets, R.J.B.: Consistency of minimizers and the SLLN for stochastic programs. J. Convex Anal. 2, 1–17 (1995)
Atlason, J., Epelman, M.A., Henderson, S.G.: Call center staffing with simulation and cutting plane methods. Ann. Oper. Res. 127(1), 333–358 (2004)
Banholzer, D., Fliege, J., Werner, R.: On rates of convergence for sample average approximations in the almost sure sense and in mean. Math. Program. (2019). https://doi.org/10.1007/s10107-019-01400-4
Bartlett, P., Bousquet, O., Mendelson, S.: Local Rademacher complexities. Ann. Stat. 33, 1497–1537 (2005)
Bartlett, P., Mendelson, S.: Empirical minimization. Probab. Theory Relat. Fields 135(3), 311–334 (2006)
Bartlett, P.L., Mendelson, S., Neeman, J.: \(\ell _1\)-regularized linear regression: persistence and oracle inequalities. Probab. Theory Relat. Fields 154, 193–224 (2012)
Bauschke, H.H., Borwein, J.M.: On projection algorithms for solving convex feasibility problems. SIAM Rev. 38(3), 367–426 (1996)
Bickel, P.J., Ritov, Y., Tsybakov, A.B.: Simultaneous analysis of the Lasso and Dantzig selector. Ann. Stat. 37(4), 1705–1732 (2009)
Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford (2013)
Burke, J.V., Deng, S.: Weak sharp minima revisited, part II: application to linear regularity and error bounds. Math. Program. 104, 235–261 (2005)
Burke, J.V., Ferris, M.C.: Weak sharp minima in mathematical programming. SIAM J. Control. Optim. 31, 1340–1359 (1993)
Dirksen, S.: Tail bounds via generic chaining. Electron. J. Probab. 20, 1–29 (2015)
Dupacovà, J., Wets, R.J.B.: Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems. Ann. Stat. 16(4), 1517–1549 (1988)
Ermoliev, Y.M., Norkin, V.I.: Sample average approximation for compound stochastic optimization problems. SIAM J. Optim. 23(4), 2231–2263 (2013)
Guigues, V., Juditsky, A., Nemirovski, A.: Non-asymptotic confidence bounds for the optimal value of a stochastic program. Optim. Methods Softw. 32(5), 1033–1058 (2017)
Hoffman, A.J.: On approximate solutions of systems of linear inequalities. J. Res. Natl. Bur. Stand. 49, 263–265 (1952)
Homem-de-Mello, T., Bayraksan, G.: Monte Carlo sampling-based methods for stochastic optimization. Surv. Oper. Res. Manag. Sci. 19, 56–85 (2014)
Iusem, A., Jofré, A., Thompson, P.: Incremental constraint projection methods for monotone stochastic variational inequalities. Math. Oper. Res. 44(1), 236–263 (2018)
Kaňková, V., Houda, M.: Thin and heavy tails in stochastic programming. Kybernetika 51(3), 433–456 (2015)
Kaňková, V., Omelchenko, V.: Empirical estimates in stochastic programs with probability and second order stochastic dominance constraints. Acta Math. Univ. Comenianae LXXXIV 2, 267–281 (2015)
Koltchinskii, V., Panchenko, D.: Complexities of convex combinations and bounding the generalization error in classification. Ann. Stat. 33, 1455–1496 (2005)
Koltchinskii, V.: Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Stat. 34(6), 2593–2656 (2006)
Kim, S., Pasupathy, R., Henderson, S.G.: A guide to sample average approximation. In: Michael, Fu. (ed.) Handbook of Simulation Optimization, International Series in Operations Research & Management Science, vol. 216, pp. 207–243. Springer, New York (2015)
King, A.J., Rockafellar, R.T.: Asymptotic theory for solutions in statistical estimation and stochastic programming. Math. Oper. Res. 18, 148–162 (1993)
King, A.J., Wets, R.J.B.: Epi-consistency of convex stochastic programs. Stoch. Stoch. Rep. 34, 83–92 (1991)
Kleywegt, A.J., Shapiro, A., Homem-de-Mello, T.: The sample average approximation method for stochastic discrete optimization. SIAM J. Optim. 12(2), 479–502 (2001)
Łojasiewicz, S.: Sur le problème de la division. Stud. Math. 18, 87–136 (1959)
Mendelson, S.: Learning without concentration. J. ACM 62(3), 1–25 (2015)
Mendelson, S.: Local vs. global parameters—breaking the Gaussian complexity barrier. Ann. Stat. 45(5), 1835–1862 (2017)
Oliveira, R.I., Thompson, P.: Sample average approximation with heavier tails II: localization in stochastic convex optimization and persistence results for the lasso (2020)
Panchenko, D.: Symmetrization approach to concentration inequalities for empirical processes. Ann. Probab. 31, 2068–2081 (2003)
Pang, J.-S.: Error bounds in mathematical programming. Math. Program. Ser. B 79(1), 299–332 (1997)
Pflug, G.C.: Asymptotic stochastic programs. Math. Oper. Res. 20, 769–789 (1995)
Pflug, G.C.: Stochastic programs and statistical data. Ann. Oper. Res. 85, 59–78 (1999)
Pflug, G.C.: Stochastic optimization and statistical inference. In: Ruszczyński, A., Shapiro, A. (eds.) Handbooks in OR & MS, vol. 10, pp. 427–482. Elsevier, Amsterdam (2003)
Robinson, S.M.: An application of error bounds for convex programming in a linear space. SIAM J. Control 13, 271–273 (1975)
Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2(3), 493–517 (2000)
Royset, J.O.: Optimality functions in stochastic programming. Math. Program. Ser. A 135, 293–321 (2012)
Römisch, W.: Stability of stochastic programming problems. In: Ruszczyński, A., Shapiro, A. (eds.) Handbooks in OR & MS, vol. 10, pp. 483–554. Elsevier, Amsterdam (2003)
Shapiro, A.: Asymptotic properties of statistical estimators in stochastic programming. Ann. Stat. 17, 841–858 (1989)
Shapiro, A.: Asymptotic analysis of stochastic programs. Ann. Oper. Res. 30, 169–186 (1991)
Shapiro, A.: Monte Carlo sampling methods. In: Ruszczyński, A., Shapiro, A. (eds.) Handbooks in OR & MS, vol. 10, pp. 353–425. Elsevier, Amsterdam (2003)
Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on Stochastic Programming: Modeling and Theory. MOS-SIAM Series on Optimization. SIAM, Philadelphia (2009)
Shapiro, A., Homem-de-Mello, T.: On the rate of convergence of optimal solutions of Monte Carlo approximations of stochastic programs. SIAM J. Optim. 11(1), 70–86 (2000)
Shapiro, A., Nemirovski, A.: On the complexity of stochastic programming problems. In: Jeyakumar, V., Rubinov, A. (eds.) Continuous Optimization: Current Trends and Modern Applications, vol. 99, pp. 111–146. Springer, Boston (2005)
Talagrand, M.: Upper and Lower Bounds for Stochastic Processes. Springer, Berlin (2014)
Talagrand, M.: Sharper bounds for Gaussian and empirical processes. Ann. Probab. 22, 28–76 (1994)
Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996)
Vogel, S.: Stability results for stochastic programming problems. Optimization 19(2), 269–288 (1998)
Vogel, S.: Confidence Sets and Convergence of Random Functions (2008) preprint at https://www.tu-ilmenau.de/fileadmin/media/orsto/vogel/Publikationen/Vogel-Grecksch-Geb-korr-1.pdf
Vogel, S.: Universal confidence sets for solutions of optimization problems. SIAM J. Optim. 19(3), 1467–1488 (2008)
Wang, W., Ahmed, S.: Sample average approximation of expected value constrained stochastic programs. Oper. Res. Lett. 36, 515–519 (2008)
Additional information
Roberto I. Oliveira is supported by CNPq grants 432310/2018-5 (Universal) and 304475/2019-0 (Produtividade em Pesquisa), and FAPERJ grants 202.668/2019 (Cientista do Nosso Estado) and 290.024/2021 (Edital Inteligência Artificial)
Appendix
Proof of Lemma 1
The second statement in the Lemma is a direct consequence of the first. Therefore, we will only prove the first statement.
Assume that \(Z'_1,\dots ,Z'_N\) are independent copies of \(Z_1,\dots ,Z_N\). Also let \(Z=(Z_1,\dots ,Z_N)^T\). What we want to prove is that, for any \(t\ge 0\),
By [31, Corollary 1], it suffices to prove that, for any \(t\ge 0\),
We will prove that the above inequality holds almost surely conditionally on the values \(|Z_k-Z'_k|=a_k\), \(1\le k\le N\). Notice that, conditionally on these values,
where the \(u_k\) are i.i.d. unbiased random signs. So what we must show is that:
for any choice of \(a_k\), \(1\le k\le N\). This follows easily from the standard inequalities:
and Bernstein’s trick:
\(\square \)
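Although the displayed inequalities are omitted in this version of the text, the conclusion of the conditional step is the classical sub-Gaussian tail for a weighted sum of random signs. The following sketch (our illustration, not code from the paper; the values of the \(a_k\) are arbitrary) checks the Hoeffding-type bound \(\Pr \left( \sum _k u_k a_k \ge t\right) \le \exp \left( -t^2/2\sum _k a_k^2\right) \) numerically:

```python
import math
import random

# Numerical sanity check (our illustration): conditionally on
# |Z_k - Z'_k| = a_k, the symmetrized sum is S = sum_k u_k * a_k with
# i.i.d. unbiased random signs u_k, and the standard Hoeffding/Bernstein
# argument gives the sub-Gaussian tail
#     P(S >= t) <= exp(-t^2 / (2 * sum_k a_k^2)).
random.seed(0)
a = [1.0, 2.0, 0.5, 1.5, 3.0]   # arbitrary example values of the a_k
t = 4.0
sigma_sq = sum(x * x for x in a)

trials = 200_000
hits = sum(
    sum(x if random.random() < 0.5 else -x for x in a) >= t
    for _ in range(trials)
)
empirical = hits / trials
bound = math.exp(-t * t / (2.0 * sigma_sq))
print(f"empirical tail {empirical:.3f} <= Hoeffding bound {bound:.3f}")
```

Here the exact tail probability is \(7/32\approx 0.22\), comfortably below the bound \(\approx 0.62\); the sub-Gaussian estimate is loose but dimension-free, which is what the chaining argument in the body of the paper requires.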
Proof of Proposition 1
We will need the following Lemma.
Lemma 5
There exists a constant \({\mathbf{c}}_{{\mathsf {bdg}}}\) such that, for all \(p\ge 2\) and all i.i.d. random variables \(Z_1,\dots ,Z_N\in L^p\) with \({\mathbb {E}}[Z_i]=0\),
Proof of the Lemma
By the Burkholder–Davis–Gundy inequality and the subadditivity of the \(L^{p/2}\) norm:
and the proof finishes once we note that \({\left\| \,Z^2_i\, \right\| }_{p/2} ={\left\| \,Z_1\, \right\| }_{p}^2\) for each index \(i\).\(\square \)
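Spelled out, the chain of estimates just described reads as follows (our reconstruction of the standard argument; \({\mathbf{c}}_{{\mathsf {bdg}}}\sqrt{p}\) is the order of the Burkholder–Davis–Gundy constant in \(L^p\)):

```latex
\left\| \sum_{i=1}^{N} Z_i \right\|_{p}
  \le {\mathbf{c}}_{\mathsf{bdg}} \sqrt{p}\,
      \left\| \Bigl(\sum_{i=1}^{N} Z_i^{2}\Bigr)^{1/2} \right\|_{p}
  =   {\mathbf{c}}_{\mathsf{bdg}} \sqrt{p}\,
      \left\| \sum_{i=1}^{N} Z_i^{2} \right\|_{p/2}^{1/2}
  \le {\mathbf{c}}_{\mathsf{bdg}} \sqrt{p}\,
      \Bigl( \sum_{i=1}^{N} \bigl\| Z_i^{2} \bigr\|_{p/2} \Bigr)^{1/2}
  =   {\mathbf{c}}_{\mathsf{bdg}} \sqrt{pN}\, \left\| Z_1 \right\|_{p}.
```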
Now note that the random variables
are i.i.d. and satisfy \({\mathbb {E}}[H_k]=0\), \({\left\| \,H_k\, \right\| }_{p}\le \kappa _p\). Markov’s inequality implies:
Now use Lemma 5 to bound the RHS.\(\square \)
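The Markov step used above can be sanity-checked numerically. The point is pointwise: \(\mathbf {1}\{|s|\ge t\}\le |s|^p/t^p\), so the empirical tail frequency is always dominated by the empirical \(p\)-th moment divided by \(t^p\). The sketch below (our toy illustration with uniform noise, not the paper's setting) demonstrates this:

```python
import random

# Toy check (our illustration) of the Markov step in the proof: since
# 1{|s| >= t} <= |s|^p / t^p pointwise, the empirical tail frequency is
# bounded by the empirical p-th moment over t^p -- the sample analogue of
#     P(|S_N / N| >= t) <= E|S_N / N|^p / t^p.
random.seed(1)
N, p, t, trials = 50, 4, 0.2, 20_000

samples = [
    sum(random.uniform(-1.0, 1.0) for _ in range(N)) / N
    for _ in range(trials)
]
tail_freq = sum(abs(s) >= t for s in samples) / trials
markov_bound = sum(abs(s) ** p for s in samples) / trials / t ** p
print(f"tail frequency {tail_freq:.4f} <= Markov bound {markov_bound:.4f}")
```

Taking \(p\) large improves the rate in \(t\) but worsens the constant \(\kappa _p\), which is the trade-off Proposition 1 quantifies.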
Oliveira, R.I., Thompson, P. Sample average approximation with heavier tails I: non-asymptotic bounds with weak assumptions and stochastic constraints. Math. Program. 199, 1–48 (2023). https://doi.org/10.1007/s10107-022-01810-x