
Sample average approximation with heavier tails II: localization in stochastic convex optimization and persistence results for the Lasso

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

“Localization” has proven to be a valuable tool in the statistical learning literature, as it allows sharp risk bounds in terms of the problem geometry. Localized bounds seem to be much less exploited in the stochastic optimization literature. In addition, there is an obvious interest in both communities in risk bounds that require only weak moment assumptions, that is, that allow “heavier tails”. In this work we use a localization toolbox to derive risk bounds in two specific applications. The first is portfolio risk minimization with conditional value-at-risk constraints. We consider a setting where, among all assets with high returns, there is a portion of dimension g, unknown to the investor, that has significantly less risk than the remaining portion. Our rates for the SAA problem show that the “risk inflation” caused by a multiplicative factor affects the statistical rate only via a term proportional to g. As the “normalized risk” increases, the contribution to the rate from the extrinsic dimension diminishes, while the dependence on g remains fixed. Localization is a key tool in establishing this property. As a second application of our localization toolbox, we obtain sharp oracle inequalities for least-squares estimators with a Lasso-type constraint under weak moment assumptions. One main consequence of these inequalities is persistence, as posed by Greenshtein and Ritov, with covariates having heavier tails. This improves on prior work of Bartlett, Mendelson and Neeman.
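To make the second application concrete, here is a minimal, hypothetical sketch (not the authors' code) of the Lasso-constrained least-squares estimator whose persistence is studied: the empirical squared loss is minimized over an \(\ell _1\)-ball by projected gradient descent on synthetic heavy-tailed data. The data-generating model, the ball radius, the dimensions and the number of iterations below are all illustrative assumptions.

import numpy as np

def project_l1_ball(v, radius):
    # Euclidean projection of v onto the l1-ball of the given radius.
    if np.sum(np.abs(v)) <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                      # magnitudes, descending
    css = np.cumsum(u)
    idx = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - radius) / idx > 0)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def lasso_constrained_saa(X, Y, radius, n_iter=1000):
    # Projected gradient descent for the SAA problem
    #   min_t (1/N) * sum_i (Y_i - <t, X_i>)^2   subject to   ||t||_1 <= radius.
    N, p = X.shape
    lip = 2.0 * np.linalg.eigvalsh(X.T @ X / N).max()  # Lipschitz constant of the gradient
    t = np.zeros(p)
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ t - Y) / N
        t = project_l1_ball(t - grad / lip, radius)
    return t

# Illustrative heavy-tailed design: Student-t covariates and noise.
rng = np.random.default_rng(0)
N, p, s = 300, 600, 5
beta = np.zeros(p)
beta[:s] = 1.0
X = rng.standard_t(df=5, size=(N, p))
Y = X @ beta + rng.standard_t(df=5, size=N)
t_hat = lasso_constrained_saa(X, Y, radius=np.sum(np.abs(beta)))
print("in-sample mean squared error:", np.mean((Y - X @ t_hat) ** 2))

Persistence, in the sense of Greenshtein and Ritov, asks that the population risk of such a constrained empirical minimizer approaches the best risk attainable over the \(\ell _1\)-ball as the sample size and dimension grow in suitable regimes; the paper's contribution is to establish this under weak moment assumptions on the covariates.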

Notes

  1. Actually, Corollary 2.5 in [29] covers only the case where \(\varvec{\Sigma }\) is the identity matrix, but its arguments are based on VC dimension theory and are readily extendable to our setting. We omit such details.

  2. See also Corollary 2.5, item (2) of [29] for essentially the same statement. See also [36].

References

  1. Artstein, Z., Wets, R.J.-B.: Consistency of minimizers and the SLLN for stochastic programs. J. Convex Anal. 2, 1–17 (1995)

  2. Bartlett, P., Bousquet, O., Mendelson, S.: Local Rademacher complexities. Ann. Stat. 33, 1497–1537 (2005)

  3. Bartlett, P., Mendelson, S.: Empirical minimization. Probab. Theory Rel. Fields 135(3), 311–334 (2006)

  4. Bartlett, P.L., Mendelson, S., Neeman, J.: \(\ell _1\)-regularized linear regression: persistence and oracle inequalities. Probab. Theory Relat. Fields 154, 193–224 (2012)

  5. Bickel, P.J., Ritov, Y., Tsybakov, A.B.: Simultaneous analysis of the Lasso and Dantzig selector. Ann. Stat. 37(4), 1705–1732 (2009)

  6. Bellec, P.C., Lecué, G., Tsybakov, A.B.: Slope meets lasso: improved oracle bounds and optimality. Ann. Stat. 46(6B), 3603–3642 (2018)

  7. Bunea, F., Tsybakov, A.B., Wegkamp, M.H.: Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1, 169–194 (2007)

  8. Bunea, F., Tsybakov, A.B., Wegkamp, M.H.: Aggregation for Gaussian regression. Ann. Stat. 35(4), 1674–1697 (2007)

  9. Bunea, F., Tsybakov, A.B., Wegkamp, M.H.: Sparse density estimation with \(\ell _1\) penalties. In: Bshouty, N.H., Gentile, C. (Eds.) Learning Theory. COLT 2007. Lecture Notes in Computer Science, vol. 4539. Springer, Berlin (2007)

  10. Bunea, F., Tsybakov, A.B., Wegkamp, M.H.: Aggregation and sparsity via \(\ell _1\)-penalized least squares. In: Lugosi, G., Simon, H.U. (Eds.) Learning Theory. COLT 2006. Lecture Notes in Computer Science, vol. 4005. Springer, Berlin (2006)

  11. Bunea, F., Tsybakov, A.B., Wegkamp, M.H.: Aggregation for regression learning. Preprint arXiv:math/0410214 (2004)

  12. Candes, E., Tao, T.: The Dantzig selector: statistical estimation when p is much larger than n. Ann. Stat. 35(6), 2313–2351 (2007)

  13. Dupačová, J., Wets, R.J.-B.: Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems. Ann. Stat. 16(4), 1517–1549 (1988)

  14. Guigues, V., Juditsky, A., Nemirovski, A.: Non-asymptotic confidence bounds for the optimal value of a stochastic program. Optim. Methods Softw. 32(5), 1033–1058 (2017)

  15. Greenshtein, E.: Best subset selection, persistence in high-dimensional statistical learning and optimization under \(\ell _1\) constraint. Ann. Stat. 34(5), 2367–2386 (2006)

  16. Greenshtein, E., Ritov, Y.: Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10(6), 971–988 (2004)

  17. Homem-de-Mello, T., Bayraksan, G.: Monte Carlo sampling-based methods for stochastic optimization. Surv. Oper. Res. Manag. Sci. 19, 56–85 (2014)

  18. Iusem, A.N., Jofré, A., Thompson, P.: Incremental constraint projection methods for monotone stochastic variational inequalities. Math. Oper. Res. 44(1), 236–263 (2019)

  19. Kim, S., Pasupathy, R., Henderson, S.G.: A guide to sample average approximation. In: Fu, M.C. (ed.) Handbook of Simulation Optimization. International Series in Operations Research & Management Science, vol. 216, pp. 207–243. Springer, New York (2015)

  20. King, A.J., Rockafellar, R.T.: Asymptotic theory for solutions in statistical estimation and stochastic programming. Math. Oper. Res. 18, 148–162 (1993)

  21. King, A.J., Wets, R.J.-B.: Epi-consistency of convex stochastic programs. Stoch. Stoch. Rep. 34, 83–92 (1991)

  22. Koltchinskii, V.: Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. École d'Été de Probabilités de Saint-Flour. Lecture Notes in Mathematics, vol. 2033. Springer, Berlin (2011)

  23. Koltchinskii, V., Lounici, K., Tsybakov, A.B.: Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Ann. Stat. 39(5), 2302–2329 (2011)

  24. Koltchinskii, V.: The Dantzig selector and sparsity oracle inequalities. Bernoulli 15(3), 799–828 (2009)

  25. Koltchinskii, V.: Sparsity in penalized empirical risk minimization. Ann. Inst. H. Poincaré Probab. Stat. 45(1), 7–57 (2009)

  26. Koltchinskii, V.: Sparse recovery in convex hulls via entropy penalization. Ann. Stat. 37(3), 1332–1359 (2009)

  27. Koltchinskii, V.: Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Stat. 34(6), 2593–2656 (2006)

  28. Lecué, G., Mendelson, S.: General nonexact oracle inequalities for classes with subexponential envelope. Ann. Stat. 40(2), 832–860 (2012)

  29. Lecué, G., Mendelson, S.: Sparse recovery under weak moment assumptions. J. Eur. Math. Soc. 19, 881–904 (2017)

  30. Leng, C., Lin, Y., Wahba, G.: A note on the lasso and related procedures in model selection. Stat. Sin. 16, 1273–1284 (2006)

  31. Lounici, K.: Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat. 2, 90–102 (2008)

  32. Meinshausen, N., Yu, B.: Lasso-type recovery of sparse representations for high-dimensional data. Ann. Stat. 37(1), 246–270 (2009)

  33. Meinshausen, N.: Relaxed lasso. Comput. Stat. Data Anal. 52(1), 374–393 (2007)

  34. Meinshausen, N., Bühlmann, P.: High-dimensional graphs and variable selection with the Lasso. Ann. Stat. 34(3), 1436–1462 (2006)

  35. Oliveira, R.I.: The lower tail of random quadratic forms, with applications to ordinary least squares and restricted eigenvalue properties. Preprint arXiv:1312.2903 (2013)

  36. Oliveira, R.I.: The lower tail of random quadratic forms with applications to ordinary least squares. Probab. Theory Relat. Fields 166, 1175–1194 (2016)

  37. Oliveira, R.I., Thompson, P.: Sample average approximation with heavier tails I: non-asymptotic bounds with weak assumptions and stochastic constraints. Math. Program. (2022). https://doi.org/10.1007/s10107-022-01810-x

  38. Pflug, G.C.: Asymptotic stochastic programs. Math. Oper. Res. 20, 769–789 (1995)

  39. Panchenko, D.: Symmetrization approach to concentration inequalities for empirical processes. Ann. Probab. 31, 2068–2081 (2003)

  40. Pang, J.-S.: Error bounds in mathematical programming. Math. Program. Ser. B 79(1), 299–332 (1997)

  41. Pflug, G.C.: Stochastic programs and statistical data. Ann. Oper. Res. 85, 59–78 (1999)

  42. Pflug, G.C.: Stochastic optimization and statistical inference. In: Ruszczyński, A., Shapiro, A. (eds.) Handbooks in OR & MS, vol. 10, pp. 427–482. Elsevier (2003)

  43. Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2(3), 493–517 (2000)

  44. Römisch, W.: Stability of stochastic programming problems. In: Ruszczyński, A., Shapiro, A. (eds.) Handbooks in OR & MS, vol. 10, pp. 483–554. Elsevier (2003)

  45. Shapiro, A.: Asymptotic properties of statistical estimators in stochastic programming. Ann. Stat. 17, 841–858 (1989)

  46. Shapiro, A.: Asymptotic analysis of stochastic programs. Ann. Oper. Res. 30, 169–186 (1991)

  47. Shapiro, A.: Monte Carlo sampling methods. In: Ruszczyński, A., Shapiro, A. (eds.) Handbooks in OR & MS, vol. 10, pp. 353–425. Elsevier (2003)

  48. Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on Stochastic Programming: Modeling and Theory. MOS-SIAM Series on Optimization. SIAM, Philadelphia (2009)

  49. Talagrand, M.: Sharper bounds for Gaussian and empirical processes. Ann. Probab. 22, 28–76 (1994)

  50. Talagrand, M.: Upper and Lower Bounds for Stochastic Processes. Springer (2014)

  51. Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996)

  52. van de Geer, S.A.: High-dimensional generalized linear models and the Lasso. Ann. Stat. 36(2), 614–645 (2008)

  53. Zhang, C.-H., Huang, J.: The sparsity and the bias of the lasso selection in high-dimensional linear regression. Ann. Stat. 36(4), 1567–1594 (2008)

  54. Zhao, P., Yu, B.: On model selection consistency of Lasso. J. Mach. Learn. Res. 7, 2541–2563 (2006)

  55. Zhang, T.: Some sharp performance bounds for least squares regression with L1 regularization. Ann. Stat. 37(5A), 2109–2144 (2009)

  56. Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101(476), 1418–1429 (2006)

Acknowledgements

Roberto I. Oliveira has been funded by FAPESP. Philip Thompson was funded by grant STAR - F.10005389.06.001 from the Krannert School of Management, Purdue University.

Author information

Correspondence to Philip Thompson.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Proposition 1

Given admissible sequences \(\{{\mathcal {A}}_{1,j}\}_{j\ge 0}\) and \(\{{\mathcal {A}}_{2,j}\}_{j\ge 0}\) for \({\mathcal {M}}_1\) and \({\mathcal {M}}_2\), one may define an admissible sequence \(\{{\mathcal {C}}_j\}_{j\ge 0}\) for \({\mathcal {M}}\) via:

$$\begin{aligned}{\mathcal {C}}_0:=\{{\mathcal {M}}\}\hbox { and }{\mathcal {C}}_j:= {\mathcal {A}}_{1,j-1}\times {\mathcal {A}}_{2,j-1}\,(j\ge 1).\end{aligned}$$

It is easy to see that this is indeed admissible and moreover

$$\begin{aligned} \text {\textsf{diam}}({\mathcal {C}}_0)&= \text {\textsf{diam}}({\mathcal {M}})= \text {\textsf{diam}}({\mathcal {M}}_1) + \text {\textsf{diam}}({\mathcal {M}}_2),\\ \text {\textsf{diam}}({\mathcal {C}}_j)&\le \text {\textsf{diam}}({\mathcal {A}}_{1,j-1}) + \text {\textsf{diam}}({\mathcal {A}}_{2,j-1})\qquad (j\ge 1). \end{aligned}$$

Therefore,

$$\begin{aligned}\gamma _2^{(\alpha )}({\mathcal {M}})\le \text {\textsf{diam}}({\mathcal {M}})^\alpha + \sum _{j\ge 1}2^{j/2}(\text {\textsf{diam}}({\mathcal {A}}_{1,j-1})^\alpha + \text {\textsf{diam}}({\mathcal {A}}_{2,j-1})^\alpha )\end{aligned}$$

or equivalently (re-indexing the sums over \(j\)),

$$\begin{aligned} \gamma _2^{(\alpha )}({\mathcal {M}})&\le \text {\textsf{diam}}({\mathcal {M}}_1)^\alpha + \text {\textsf{diam}}({\mathcal {M}}_2)^\alpha \\&\quad + \sqrt{2}\left( \sum _{j\ge 0}2^{j/2}\text {\textsf{diam}}({\mathcal {A}}_{1,j})^\alpha \right) +\sqrt{2}\left( \sum _{j\ge 0}2^{j/2}\text {\textsf{diam}}({\mathcal {A}}_{2,j})^\alpha \right) . \end{aligned}$$

The proof finishes when we note that \(\text {\textsf{diam}}({\mathcal {M}}_i)^{\alpha }\le \gamma ^{(\alpha )}_2({\mathcal {M}}_i)\) for \(i=1,2\) and take the infimum over admissible sequences of \({\mathcal {M}}_1\) and \({\mathcal {M}}_2\). \(\square \)
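For the reader's convenience, we also record the form of the chaining functional manipulated above. The paper defines its own version in the main text, which is not reproduced on this page, so the display below is an assumed reminder in the standard generic-chaining style of [50] rather than a verbatim quotation: for admissible sequences \(\{{\mathcal {A}}_j\}_{j\ge 0}\) of partitions of \({\mathcal {M}}\) (that is, \(|{\mathcal {A}}_0|=1\) and \(|{\mathcal {A}}_j|\le 2^{2^j}\)), writing \({\mathcal {A}}_j(t)\) for the cell containing \(t\),

$$\begin{aligned} \gamma _2^{(\alpha )}({\mathcal {M}}):=\inf _{\{{\mathcal {A}}_j\}_{j\ge 0}}\,\sup _{t\in {\mathcal {M}}}\,\sum _{j\ge 0}2^{j/2}\,\text {\textsf{diam}}({\mathcal {A}}_j(t))^{\alpha }. \end{aligned}$$

With this reading, the displays above bound each term of the sum, uniformly in \(t\), by the corresponding terms for \({\mathcal {M}}_1\) and \({\mathcal {M}}_2\).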

We recall the following fundamental result due to Panchenko. It establishes a sub-Gaussian tail for the deviation of a heavy-tailed empirical process around its mean after a suitable self-normalization by the random quantity \({{\widehat{V}}}\).

Theorem 4

(Panchenko’s inequality [39]) Let \({\mathcal {F}}\) be a finite family of measurable functions \(g:\Xi \rightarrow {\mathbb {R}}\) such that \({\textbf{P}}g^2(\cdot )<\infty \) for every \(g\in {\mathcal {F}}\). Let \(\{\xi _j\}_{j=1}^N\) and \(\{\eta _j\}_{j=1}^N\) be two i.i.d. samples drawn from a distribution \({\textbf{P}}\) over \(\Xi \), independent of each other. Define

$$\begin{aligned} {\textsf{S}}:=\sup _{g\in {\mathcal {F}}}\frac{1}{N}\sum _{j=1}^Ng(\xi _j),\quad \quad \hbox {and}\quad \quad {\widehat{V}}:={\mathbb {E}}\left\{ \sup _{g\in {\mathcal {F}}}\frac{1}{N}\sum _{j=1}^N\left[ g(\xi _j)-g(\eta _j)\right] ^2\,\Bigg |\,\sigma (\xi _1,\ldots ,\xi _N)\right\} . \end{aligned}$$

Then, for all \(t>0\),

$$\begin{aligned} {\mathbb {P}}\left\{ {\textsf{S}}-{\mathbb {E}}[{\textsf{S}}]\ge \sqrt{\frac{2(1+t)}{N}{\widehat{V}}}\right\} \bigvee {\mathbb {P}}\left\{ {\textsf{S}}-{\mathbb {E}}[{\textsf{S}}]\le -\sqrt{\frac{2(1+t)}{N}{\widehat{V}}}\right\} \le 2e^{-t}. \end{aligned}$$

The following result is a direct consequence of Theorem 4 applied to the singleton class \({\mathcal {F}}:=\{g\}\). It provides a sub-Gaussian tail for any random variable with a finite second moment, in terms of its population and empirical variances.
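Concretely, under the normalization above, for \({\mathcal {F}}=\{g\}\) one has \({\textsf{S}}={{\widehat{{\textbf{P}}}}}g(\cdot )\) and \({\mathbb {E}}[{\textsf{S}}]={\textbf{P}}g(\cdot )\), while \({\mathbb {E}}\left[ (g(\xi _j)-g(\eta _j))^2\,|\,\xi _j\right] =(g(\xi _j)-{\textbf{P}}g(\cdot ))^2+{\textbf{P}}\left[ g(\cdot )-{\textbf{P}}g(\cdot )\right] ^2\), so that

$$\begin{aligned} {\widehat{V}}=\frac{1}{N}\sum _{j=1}^N{\mathbb {E}}\left[ (g(\xi _j)-g(\eta _j))^2\,\Big |\,\xi _j\right] =\left( {{\widehat{{\textbf{P}}}}}+{\textbf{P}}\right) \left[ g(\cdot )-{\textbf{P}}g(\cdot )\right] ^2, \end{aligned}$$

which is exactly the self-normalizing quantity in the display below.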

Lemma 8

(Sub-Gaussian tail for self-normalized sums) Suppose \(\{\xi _j\}_{j=1}^N\) is an i.i.d. sample from a distribution \({\textbf{P}}\) over \(\Xi \) and denote by \({{\widehat{{\textbf{P}}}}}\) the corresponding empirical distribution. Then, for any measurable function \(g:\Xi \rightarrow {\mathbb {R}}\) satisfying \({\textbf{P}}g(\cdot )^2<\infty \) and any \(t>0\),

$$\begin{aligned}&{\mathbb {P}}\left\{ ({{\widehat{{\textbf{P}}}}}-{\textbf{P}})g(\cdot )\ge \sqrt{\frac{2(1+t)}{N}\left( {{\widehat{{\textbf{P}}}}}+{\textbf{P}}\right) \left[ g(\cdot )-{\textbf{P}}g(\cdot )\right] ^2}\right\} \le 2e^{-t},\\&{\mathbb {P}}\left\{ ({{\widehat{{\textbf{P}}}}}-{\textbf{P}})g(\cdot )\le -\sqrt{\frac{2(1+t)}{N}\left( {{\widehat{{\textbf{P}}}}}+{\textbf{P}}\right) \left[ g(\cdot )-{\textbf{P}}g(\cdot )\right] ^2}\right\} \le 2e^{-t}. \end{aligned}$$
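As a quick sanity check (hypothetical, not taken from the paper), the following sketch estimates the probability on the left-hand side of the first inequality in Lemma 8 by Monte Carlo for a heavy-tailed Pareto distribution, whose mean and variance are available in closed form, and compares it with the bound \(2e^{-t}\). The sample size, tail index, value of \(t\) and number of replications are arbitrary choices.

import numpy as np

rng = np.random.default_rng(1)
N, reps, t = 200, 20000, 3.0
alpha = 2.5                                         # Pareto tail index: finite variance, heavy tail
mean = alpha / (alpha - 1.0)                        # P g(.)  with g = identity and scale x_m = 1
var = alpha / ((alpha - 1.0) ** 2 * (alpha - 2.0))  # P [g(.) - P g(.)]^2

hits = 0
for _ in range(reps):
    xi = (1.0 - rng.random(N)) ** (-1.0 / alpha)    # i.i.d. Pareto(alpha, x_m = 1) sample
    emp_mean = xi.mean()                            # \hat{P} g(.)
    emp_sq = np.mean((xi - mean) ** 2)              # \hat{P} [g(.) - P g(.)]^2
    threshold = np.sqrt(2.0 * (1.0 + t) / N * (emp_sq + var))
    hits += int(emp_mean - mean >= threshold)

print("empirical tail probability:", hits / reps, "  bound 2e^{-t}:", 2.0 * np.exp(-t))

The empirical frequency can then be compared directly with \(2e^{-t}\approx 0.0996\); the point of the self-normalization is that large deviations of the empirical mean tend to come with a large empirical variance, which inflates the threshold accordingly.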

Finally, we present a sub-Gaussian lower tail bound for averages of nonnegative random variables.

Lemma 9

(Sub-Gaussian lower tail for nonnegative random variables) Let \(\{Z_j\}_{j=1}^N\) be i.i.d. nonnegative random variables. Assume \(a\in (1,2]\) and \(0<{\mathbb {E}}[Z_1^a]<\infty \). Then, for all \(\epsilon >0\),

$$\begin{aligned} {\mathbb {P}}\left\{ \frac{1}{N}\sum _{j=1}^NZ_j\le (1-\epsilon ){\mathbb {E}}[Z_1]\right\} \le \exp \left\{ -\left( \frac{a-1}{a}\right) \epsilon ^{\frac{a}{a-1}}\left\{ \frac{({\mathbb {E}}[Z_1])^a}{{\mathbb {E}}[Z_1^a]}\right\} ^{\frac{1}{a-1}}N\right\} . \end{aligned}$$

Proof

Let \(\theta ,\epsilon >0\). By the usual “Bernstein trick”, we get

$$\begin{aligned} {\mathbb {P}}\left\{ \frac{1}{N}\sum _{j=1}^NZ_j\le (1-\epsilon ){\mathbb {E}}[Z_1]\right\}&\le {\mathbb {P}}\left\{ \sum _{j=1}^N({\mathbb {E}}[Z_j]-Z_j)\ge \epsilon {\mathbb {E}}[Z_1]N\right\} \\&\le {\mathbb {P}}\left\{ e^{\theta \sum _{j=1}^N({\mathbb {E}}[Z_j]-Z_j)}\ge e^{\theta \epsilon {\mathbb {E}}[Z_1]N}\right\} \\&\le e^{-\theta \epsilon {\mathbb {E}}[Z_1]N}\,{\mathbb {E}}\left[ e^{\theta \sum _{j=1}^N({\mathbb {E}}[Z_j]-Z_j)}\right] \\&= e^{-\theta \epsilon {\mathbb {E}}[Z_1]N}\,{\mathbb {E}}\left[ e^{\theta ({\mathbb {E}}[Z_1]-Z_1)}\right] ^N. \end{aligned}$$
(57)

It is a simple calculus exercise to show that \(e^{-x}\le 1-x+\frac{x^a}{a}\) for all \(x\ge 0\). Applying this with \(x:=\theta Z_1\), we obtain

$$\begin{aligned} {\mathbb {E}}\left[ e^{\theta ({\mathbb {E}}[Z_1]-Z_1)}\right]&\le e^{\theta {\mathbb {E}}[Z_1]}\left( 1-{\mathbb {E}}[\theta Z_1]+\frac{{\mathbb {E}}[(\theta Z_1)^a]}{a}\right) \\&\le e^{\theta {\mathbb {E}}[Z_1]}e^{-\theta {\mathbb {E}}[Z_1]+\frac{{\mathbb {E}}[(\theta Z_1)^a]}{a}}=e^{\frac{{\mathbb {E}}[(\theta Z_1)^a]}{a}}, \end{aligned}$$

where the second inequality follows from the relation \(1+x\le e^x\) for all \(x\in {\mathbb {R}}\). We plug this back into (57) and get, for all \(\theta >0\),

$$\begin{aligned} {\mathbb {P}}\left\{ \frac{1}{N}\sum _{j=1}^NZ_j\le (1-\epsilon ){\mathbb {E}}[Z_1]\right\} \le e^{\left( -\theta \epsilon {\mathbb {E}}[Z_1]+\theta ^a\frac{{\mathbb {E}}[Z_1^a]}{a}\right) N}. \end{aligned}$$
(58)

Since \(a\in (1,2]\), we may minimize the above bound over \(\theta >0\). The minimum is attained at \( \theta _*:=\left( \frac{\epsilon {\mathbb {E}}[Z_1]}{{\mathbb {E}}[Z_1^a]}\right) ^{\frac{1}{a-1}}. \) To finish the proof, we plug this into (58) and notice that

$$\begin{aligned} -\theta _*\epsilon {\mathbb {E}}[Z_1]+\theta _*^a\frac{{\mathbb {E}}[Z_1^a]}{a}=\left( -1+\frac{1}{a}\right) \frac{(\epsilon {\mathbb {E}}[Z_1])^{\frac{a}{a-1}}}{{\mathbb {E}}[Z_1^a]^{\frac{1}{a-1}}}, \end{aligned}$$

using that \(1+\frac{1}{a-1}=\frac{a}{a-1}\). \(\square \)
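In the same spirit, a small Monte Carlo sketch (hypothetical, not taken from the paper) illustrates Lemma 9 for variables having only low-order moments: below, \(Z\) is Pareto with tail index \(1.5\), so its variance is infinite while \({\mathbb {E}}[Z^a]\) is finite for \(a=1.25\), and the closed-form Pareto moments are plugged into the bound as stated in Lemma 9. All numerical choices are illustrative.

import numpy as np

rng = np.random.default_rng(2)
N, reps, eps, a = 200, 20000, 0.5, 1.25
alpha = 1.5                                    # Pareto tail index: E[Z], E[Z^a] finite, Var(Z) infinite
EZ = alpha / (alpha - 1.0)                     # E[Z]   = 3   (scale x_m = 1)
EZa = alpha / (alpha - a)                      # E[Z^a] = 6

# Lemma 9 bound: exp( -((a-1)/a) * eps^{a/(a-1)} * (E[Z]^a / E[Z^a])^{1/(a-1)} * N )
bound = np.exp(-((a - 1.0) / a) * eps ** (a / (a - 1.0))
               * (EZ ** a / EZa) ** (1.0 / (a - 1.0)) * N)

Z = (1.0 - rng.random((reps, N))) ** (-1.0 / alpha)   # reps x N i.i.d. Pareto(alpha, x_m = 1) draws
emp = np.mean(Z.mean(axis=1) <= (1.0 - eps) * EZ)
print("empirical lower-tail probability:", emp, "  Lemma 9 bound:", float(bound))

Only the \(a\)-th moment of \(Z\) enters the exponent of the bound, in line with the heavier-tails theme of the paper.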

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Oliveira, R.I., Thompson, P. Sample average approximation with heavier tails II: localization in stochastic convex optimization and persistence results for the Lasso. Math. Program. 199, 49–86 (2023). https://doi.org/10.1007/s10107-023-01940-w
