Inexact stochastic mirror descent for two-stage nonlinear stochastic programs

Guigues, Vincent

doi:10.1007/s10107-020-01490-5

Full Length Paper
Series A
Published: 02 April 2020

Volume 187, pages 533–577, (2021)
Mathematical Programming

Vincent Guigues¹

Abstract

We introduce an inexact variant of stochastic mirror descent (SMD), called inexact stochastic mirror descent (ISMD), to solve nonlinear two-stage stochastic programs where the second stage problem has linear and nonlinear coupling constraints and a nonlinear objective function which depends on both first and second stage decisions. Given a candidate first stage solution and a realization of the second stage random vector, each iteration of ISMD combines a stochastic subgradient descent step using a prox-mapping with the computation of approximate (instead of exact, as in SMD) primal and dual second stage solutions. We provide two convergence analyses of ISMD, under two sets of assumptions. The first convergence analysis is based on the formulas for inexact cuts of value functions of convex optimization problems shown recently in Guigues (SIAM J. Optim. 30(1), 407–438, 2020). The second convergence analysis provides a convergence rate (the same as for SMD) and relies on new formulas that we derive for inexact cuts of value functions of convex optimization problems, assuming that the dual function of the second stage problem is strongly concave for every fixed first stage solution and realization of the second stage random vector. We show that this strong concavity assumption is satisfied for some classes of problems and present the results of numerical experiments on two simple two-stage problems, which show that solving the second stage problem approximately during the first iterations of ISMD can help us obtain a good approximate first stage solution more quickly than with SMD.
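
The iteration described in the abstract can be illustrated with a minimal Python sketch on a toy problem that is not taken from the paper. The assumptions are: the first stage cost is \(c^T x\) on the box \(X=[0,2]^n\); the second stage problem is \(\min _y \{ \frac{1}{2}\Vert y\Vert ^2 : y \ge \xi - x\}\), whose dual function \(-\frac{1}{2}\Vert \lambda \Vert ^2 + \lambda ^T(\xi - x)\) is strongly concave; the prox-mapping is the Euclidean projection onto X; and the second stage is solved approximately by a few projected gradient ascent steps on the dual. The function names, the step size rule and the schedule of inner iterations are all hypothetical choices made for illustration.

```python
import numpy as np

def approx_second_stage_dual(x, xi, n_inner, eta=0.5):
    """Approximate multiplier of  min_y 0.5*||y||^2  s.t.  y >= xi - x.

    Runs n_inner projected gradient ascent steps on the dual function
    -0.5*||lam||^2 + lam.(xi - x) over lam >= 0 (toy problem, assumption).
    The matching approximate primal solution is y = lam.
    """
    lam = np.zeros_like(x)
    for _ in range(n_inner):
        lam = np.maximum(lam + eta * ((xi - x) - lam), 0.0)
    return lam

def ismd(c, n_iters, lower=0.0, upper=2.0, theta=2.0, seed=0):
    """Inexact stochastic mirror descent with a Euclidean prox-mapping (sketch)."""
    rng = np.random.default_rng(seed)
    x = np.full_like(c, lower, dtype=float)
    gamma = theta / np.sqrt(n_iters)          # constant step size (assumption)
    weighted_sum = np.zeros_like(x)
    for t in range(n_iters):
        weighted_sum += gamma * x             # accumulate step-size-weighted iterates
        xi = rng.uniform(0.0, 1.0, size=x.shape)
        # Hypothetical schedule: coarse second stage solves early, finer ones later.
        n_inner = min(1 + t // 200, 30)
        lam = approx_second_stage_dual(x, xi, n_inner)
        subgrad = c - lam                     # inexact stochastic subgradient in x
        x = np.clip(x - gamma * subgrad, lower, upper)  # projection onto the box X
    return weighted_sum / (gamma * n_iters)   # weighted average of the iterates

c = np.full(3, 0.125)
x_bar = ismd(c, n_iters=20000)
print(x_bar)
```

For this toy instance with \(\xi \) uniform on \([0,1]^n\) and \(c_i = 0.125\), the optimality condition \(c_i = \mathbb {E}[\max (\xi _i - x_i, 0)] = (1-x_i)^2/2\) gives \(x_i = 0.5\), so the averaged iterate should land close to 0.5 in every component.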

Notes

Using the equivalence between norms in \(\mathbb {R}^n\), we can derive a valid constant of strong concavity for other norms, for instance \(\Vert \cdot \Vert _{\infty }\) and \(\Vert \cdot \Vert _1\).
Note that we used (A4) to ensure that \(x(\lambda _*, \mu _*) = x_*\), which is also used in the proof of Theorem 10 in [19].
Note that (H1) and (H2) imply the convexity of \(\mathcal {Q}\) given by (3.1). Indeed, let \(x_1, x_2 \in X\), \(0 \le t \le 1\), and \(y_1 \in S( x_1), y_2 \in S( x_2)\) be such that \(\mathcal {Q}( x_1)=f(y_1, x_1)\) and \(\mathcal {Q}( x_2)=f(y_2, x_2)\). By convexity of g and Y, we have \(ty_1 +(1-t)y_2 \in S(tx_1 + (1-t)x_2)\) and therefore \(\mathcal {Q}(tx_1 +(1-t)x_2) \le f(ty_1 + (1-t)y_2,t x_1 + (1-t)x_2) \le t f(y_1,x_1) +(1-t)f(y_2,x_2)=t\mathcal {Q}(x_1)+(1-t)\mathcal {Q}(x_2)\), where for the last inequality we have used the convexity of f.
The proof is similar to the proof of Proposition 4.6 in [6].
According to the current Mosek documentation, it is not possible to use absolute errors. Therefore, early termination of the solver can be obtained either by limiting the number of iterations or by defining relative errors.
The deterministic equivalents of these instances are already large-scale quadratic programs. For instance, for \(n=10\), the deterministic equivalent of Problem (5.3)–(5.4) is a quadratically constrained quadratic program with 200,010 variables and 200,001 quadratic constraints.
Naturally, after running \(t-1\) of the \(N-1\) total iterations, the approximate optimal value computed by SMD is \(\displaystyle \frac{1}{\sum _{\tau =1}^t \gamma _\tau (N)} \sum \nolimits _{\tau =1}^t \gamma _\tau ( N) \Big ( f_1(x_1^{N,\tau }) + f_2( x_2^{N,\tau }, x_1^{N,\tau }, \xi _2^{N,\tau }) \Big )\), obtained on the basis of the sample \(\xi _2^{N,1},\ldots ,\xi _2^{N,t}\) of \(\xi _2\).
Due to the increase in computational time when N increases, we do not take the largest sample size \(N=2000\) for all instances. However, for all instances and values of N chosen, we observe a stabilization of the approximate optimal value before stopping the algorithm, which indicates a good solution has been found at termination.
When SMD (and similarly ISMD) is run on samples of \(\xi _2\) of size N, we have seen how to compute at iteration \(t-1\) an estimate \(\displaystyle \frac{1}{\sum _{\tau =1}^t \gamma _\tau (N)} \sum \nolimits _{\tau =1}^t \gamma _\tau ( N) \Big ( f_1(x_1^{N,\tau }) + f_2( x_2^{N,\tau }, x_1^{N,\tau }, \xi _2^{N,\tau }) \Big )\) of the optimal value on the basis of the sample \(\xi _2^{N,1},\ldots ,\xi _2^{N,t}\) of \(\xi _2\). The mean approximate optimal value after \(t-1\) iterations is obtained by running SMD on 10 independent samples of \(\xi _2\) of size N and averaging the resulting estimates over these samples.
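
The step-size-weighted estimate of the optimal value described in the notes above, and its mean over independent runs, can be sketched as follows. This is only an illustration: the function names are hypothetical, `step_sizes` stands for \(\gamma _1(N),\ldots ,\gamma _t(N)\), and `sampled_costs` stands for the values \(f_1(x_1^{N,\tau }) + f_2( x_2^{N,\tau }, x_1^{N,\tau }, \xi _2^{N,\tau })\), which are assumed to have been recorded while running SMD or ISMD.

```python
import numpy as np

def running_weighted_estimates(step_sizes, sampled_costs):
    """Entry t-1 is sum_{tau<=t} gamma_tau * cost_tau / sum_{tau<=t} gamma_tau."""
    gam = np.asarray(step_sizes, dtype=float)
    cost = np.asarray(sampled_costs, dtype=float)
    return np.cumsum(gam * cost) / np.cumsum(gam)

def mean_over_runs(estimates_per_run):
    """Average the running estimates over independent runs (one row per run)."""
    return np.mean(np.asarray(estimates_per_run, dtype=float), axis=0)

# Made-up numbers, only to show the shapes involved.
run_a = running_weighted_estimates([1.0, 1.0, 2.0], [4.0, 2.0, 1.0])
run_b = running_weighted_estimates([1.0, 1.0, 2.0], [2.0, 0.0, -1.0])
print(mean_over_runs([run_a, run_b]))
```

With a constant step size the weights cancel and the estimate reduces to a plain running average of the sampled costs.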

References

1. Andersen, E.D., Andersen, K.D.: The MOSEK optimization toolbox for MATLAB manual. Version 7.0 (2013). https://www.mosek.com/
2. Birge, J., Louveaux, F.: Introduction to Stochastic Programming. Springer, New York (1997)
3. Dantzig, G.B., Glynn, P.W.: Parallel processors for planning under uncertainty. Ann. Oper. Res. 22, 1–21 (1990)
4. Guigues, V.: Convergence analysis of sampling-based decomposition methods for risk-averse multistage stochastic convex programs. SIAM J. Optim. 26, 2468–2494 (2016)
5. Guigues, V.: Multistep stochastic mirror descent for risk-averse convex stochastic programs based on extended polyhedral risk measures. Math. Program. 163, 169–212 (2017)
6. Guigues, V.: Inexact cuts in Stochastic Dual Dynamic Programming. SIAM J. Optim. 30(1), 407–438 (2020)
7. Hiriart-Urruty, J.-B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Springer, Berlin (1996)
8. Infanger, G.: Monte Carlo (importance) sampling within a Benders decomposition algorithm for stochastic linear programs. Ann. Oper. Res. 39, 69–95 (1992)
9. Juditsky, A., Nesterov, Y.: Primal–dual subgradient methods for minimizing uniformly convex functions. arXiv:1401.1792 (2010)
10. Lan, G., Nemirovski, A., Shapiro, A.: Validation analysis of mirror descent stochastic approximation method. Math. Program. 134, 425–458 (2012)
11. Lan, G., Zhou, Z.: Dynamic stochastic approximation for multi-stage stochastic optimization. arXiv (2017)
12. Lemaréchal, C., Nemirovski, A., Nesterov, Y.: New variants of bundle methods. Math. Program. 69, 111–148 (1995)
13. Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19, 1574–1609 (2009)
14. Pereira, M.V.F., Pinto, L.M.V.G.: Multi-stage stochastic optimization applied to energy planning. Math. Program. 52, 359–375 (1991)
15. Polyak, B.T., Juditsky, A.: Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30, 838–855 (1992)
16. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften. Springer, New York (1997)
17. Ruszczyński, A.: A multicut regularized decomposition method for minimizing a sum of polyhedral functions. Math. Program. 35, 309–333 (1986)
18. Shapiro, A., Dentcheva, D., Ruszczyński, A.: Lectures on Stochastic Programming: Modeling and Theory. SIAM, Philadelphia (2009)
19. Yu, H., Neely, J.: On the convergence time of the drift-plus-penalty algorithm for strongly convex programs. arXiv:1503.06235 (2015)

Acknowledgements

The author’s research was partially supported by an FGV Grant, CNPq Grant 311289/2016-9, and FAPERJ Grant E-26/201.599/2014. The author would like to thank Alberto Seeger for helpful discussions.

Author information

Authors and Affiliations

School of Applied Mathematics, FGV, Praia de Botafogo, Rio de Janeiro, Brazil
Vincent Guigues

Corresponding author

Correspondence to Vincent Guigues.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Figs. 6, 7 and Table 5.

Table 5 Maximal number of iterations for the Mosek interior point solver used to solve second stage problems, as a function of the ISMD iteration number \(i=1,\ldots ,N\), and the maximal number of iterations \(I_{\max }\) allowed for the Mosek solver to solve subproblems with SMD

About this article

Guigues, V. Inexact stochastic mirror descent for two-stage nonlinear stochastic programs. Math. Program. 187, 533–577 (2021). https://doi.org/10.1007/s10107-020-01490-5

Received: 30 May 2018
Accepted: 06 March 2020
Published: 02 April 2020
Issue Date: May 2021
DOI: https://doi.org/10.1007/s10107-020-01490-5
