
Randomized Progressive Hedging methods for multi-stage stochastic programming

  • Original Research
  • Annals of Operations Research

Abstract

Progressive Hedging is a popular decomposition algorithm for solving multi-stage stochastic optimization problems. A computational bottleneck of this algorithm is that all scenario subproblems have to be solved at each iteration. In this paper, we introduce randomized versions of the Progressive Hedging algorithm that are able to produce new iterates as soon as a single scenario subproblem is solved. Building on the relation between Progressive Hedging and monotone operators, we leverage recent results on randomized fixed-point methods to derive and analyze the proposed methods. Finally, we release the corresponding code as an easy-to-use Julia toolbox and report computational experiments showing the practical interest of randomized algorithms, notably in a parallel context. Throughout the paper, we pay special attention to the presentation, stressing the main ideas and avoiding unnecessary technicalities, in order to make the randomized methods accessible to a broad audience in the Operations Research community.


Notes

  1. In our toolbox, we solve these problems with IPOPT (Wächter and Biegler 2006), an open source software package for nonlinear optimization.

  2. The full projection can still be performed, but the variables that are not associated with \(s^k\) will not be used by the algorithm.

  3. We assume consistent writes, i.e. reads and writes do not clash with each other; extensions to inconsistent reads are discussed in Peng et al. (2016, Sec. 1.2).

  4. In “Appendix C”, following Peng et al. (2016), we write \({\hat{\hbox {x}}}^k={{\,\mathrm{x}\,}}^{k-d_k}\) when worker i started its update at time \(k-d_k\).

  5. A detailed explanation of Julia’s parallelism is available in the Julia documentation: https://docs.Julialang.org/en/v1/manual/parallel-computing/. By default, the created workers run on the same machine, but they can easily be placed on a remote machine through an SSH channel.

  6. https://stanford.edu/~lcambier/cgi-bin/fast/index.php.

  7. In particular, the (primal) feasibility tolerance is \(10^{-8}\), and therefore this is the target level of tolerance for the experiments.

  8. In parallel setups, the respective performance of parallel and asynchronous methods is highly variable. We report experiments obtained on a rather well-behaved setup (all workers are identical); still, they reflect the general trend we observed.

  9. As \({\mathsf {O}}_{\mu {\mathsf {A}}}\) and \({\mathsf {O}}_{\mu {\mathsf {B}}}\) are non-expansive but not firmly non-expansive, it is necessary to average them with the current iterate (this is often called the Krasnosel’skiĭ–Mann algorithm (Bauschke and Combettes 2011, Chap 5.2)) to make this iteration firmly non-expansive and ensure Fejér monotone convergence.

References

  • Bauschke, H. H., & Combettes, P. L. (2011). Convex analysis and monotone operator theory in Hilbert spaces. Berlin: Springer.

  • Bezanson, J., Edelman, A., Karpinski, S., & Shah, V. B. (2017). Julia: A fresh approach to numerical computing. SIAM Review, 59(1), 65–98. https://doi.org/10.1137/141000671.

  • Bianchi, P., Hachem, W., & Iutzeler, F. (2015). A coordinate descent primal–dual algorithm and application to distributed asynchronous optimization. IEEE Transactions on Automatic Control, 61(10), 2947–2957.

  • Biel, M., & Johansson, M. (2019). Efficient stochastic programming in Julia. arXiv preprint arXiv:1909.10451.

  • Combettes, P. L., & Pesquet, J. C. (2015). Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. SIAM Journal on Optimization, 25(2), 1221–1248.

  • De Silva, A., & Abramson, D. (1993). Computational experience with the parallel progressive hedging algorithm for stochastic linear programs. In Proceedings of 1993 parallel computing and transputers conference, Brisbane (pp. 164–174).

  • Dunning, I., Huchette, J., & Lubin, M. (2017). JuMP: A modeling language for mathematical optimization. SIAM Review, 59(2), 295–320. https://doi.org/10.1137/15M1020575.

  • Eckstein, J. (2017). A simplified form of block-iterative operator splitting and an asynchronous algorithm resembling the multi-block alternating direction method of multipliers. Journal of Optimization Theory and Applications, 173(1), 155–182.

  • Eckstein, J., & Bertsekas, D. P. (1992). On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1–3), 293–318.

  • Eckstein, J., Watson, J. P., & Woodruff, D. L. (2018). Asynchronous projective hedging for stochastic programming.

  • Iutzeler, F., Bianchi, P., Ciblat, P., & Hachem, W. (2013). Asynchronous distributed optimization using a randomized alternating direction method of multipliers. In IEEE 52nd annual conference on decision and control (CDC) (pp. 3671–3676). IEEE.

  • Lions, P. L., & Mercier, B. (1979). Splitting algorithms for the sum of two nonlinear operators. SIAM Journal on Numerical Analysis, 16(6), 964–979.

  • Peng, Z., Xu, Y., Yan, M., & Yin, W. (2016). ARock: An algorithmic framework for asynchronous parallel coordinate updates. SIAM Journal on Scientific Computing, 38(5), A2851–A2879.

  • Pereira, M. V., & Pinto, L. M. (1991). Multi-stage stochastic optimization applied to energy planning. Mathematical Programming, 52(1–3), 359–375.

  • Rockafellar, R. T. (2018). Solving stochastic programming problems with risk measures by progressive hedging. Set-Valued and Variational Analysis, 26(4), 759–768.

  • Rockafellar, R. T., & Royset, J. O. (2018). Superquantile/CVaR risk measures: Second-order theory. Annals of Operations Research, 262(1), 3–28.

  • Rockafellar, R. T., & Wets, R. J. B. (1991). Scenarios and policy aggregation in optimization under uncertainty. Mathematics of Operations Research, 16(1), 119–147.

  • Ruszczyński, A. (1997). Decomposition methods in stochastic programming. Mathematical Programming, 79(1–3), 333–353.

  • Ruszczyński, A., & Shapiro, A. (2003). Stochastic programming models. In Handbooks in operations research and management science (Vol. 10, pp. 1–64).

  • Ryan, S. M., Wets, R. J. B., Woodruff, D. L., Silva-Monroy, C., & Watson, J. P. (2013). Toward scalable, parallel progressive hedging for stochastic unit commitment. In IEEE power & energy society general meeting (pp. 1–5). IEEE.

  • Shapiro, A., Dentcheva, D., & Ruszczyński, A. (2009). Lectures on stochastic programming: Modeling and theory. Philadelphia: SIAM.

  • Somervell, M. (1998). Progressive hedging in parallel. Ph.D. thesis.

  • Wächter, A., & Biegler, L. T. (2006). On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming, 106(1), 25–57.

  • Watson, J. P., & Woodruff, D. L. (2011). Progressive hedging innovations for a class of stochastic mixed-integer resource allocation problems. Computational Management Science, 8(4), 355–370.


Acknowledgements

The authors wish to thank the associate editor and the two anonymous reviewers for their valuable comments, notably with respect to the placement in the literature, which greatly improved the paper. F.I. and J.M. thank Welington de Oliveira for fruitful discussions at the very beginning of this project. Funding was provided by PGMO (Distributed Optimization on Graphs with Flexible Communications) and Agence Nationale de la Recherche (ANR-19-CE23-0008 – JCJC DOLL).

Author information


Corresponding author

Correspondence to Franck Iutzeler.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A fixed-point view of Progressive Hedging

This appendix complements Sect. 3.1: we reformulate the multistage problem (2.4) as finding a fixed point of some operator (see the textbook Ruszczyński and Shapiro (2003, Chap. 3.9)). For all definitions and results on monotone operator theory, we refer to Bauschke and Combettes (2011).

Denoting the objective function by \(f({{\,\mathrm{x}\,}}):=\sum _{s=1}^S p_s f^s({{\,\mathrm{x}\,}})\) and the indicator of constraints by \(\iota _{\mathcal {W}}\) with \(\iota _{\mathcal {W}}({{\,\mathrm{x}\,}}) = 0\) if \({{\,\mathrm{x}\,}}\in \mathcal {W}\) and \(+\infty \) otherwise, we have that solving (2.4) amounts to finding \({{\,\mathrm{x}\,}}^\star \) such that

$$\begin{aligned} 0 \in \partial (f + \iota _{\mathcal {W}})({{\,\mathrm{x}\,}}^\star ) = \partial f({{\,\mathrm{x}\,}}^\star ) + \partial \iota _{\mathcal {W}}({{\,\mathrm{x}\,}}^\star ) \end{aligned}$$

where we use Assumption 2 for the equality. Then, we introduce the following two operators

$$\begin{aligned} {\mathsf {A}}({{\,\mathrm{x}\,}}) := P^{-1} \partial f({{\,\mathrm{x}\,}}) ~~\text { and }~~ {\mathsf {B}}({{\,\mathrm{x}\,}}) := P^{-1} \partial \iota _{\mathcal {W}}({{\,\mathrm{x}\,}}) \end{aligned}$$
(A.1)

where \(P = \mathrm {diag}( p_1,\ldots ,p_S )\). Using Assumption 1, the operators \({\mathsf {A}}\) and \({\mathsf {B}}\) defined in (A.1) are maximal monotone since so are the subdifferentials of convex proper lower-semicontinuous functions.

Solving (2.4) thus amounts to finding a zero of \({\mathsf {A}}+{\mathsf {B}}\), the sum of two maximal monotone operators:

$$\begin{aligned} {{\,\mathrm{x}\,}}^\star \text { solves } (2.4) \iff {{\,\mathrm{x}\,}}^\star \text { is a zero of } {\mathsf {A}}+ {\mathsf {B}}, \text { i.e. } 0\in {\mathsf {A}}({{\,\mathrm{x}\,}}^\star ) + {\mathsf {B}}({{\,\mathrm{x}\,}}^\star ). \end{aligned}$$
(A.2)

We follow the notation of Ruszczyński and Shapiro (2003, Chap. 3) and the properties of Bauschke and Combettes (2011, Chap. 4.1 and 23.1). For a given maximal monotone operator \({\mathsf {M}}\), we define for any \(\mu >0\) two associated operators:

  (i) the resolvent \({\mathsf {J}}_{\mu {\mathsf {M}}} = (I+\mu {\mathsf {M}})^{-1}\), which is well-defined and firmly non-expansive;

  (ii) the reflected resolvent \({\mathsf {O}}_{\mu {\mathsf {M}}} = 2{\mathsf {J}}_{\mu {\mathsf {M}}}-I\), which is non-expansive.
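
For instance, when \({\mathsf {M}}=\partial g\) is the subdifferential of a convex proper lower-semicontinuous function g (with respect to the underlying inner product), the resolvent is the proximity operator of \(\mu g\) and the reflected resolvent follows (see Bauschke and Combettes 2011, Prop. 16.34); this is the special case used in the proof of Lemma 1 below:

$$\begin{aligned} {\mathsf {J}}_{\mu \partial g}({{\,\mathrm{z}\,}}) = {{\,\mathrm{argmin}\,}}_{y}\left\{ g(y) + \frac{1}{2\mu }\left\| y-{{\,\mathrm{z}\,}}\right\| ^2\right\} = \mathrm {prox}_{\mu g}({{\,\mathrm{z}\,}}) ~~\text { and }~~ {\mathsf {O}}_{\mu \partial g}({{\,\mathrm{z}\,}}) = 2\,\mathrm {prox}_{\mu g}({{\,\mathrm{z}\,}}) - {{\,\mathrm{z}\,}}. \end{aligned}$$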

These operators allow us to formulate our multistage problem as a fixed-point problem: with the help of (A.2) and Bauschke and Combettes (2011, Prop. 25.1(ii)), we have

$$\begin{aligned} {{\,\mathrm{x}\,}}^\star \text { solves } (2.4) \iff {{\,\mathrm{x}\,}}^\star = {\mathsf {J}}_{\mu {\mathsf {B}}}({{\,\mathrm{z}\,}}^\star ) \text { with } {{\,\mathrm{z}\,}}^\star \text { a fixed point of } {\mathsf {O}}_{\mu {\mathsf {A}}}\circ {\mathsf {O}}_{\mu {\mathsf {B}}}, \text { i.e. } {{\,\mathrm{z}\,}}^\star ={\mathsf {O}}_{\mu {\mathsf {A}}}\circ {\mathsf {O}}_{\mu {\mathsf {B}}}({{\,\mathrm{z}\,}}^\star ). \end{aligned}$$

We can now apply a fixed-point algorithm to the firmly non-expansive operator (see note 9) \(\frac{1}{2}{\mathsf {O}}_{\mu {\mathsf {A}}} \circ {\mathsf {O}}_{\mu {\mathsf {B}}} + \frac{1}{2}{{\,\mathrm{Id}\,}}\) to find a fixed point of \({\mathsf {O}}_{\mu {\mathsf {A}}} \circ {\mathsf {O}}_{\mu {\mathsf {B}}}\).

This gives the following iteration (equivalent to Douglas–Rachford splitting)

$$\begin{aligned} {{\,\mathrm{z}\,}}^{k+1} = \frac{1}{2}{\mathsf {O}}_{\mu {\mathsf {A}}}({\mathsf {O}}_{\mu {\mathsf {B}}}({{\,\mathrm{z}\,}}^k)) + \frac{1}{2}{{\,\mathrm{z}\,}}^k \end{aligned}$$
(A.3)

which converges to a point \({{\,\mathrm{z}\,}}^\star \) such that \(x^\star := {\mathsf {J}}_{\mu {\mathsf {B}}}({{\,\mathrm{z}\,}}^\star ) \) is a zero of \({\mathsf {A}}+{\mathsf {B}}\); see Bauschke and Combettes (2011, Chap. 25.2).

It is well-known (see e.g. the textbook Ruszczyński and Shapiro (2003, Chap. 3, Fig. 10)) that this algorithm with the operators \({\mathsf {A}}\) and \({\mathsf {B}}\) defined in (A.1) leads to the Progressive Hedging algorithm. We give here a short proof of this property; along the way, we introduce basic properties and arguments used in the new developments on randomized Progressive Hedging of the next two appendices. We provide first the expressions of the reflected resolvent operators for \({\mathsf {A}}\) and \({\mathsf {B}}\).

Lemma 1

(Operators associated with Progressive Hedging) Let us endow the space \(\mathbb {R}^{S\times n}\) of \(S\times n\) real matrices with the weighted inner product \(\langle A,B\rangle _P = \mathrm {Trace}(A^\mathrm {T} P B)\). Then the operators \({\mathsf {A}}\) and \({\mathsf {B}}\) defined in (A.1) are maximal monotone, and their reflected resolvent operators \({\mathsf {O}}_{\mu {\mathsf {A}}}\) and \({\mathsf {O}}_{\mu {\mathsf {B}}}\) have the following expressions:

  (i) \({\mathsf {O}}_{\mu {\mathsf {A}}}({{\,\mathrm{z}\,}}) = {{\,\mathrm{x}\,}}- \mu {{\,\mathrm{u}\,}}\) with

    $$\begin{aligned} {{\,\mathrm{x}\,}}^s = {{\,\mathrm{argmin}\,}}_{y\in \mathbb {R}^n}\left\{ f^s(y) + \frac{1}{2\mu } \left\| y-{{\,\mathrm{z}\,}}^s \right\| ^2 \right\} \text { for all } s=1,\ldots ,S \end{aligned}$$

    and \({{\,\mathrm{u}\,}}=({{\,\mathrm{z}\,}}-{{\,\mathrm{x}\,}})/\mu \) (hence \({\mathsf {O}}_{\mu {\mathsf {A}}}({{\,\mathrm{z}\,}}) = 2{{\,\mathrm{x}\,}}- {{\,\mathrm{z}\,}}\));

  (ii) \({\mathsf {O}}_{\mu {\mathsf {B}}}({{\,\mathrm{z}\,}}) = {{\,\mathrm{x}\,}}- \mu {{\,\mathrm{u}\,}}\) with

    $$\begin{aligned} {{\,\mathrm{x}\,}}_t^s = \frac{1}{\sum _{\sigma \in \mathcal {B}^s_t} p_\sigma } \sum _{\sigma \in \mathcal {B}^s_t} p_\sigma {{\,\mathrm{z}\,}}_t^\sigma \text { for all } s=1,\ldots ,S \text { and } t=1,\ldots ,T \end{aligned}$$

    and \({{\,\mathrm{u}\,}}=({{\,\mathrm{z}\,}}-{{\,\mathrm{x}\,}})/\mu \) (hence \({\mathsf {O}}_{\mu {\mathsf {B}}}({{\,\mathrm{z}\,}}) = 2{{\,\mathrm{x}\,}}- {{\,\mathrm{z}\,}}\)). The point \({{\,\mathrm{x}\,}}\) is the orthogonal projection of \({{\,\mathrm{z}\,}}\) to \(\mathcal {W}\). Thus, \({{\,\mathrm{z}\,}}\) writes uniquely as \({{\,\mathrm{z}\,}}= {{\,\mathrm{x}\,}}+ \mu {{\,\mathrm{u}\,}}\) with \({{\,\mathrm{x}\,}}\in \mathcal {W}\) and \({{\,\mathrm{u}\,}}\in \mathcal {W}^\perp \).

Proof

Since \(\partial f(\cdot )\) and \(\partial \iota _{\mathcal {W}}(\cdot )\) are the subdifferentials of convex proper lower-semicontinuous functions, they are maximal monotone with respect to the usual inner product, and so are \({\mathsf {A}}\) and \({\mathsf {B}}\) with respect to the weighted inner product.

Applying Bauschke and Combettes (2011, Prop. 23.1) to a maximal monotone operator \({\mathsf {M}}\), we get that \({{\,\mathrm{z}\,}}\in \mathbb {R}^{S\times n}\) can be uniquely represented as \({{\,\mathrm{z}\,}}= {{\,\mathrm{x}\,}}+ \mu {{\,\mathrm{u}\,}}\) with \({{\,\mathrm{u}\,}}\in {\mathsf {M}}({{\,\mathrm{x}\,}})\), thus \({\mathsf {J}}_{\mu {\mathsf {M}}}({{\,\mathrm{z}\,}}) = {{\,\mathrm{x}\,}}\) and \({\mathsf {O}}_{\mu {\mathsf {M}}}({{\,\mathrm{z}\,}}) = {\mathsf {O}}_{\mu {\mathsf {M}}}({{\,\mathrm{x}\,}}+ \mu {{\,\mathrm{u}\,}}) = {{\,\mathrm{x}\,}}- \mu {{\,\mathrm{u}\,}}\). This gives the expressions for \({\mathsf {O}}_{\mu {\mathsf {A}}}\) and \({\mathsf {O}}_{\mu {\mathsf {B}}}\) from the expressions of \({\mathsf {J}}_{\mu {\mathsf {A}}}\) and \({\mathsf {J}}_{\mu {\mathsf {B}}}\) based on the proximity operators associated with f and \(\iota _{\mathcal {W}}\) (see Bauschke and Combettes 2011, Prop. 16.34).\(\square \)

We now apply the general Douglas–Rachford scheme (A.3) with the expressions obtained in Lemma 1. We first get:

$$\begin{aligned} \left\{ \begin{array}{ll} {{\,\mathrm{x}\,}}_t^{k,s} = \frac{1}{\sum _{\sigma \in \mathcal {B}^s_t} p_\sigma } \sum _{\sigma \in \mathcal {B}^s_t} p_\sigma {{\,\mathrm{z}\,}}_t^{k,\sigma } \text { for all } s=1,\ldots ,S \text { and } t=1,\ldots ,T &{} {\scriptstyle {{\,\mathrm{x}\,}}^k\in \mathcal {W}}\\[2ex] {{\,\mathrm{w}\,}}^k = {\mathsf {O}}_{\mu {\mathsf {B}}}({{\,\mathrm{z}\,}}^k) = 2{{\,\mathrm{x}\,}}^k - {{\,\mathrm{z}\,}}^k = {{\,\mathrm{x}\,}}^k - \mu {{\,\mathrm{u}\,}}^k &{} {\scriptstyle \text { with } {{\,\mathrm{u}\,}}^k = ({{\,\mathrm{z}\,}}^k-{{\,\mathrm{x}\,}}^k)/\mu \in \mathcal {W}^\perp } \\ &{} {\scriptstyle \text { thus } {{\,\mathrm{u}\,}}^k = {{\,\mathrm{u}\,}}^{k-1} + \frac{1}{\mu } ({{\,\mathrm{y}\,}}^k-{{\,\mathrm{x}\,}}^k) } \\ {{\,\mathrm{y}\,}}^{k+1,s} = {{\,\mathrm{argmin}\,}}_{y\in \mathbb {R}^n}\left\{ f^s(y) + \frac{1}{2\mu } \left\| y-{{\,\mathrm{w}\,}}^{k,s} \right\| ^2 \right\} \text { for all } s=1,\ldots ,S &{} \\ {{\,\mathrm{z}\,}}^{k+1} = \frac{1}{2} (2{{\,\mathrm{y}\,}}^{k+1}-{{\,\mathrm{w}\,}}^k) + \frac{1}{2}{{\,\mathrm{z}\,}}^k = {{\,\mathrm{z}\,}}^k + {{\,\mathrm{y}\,}}^{k+1} - {{\,\mathrm{x}\,}}^{k+1} = {{\,\mathrm{y}\,}}^{k+1} + \mu {{\,\mathrm{u}\,}}^k &{} \end{array} \right. \end{aligned}$$

Let us reorganize the equations and eliminate intermediate variables. In particular, we use the fact that, provided that the algorithm is initialized with \({{\,\mathrm{x}\,}}^0 \in \mathcal {W}\) and \({{\,\mathrm{u}\,}}^0\in \mathcal {W}^\perp \), all iterates \(({{\,\mathrm{x}\,}}^k)\) and \(({{\,\mathrm{u}\,}}^k)\) are in \(\mathcal {W}\) and \(\mathcal {W}^\perp \) respectively. We eventually obtain:

$$\begin{aligned} \left\{ \begin{array}{ll} {{\,\mathrm{y}\,}}^{k+1,s} = {{\,\mathrm{argmin}\,}}_{y\in \mathbb {R}^n}\left\{ f^s(y) + \frac{1}{2\mu } \left\| y-{{\,\mathrm{x}\,}}^{k,s} + \mu {{\,\mathrm{u}\,}}^{k,s} \right\| ^2 \right\} \text { for all } s=1,\ldots ,S &{} \\ {{\,\mathrm{x}\,}}_t^{k+1,s} = \frac{1}{\sum _{\sigma \in \mathcal {B}^s_t} p_\sigma } \sum _{\sigma \in \mathcal {B}^s_t} p_\sigma {{\,\mathrm{y}\,}}_t^{k+1,\sigma } \text { for all } s=1,\ldots ,S \text { and } t=1,\ldots ,T &{} \\ {\scriptstyle {{\,\mathrm{x}\,}}^k\in \mathcal {W}\text { converges to a solution of}\,\, }(2.4) &{} \\ {{\,\mathrm{u}\,}}^{k+1} = {{\,\mathrm{u}\,}}^{k} + \frac{1}{\mu } ({{\,\mathrm{y}\,}}^{k+1}-{{\,\mathrm{x}\,}}^{k+1}) &{} \end{array} \right. \end{aligned}$$

This is exactly the Progressive Hedging algorithm, written with notation similar to that of the textbook (Ruszczyński and Shapiro 2003, Chap. 3, Fig. 10). The convergence of the algorithm (recalled in Theorem 3.1) can be obtained directly by instantiating the general convergence result for the Douglas–Rachford method (Bauschke and Combettes 2011, Chap. 25.2).

In the next two appendices, we follow the same path that took us from Douglas–Rachford to Progressive Hedging, this time going from randomized Douglas–Rachford to randomized Progressive Hedging, and from asynchronous Douglas–Rachford to asynchronous Progressive Hedging.

B Derivation and proof of the Randomized Progressive Hedging

A randomized counterpart of the Douglas–Rachford method (A.3) consists in updating only a randomly chosen part of the variable; see Iutzeler et al. (2013) and extensions (Bianchi et al. 2015; Combettes and Pesquet 2015). At each iteration, this variant amounts to updating the variables corresponding to a scenario \(s^k\) drawn at random with probability \(q_{s^k}\), the others staying unchanged:

$$\begin{aligned}&\text {Draw a scenario } s^k\in \{1,\ldots ,S\} \text { with probability } \mathbb {P}[s^k= s] = q_s \nonumber \\&\left| \begin{array}{l} {{\,\mathrm{z}\,}}^{k+1,s^k} = \frac{1}{2}\left[ {\mathsf {O}}_{\mu {\mathsf {A}}}({\mathsf {O}}_{\mu {\mathsf {B}}}({{\,\mathrm{z}\,}}^k))\right] ^{s^k} + \frac{1}{2}{{\,\mathrm{z}\,}}^{k,s^k} \\ {{\,\mathrm{z}\,}}^{k+1,s} = {{\,\mathrm{z}\,}}^{k,s} \text { for all } s\ne s^k\end{array} \right. \end{aligned}$$
(B.1)

Our goal is to obtain the Randomized Progressive Hedging (Algorithm 2) as an instantiation of (B.1) with the operators defined in Lemma 1 in “Appendix A”. Before proceeding with the derivation, let us prove the convergence of (B.1) with these operators.

Proposition 1

Consider a multistage problem (2.4) verifying Assumptions 1 and 2. Then, the sequence \(({{\,\mathrm{z}\,}}^k)\) generated by (B.1) with \({\mathsf {O}}_{\mu {\mathsf {A}}}\) and \({\mathsf {O}}_{\mu {\mathsf {B}}}\) defined in Lemma 1 converges almost surely to a fixed point of \({\mathsf {O}}_{\mu {\mathsf {A}}}\circ {\mathsf {O}}_{\mu {\mathsf {B}}}\). Furthermore, \(\tilde{{{\,\mathrm{x}\,}}}^k := {\mathsf {J}}_{\mu {\mathsf {B}}}({{\,\mathrm{z}\,}}^k)\) converges to a solution of (2.4).

Proof

First, recall from Lemma 1 that under Assumptions 1 and 2, the operators \({\mathsf {A}},{\mathsf {B}}\) of (A.1) are maximal monotone. The associated operators \({\mathsf {O}}_{\mu {\mathsf {A}}}\) and \({\mathsf {O}}_{\mu {\mathsf {B}}}\) are then non-expansive by construction (see Bauschke and Combettes 2011, Chap. 4.1), and therefore the operator \({\mathsf {T}} = ({\mathsf {O}}_{\mu {\mathsf {A}}}\circ {\mathsf {O}}_{\mu {\mathsf {B}}} + I)/2 \) is firmly non-expansive. This is the key assumption needed to apply the convergence result of Iutzeler et al. (2013, Th. 2), which gives that the sequence \(({{\,\mathrm{z}\,}}^k)\) generated by (B.1) converges almost surely to a fixed point of \({\mathsf {O}}_{\mu {\mathsf {A}}}\circ {\mathsf {O}}_{\mu {\mathsf {B}}}\). Using the continuity of \( {\mathsf {J}}_{\mu {\mathsf {B}}}\) and the fact that \(x^\star := {\mathsf {J}}_{\mu {\mathsf {B}}}({{\,\mathrm{z}\,}}^\star ) \) is a zero of \({\mathsf {A}}+{\mathsf {B}}\) (i.e. solves the multi-stage problem (2.4) by (A.2)) gives the last part of the result. \(\square \)

Now that the convergence of (B.1) with the operators of “Appendix A” has been proven, let us derive our Randomized Progressive Hedging (Algorithm 2) as an equivalent formulation of (B.1). By doing so, the associated convergence result (Theorem 3.2) directly follows from Proposition 1.

From the specific expressions of operators \({\mathsf {O}}_{\mu {\mathsf {A}}}\) and \({\mathsf {O}}_{\mu {\mathsf {B}}}\) (Lemma 1), we see that these operators are very different in nature:

  • \({\mathsf {O}}_{\mu {\mathsf {A}}}\) is separable by scenario but involves solving a subproblem;

  • \({\mathsf {O}}_{\mu {\mathsf {B}}}\) links the scenarios but only amounts to computing a weighted average.

To leverage this structure, we apply the randomized Douglas–Rachford method (B.1) and get:

$$\begin{aligned} \left\{ \begin{array}{ll} \text {Draw a scenario } s^k\in \{1,\ldots ,S\} \text { with probability } \mathbb {P}[s^k= s] = q_s &{} \\ {{\,\mathrm{x}\,}}_t^{k,s} \!=\! \frac{1}{\sum _{\sigma \in \mathcal {B}^s_t} p_\sigma } \sum _{\sigma \in \mathcal {B}^s_t} p_\sigma {{\,\mathrm{z}\,}}_t^{k,\sigma } \text { for all } s\!=\!1,\ldots ,S \text { and } t\!=\!1,\ldots ,T {\scriptstyle {{\,\mathrm{x}\,}}^k\in \mathcal {W}} &{} \\ {{\,\mathrm{w}\,}}^k \!=\! {\mathsf {O}}_{\mu {\mathsf {B}}}({{\,\mathrm{z}\,}}^k) \!=\! 2{{\,\mathrm{x}\,}}^k \!-\! {{\,\mathrm{z}\,}}^k = {{\,\mathrm{x}\,}}^k - \mu {{\,\mathrm{u}\,}}^k {\scriptstyle \text { with } {{\,\mathrm{u}\,}}^k = ({{\,\mathrm{z}\,}}^k-{{\,\mathrm{x}\,}}^k)/\mu \in \mathcal {W}^\perp } &{} \\ {{\,\mathrm{y}\,}}^{k+1,s} = {{\,\mathrm{argmin}\,}}_{y\in \mathbb {R}^n}\left\{ f^s(y) + \frac{1}{2\mu } \left\| y-{{\,\mathrm{w}\,}}^{k,s} \right\| ^2 \right\} \text { for all } s=1,\ldots ,S &{} \\ \left| \begin{array}{l} {{\,\mathrm{z}\,}}^{k+1,s^k}\! =\frac{1}{2} (2{{\,\mathrm{y}\,}}^{k+1,s^k}- {{\,\mathrm{w}\,}}^{k,s^k}) + \frac{1}{2}{{\,\mathrm{z}\,}}^{k,s^k} \!=\! {{\,\mathrm{z}\,}}^{k,s^k} + {{\,\mathrm{y}\,}}^{k+1,s^k} - {{\,\mathrm{x}\,}}^{k+1,s^k} = {{\,\mathrm{y}\,}}^{k+1,s^k} + \mu {{\,\mathrm{u}\,}}^{k,s^k} \\ {{\,\mathrm{z}\,}}^{k+1,s} = {{\,\mathrm{z}\,}}^{k,s} \text { for all } s\ne s^k\end{array} \right.&\end{array} \right. \end{aligned}$$

Let us carefully prune unnecessary computations. First, only \({{\,\mathrm{y}\,}}^{k+1,s^k}\) needs to be computed, so the other \({{\,\mathrm{y}\,}}^{k+1,s}\) (\(s\ne s^k\)) can safely be dropped; the same holds for \({{\,\mathrm{x}\,}}^{k,s^k}\), \({{\,\mathrm{w}\,}}^{k,s^k}\), and \({{\,\mathrm{u}\,}}^{k,s^k}\). However, even though only \({{\,\mathrm{x}\,}}^{k,s^k}\) needs to be computed, it depends on all the other scenarios through the projection operator, so the iterates have to be computed successively and with only a partial update of \({{\,\mathrm{u}\,}}^k\). In contrast with “Appendix A”, \({{\,\mathrm{u}\,}}^k\) no longer belongs to \(\mathcal {W}^\perp \) and thus cannot be dropped out of the projection; we therefore keep the global variable \({{\,\mathrm{z}\,}}^k\) updated directly:

$$\begin{aligned} \left\{ \begin{array}{ll} \text {Draw a scenario } s^k\in \{1,\ldots ,S\} \text { with probability } \mathbb {P}[s^k= s] = q_s &{} \\ {{\,\mathrm{x}\,}}_t^{k,s^k} = \frac{1}{\sum _{\sigma \in \mathcal {B}^{s^k}_t} p_\sigma } \sum _{\sigma \in \mathcal {B}^{s^k}_t} p_\sigma {{\,\mathrm{z}\,}}_t^{k,\sigma } \text { for all } t=1,\ldots ,T &{} \\ {{\,\mathrm{w}\,}}^{k,s^k} = 2{{\,\mathrm{x}\,}}^{k,s^k} - {{\,\mathrm{z}\,}}^{k,s^k} &{} \\ {{\,\mathrm{y}\,}}^{k+1,s^k} = {{\,\mathrm{argmin}\,}}_{y\in \mathbb {R}^n}\left\{ f^{s^k}(y) + \frac{1}{2\mu } \left\| y-{{\,\mathrm{w}\,}}^{k,s^k} \right\| ^2 \right\} &{} \\ \left| \begin{array}{l} {{\,\mathrm{z}\,}}^{k+1,s^k} = {{\,\mathrm{z}\,}}^{k,s^k} + {{\,\mathrm{y}\,}}^{k+1,s^k} - {{\,\mathrm{x}\,}}^{k+1,s^k} \\ {{\,\mathrm{z}\,}}^{k+1,s} = {{\,\mathrm{z}\,}}^{k,s} \text { for all } s\ne s^k\end{array} \right.&\end{array} \right. \end{aligned}$$

Eliminating the intermediate variable \( {{\,\mathrm{w}\,}}\), we obtain the Randomized Progressive Hedging:

$$\begin{aligned} \left\{ \begin{array}{ll} \text {Draw a scenario } s^k\in \{1,\ldots ,S\} \text { with probability } \mathbb {P}[s^k= s] = q_s &{} \\ {{\,\mathrm{x}\,}}_t^{k+1,s^k} = \frac{1}{\sum _{\sigma \in \mathcal {B}^{s^k}_t} p_\sigma } \sum _{\sigma \in \mathcal {B}^{s^k}_t} p_\sigma {{\,\mathrm{z}\,}}_t^{k,\sigma } \text { for all } t=1,\ldots ,T &{} \\ {{\,\mathrm{y}\,}}^{k+1,s^k} = {{\,\mathrm{argmin}\,}}_{y\in \mathbb {R}^n}\left\{ f^{s^k}(y) + \frac{1}{2\mu } \left\| y- 2{{\,\mathrm{x}\,}}^{k+1,s^k} + {{\,\mathrm{z}\,}}^{k,s^k} \right\| ^2 \right\} &{} \\ \left| \begin{array}{l} {{\,\mathrm{z}\,}}^{k+1,s^k} = {{\,\mathrm{z}\,}}^{k,s^k} + {{\,\mathrm{y}\,}}^{k+1,s^k} - {{\,\mathrm{x}\,}}^{k+1,s^k} \\ {{\,\mathrm{z}\,}}^{k+1,s} = {{\,\mathrm{z}\,}}^{k,s} \text { for all } s\ne s^k\end{array} \right.&\end{array} \right. \end{aligned}$$

Finally, notice from Proposition 1 that the variable converging to a solution of (2.4) is \(\tilde{{{\,\mathrm{x}\,}}}^k := {\mathsf {J}}_{\mu {\mathsf {B}}}({{\,\mathrm{z}\,}}^k)\). From Lemma 1 (and the fact that \({\mathsf {O}}_{\mu {\mathsf {B}}} = 2{\mathsf {J}}_{\mu {\mathsf {B}}} - {\mathsf {I}}\)), we get that \(\tilde{{{\,\mathrm{x}\,}}}_t^{k,s} = \frac{1}{\sum _{\sigma \in \mathcal {B}^s_t} p_\sigma } \sum _{\sigma \in \mathcal {B}^s_t} p_\sigma {{\,\mathrm{z}\,}}_t^{k,\sigma } \text { for all } s=1,\ldots ,S \text { and } t=1,\ldots ,T\) and that \(\tilde{{{\,\mathrm{x}\,}}}^k \in \mathcal {W}\).
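
To make this iteration concrete, here is a self-contained toy sketch in Julia with two scenarios and two stages (stage 1 shared by both scenarios, stage 2 scenario-specific) and quadratic scenario objectives \(f^s(y)=\frac{1}{2}\Vert y-c^s\Vert ^2\), for which the subproblem has a closed-form solution. This is only an illustration of the update above, not the toolbox implementation, where the subproblems are solved numerically (IPOPT through JuMP, see note 1):

```julia
using Random   # only for reproducibility of the scenario draws

# Toy instance: S = 2 scenarios, T = 2 stages of dimension 1 each.
# Stage 1 is shared by both scenarios (one bundle), stage 2 is scenario-specific.
S, T, μ = 2, 2, 1.0
p = [0.5, 0.5]                           # scenario probabilities p_s
q = [0.5, 0.5]                           # sampling probabilities q_s
c = [1.0 4.0; 1.0 0.0]                   # c[s, t]: target of scenario s at stage t
bundle(s, t) = t == 1 ? (1:S) : (s:s)    # the bundles B^s_t
z = zeros(S, T)                          # global variable z

Random.seed!(0)
for k in 1:1000
    s = rand() < q[1] ? 1 : 2                                   # draw scenario s^k
    x = [sum(p[σ] * z[σ, t] for σ in bundle(s, t)) /
         sum(p[σ] for σ in bundle(s, t)) for t in 1:T]          # nonanticipativity average
    w = 2 .* x .- z[s, :]                                       # reflected point 2x - z^s
    y = (μ .* c[s, :] .+ w) ./ (μ + 1)                          # closed-form argmin of f^s + (1/2μ)‖·-w‖²
    z[s, :] .+= y .- x                                          # update only the block of scenario s
end

# Nonanticipative estimate, i.e. J_{μB}(z), the quantity that converges to a solution:
xtilde = [sum(p[σ] * z[σ, t] for σ in bundle(s, t)) /
          sum(p[σ] for σ in bundle(s, t)) for s in 1:S, t in 1:T]
```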

C Derivation and proof of the asynchronous Randomized Progressive Hedging

Using again the bridge between Progressive Hedging and fixed-point algorithms, we present here how to derive an asynchronous Progressive Hedging from the asynchronous parallel fixed-point algorithm ARock (Peng et al. 2016). In order to match the notation and derivations of Peng et al. (2016), let us define the operator \({\mathsf {S}} := I - {\mathsf {O}}_{\mu {\mathsf {A}}} \circ {\mathsf {O}}_{\mu {\mathsf {B}}}\), the zeros of which coincide with the fixed points of \({\mathsf {O}}_{\mu {\mathsf {A}}} \circ {\mathsf {O}}_{\mu {\mathsf {B}}}\). Applying ARock to this operator leads to the following iteration:

$$\begin{aligned}&\text {Every worker asynchronously do} \nonumber \\&\left\{ \begin{array}{l} \text {Draw a scenario } s^k\in \{1,\ldots ,S\} \text { with probability } \mathbb {P}[s^k= s] = q_s \\ \left| \begin{array}{l} {{\,\mathrm{z}\,}}^{k+1,s^k} = {{\,\mathrm{z}\,}}^{k,s^k} - \frac{\eta ^k}{S p_{s^k}} \left( \hat{\hbox {z}}^{k,s^k} - \left[ {\mathsf {O}}_{\mu {\mathsf {A}}}({\mathsf {O}}_{\mu {\mathsf {B}}}(\hat{\hbox {z}}^k))\right] ^{s^k} \right) \\ {{\,\mathrm{z}\,}}^{k+1,s} = {{\,\mathrm{z}\,}}^{k,s} \text { for all } s\ne s^k\end{array} \right. \\ {\scriptstyle \text {where } \hat{\hbox {z}}^k \text { is the value of } {{\,\mathrm{z}\,}}^{k} \text { used by the updating worker at time { k} for its computation}} \end{array} \right. \end{aligned}$$
(C.1)

Notice that the main difference between this iteration and (B.1) is the introduction of the variable \(\hat{\hbox {z}}^k\) which is used to handle delays between workers in asynchronous computations:

  • If there is only one worker, it simply computes its new point with the latest value, so we have \(\hat{\hbox {z}}^k = {{\,\mathrm{z}\,}}^k\). Notice that, taking \(\eta ^k = S p_{s^k} /2 \), we then recover exactly the randomized Douglas–Rachford method (B.1);

  • If there are several workers, \(\hat{\hbox {z}}^k \) is usually an older version of the main variable, as other workers may have updated the main variable during the computation of the updating worker. In this case, we have \(\hat{\hbox {z}}^k = {{\,\mathrm{z}\,}}^{k-d^k}\) where \(d^k\) is the delay suffered by the updating worker at time k.

We derive here our Asynchronous Randomized Progressive Hedging (Algorithm 4) as an instantiation of (C.1) with the operators \({\mathsf {O}}_{\mu {\mathsf {A}}}\) and \({\mathsf {O}}_{\mu {\mathsf {B}}}\) defined in “Appendix A”. Let us first establish the convergence of this scheme using a general result of Peng et al. (2016), which makes few assumptions on the communications between the workers and the master. The main requirement is that the maximum delay between workers is bounded, which is a reasonable assumption when the algorithm is run on a multi-core machine or on a medium-size computing cluster.

Proposition 2

Consider a multistage problem (2.4) verifying Assumptions 1 and 2. Assume furthermore that the delays are bounded: \(d^k\le \tau <\infty \) for all k, and that the stepsize \(\eta ^k\) satisfies, for some fixed \(0<c<1\),

$$\begin{aligned} 0< \eta _{\min } \le \eta ^k ~\le ~ \frac{c S q_{\min }}{2\tau \sqrt{q_{\min }} +1 } \qquad \text {with } q_{\min } = \min _s q_s. \end{aligned}$$
(C.2)

Then, the sequence \(({{\,\mathrm{z}\,}}^k)\) generated by (C.1) with \({\mathsf {O}}_{\mu {\mathsf {A}}}\) and \({\mathsf {O}}_{\mu {\mathsf {B}}}\) defined in Lemma 1 converges almost surely to a fixed point of \({\mathsf {O}}_{\mu {\mathsf {A}}}\circ {\mathsf {O}}_{\mu {\mathsf {B}}}\). Furthermore, \(\tilde{{{\,\mathrm{x}\,}}}^k := {\mathsf {J}}_{\mu {\mathsf {B}}}({{\,\mathrm{z}\,}}^k)\) converges to a solution of (2.4).

Proof

The beginning of the proof follows the same lines as that of Proposition 1 to show that \({\mathsf {O}}_{\mu {\mathsf {A}}}\) and \({\mathsf {O}}_{\mu {\mathsf {B}}}\) are non-expansive by construction, which implies that \({\mathsf {S}} := I - {\mathsf {O}}_{\mu {\mathsf {A}}} \circ {\mathsf {O}}_{\mu {\mathsf {B}}}\) is also non-expansive with its zeros corresponding to the fixed points of \({\mathsf {O}}_{\mu {\mathsf {A}}} \circ {\mathsf {O}}_{\mu {\mathsf {B}}}\) (see Bauschke and Combettes 2011, Chap. 4.1). We can then apply (Peng et al. 2016, Th. 3.7) to get that \(({{\,\mathrm{z}\,}}^k)\) converges almost surely to a zero of \({\mathsf {S}}\). As in the proof of Proposition 1, we use the continuity of \( {\mathsf {J}}_{\mu {\mathsf {B}}}\) and the fact that \(x^\star := {\mathsf {J}}_{\mu {\mathsf {B}}}({{\,\mathrm{z}\,}}^\star ) \) is a zero of \({\mathsf {A}}+{\mathsf {B}}\) (i.e. solves the multi-stage problem (2.4) by (A.2)) to get the last part of the result. \(\square \)
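
As an illustration of the stepsize condition (C.2) (this specialization is ours and not taken from the paper), under uniform sampling \(q_s = 1/S\) we have \(q_{\min }=1/S\) and the bound becomes

$$\begin{aligned} \eta ^k ~\le ~ \frac{c\, S\, (1/S)}{2\tau /\sqrt{S} + 1} = \frac{c\,\sqrt{S}}{\sqrt{S} + 2\tau }, \end{aligned}$$

so that with no delay (\(\tau =0\)) any stepsize up to \(c<1\) is admissible, and the admissible range shrinks as the maximal delay \(\tau \) grows relative to \(\sqrt{S}\).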

Using the expressions of the operators in Lemma 1, (C.1) can be written as

$$\begin{aligned}&\text {Every worker asynchronously do} \\&\left\{ \begin{array}{ll} \text {Draw a scenario } s^k\in \{1,\ldots ,S\} \text { with probability } {\mathbb {P}}[s^k= s] = q_s &{} \\ {\hat{\hbox {x}}}_t^{k,s} = \frac{1}{\sum _{\sigma \in {\mathcal {B}}^s_t} p_\sigma } \sum _{\sigma \in {\mathcal {B}}^s_t} p_\sigma \hat{\hbox {z}}_t^{k,\sigma } \text { for all } s=1,\ldots ,S \text { and } t=1,\ldots ,T &{} \\ \hat{\hbox {w}}^k = {\mathsf {O}}_{\mu {\mathsf {B}}}(\hat{\hbox {z}}^k) = 2{\hat{\hbox {x}}}^k - \hat{\hbox {z}}^k &{} \\ \hat{\hbox {y}}^{k+1,s} = {{\,\mathrm{argmin}\,}}_{y\in \mathbb {R}^n}\left\{ f^s(y) + \frac{1}{2\mu } \left\| y-\hat{\hbox {w}}^{k,s} \right\| ^2 \right\} \text { for all } s=1,\ldots ,S &{} \\ \left[ {\mathsf {O}}_{\mu {\mathsf {A}}}({\mathsf {O}}_{\mu {\mathsf {B}}}(\hat{\hbox {z}}^k))\right] ^{s^k} = 2 \hat{\hbox {y}}^{k+1,s^k} - \hat{\hbox {w}}^{k,s^k} &{} \\ \left| \begin{array}{l} {{\,\mathrm{z}\,}}^{k+1,s^k} = {{\,\mathrm{z}\,}}^{k,s^k} - \frac{\eta ^k}{S p_{s^k}} \left( \hat{\hbox {z}}^{k,s^k} - \left[ {\mathsf {O}}_{\mu {\mathsf {A}}}({\mathsf {O}}_{\mu {\mathsf {B}}}(\hat{\hbox {z}}^k))\right] ^{s^k} \right) \\ {{\,\mathrm{z}\,}}^{k+1,s} = {{\,\mathrm{z}\,}}^{k,s} \text { for all } s\ne s^k\end{array} \right.&\end{array} \right. \end{aligned}$$

Pruning unnecessary computations, the asynchronous version of Progressive Hedging boils down to:

$$\begin{aligned}&\text {Every worker asynchronously do} \\&\left\{ \begin{array}{ll} \text {Draw a scenario } s^k\in \{1,\ldots ,S\} \text { with probability } \mathbb {P}[s^k= s] = q_s &{} \\ {\hat{\hbox {x}}}_t^{k,s^k} = \frac{1}{\sum _{\sigma \in \mathcal {B}^{s^k}_t} p_\sigma } \sum _{\sigma \in \mathcal {B}^{s^k}_t} p_\sigma \hat{\hbox {z}}_t^{k,\sigma } \text { for all } t=1,\ldots ,T &{} \\ \hat{\hbox {y}}^{k+1,s^k} = {{\,\mathrm{argmin}\,}}_{y\in \mathbb {R}^n}\left\{ f^{s^k}(y) + \frac{1}{2\mu } \left\| y- 2{\hat{\hbox {x}}}^{k,s^k} + \hat{\hbox {z}}^{k,s^k} \right\| ^2 \right\} &{} \\ \left| \begin{array}{l} {{\,\mathrm{z}\,}}^{k+1,s^k} = {{\,\mathrm{z}\,}}^{k,s^k} + \frac{2 \eta ^k}{S p_{s^k}} \left( \hat{\hbox {y}}^{k+1,s^k} - {\hat{\hbox {x}}}^{k,s^k} \right) \\ {{\,\mathrm{z}\,}}^{k+1,s} = {{\,\mathrm{z}\,}}^{k,s} \text { for all } s\ne s^k\end{array} \right.&\end{array} \right. \end{aligned}$$

This asynchronous algorithm can be readily rewritten as Algorithm 4, highlighting the master–worker implementation. Theorem 3.2 then follows directly from Proposition 2.

D RPH toolbox: implementations details

A basic presentation of the toolbox RPH is provided in Sect. 5; a complete description is available in the online documentation. In this section, we briefly provide complementary information on the input/output formats.

The input format is a Julia structure, named problem, that gathers all the information needed to solve a given multi-stage problem.

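As an illustration of this input structure, a sketch gathering the attributes described below could look as follows; the field order, the concrete types, and the placeholder declarations are assumptions, not the toolbox's actual definition:

```julia
abstract type AbstractScenario end    # provided by RPH; declared here only to make the sketch self-contained
struct ScenarioTree end               # provided by RPH; placeholder for the sketch

# Illustrative layout of the structure gathering a multi-stage problem.
struct Problem{T<:AbstractScenario}
    scenarios::Vector{T}                  # one user-defined scenario object per scenario
    build_subpb::Function                 # (model, scenario, scenarioId) -> (y, objexpr, ctrref)
    probas::Vector{Float64}               # probability of each scenario
    nscenarios::Int                       # total number of scenarios
    nstages::Int                          # number of stages (equal for all scenarios)
    stage_to_dim::Vector{UnitRange{Int}}  # stage i -> index range p:q of its variables
    scenariotree::ScenarioTree            # scenario tree structure (see below)
end
```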

The attribute scenarios is an array representing the possible scenarios of the problem. nscenarios is the total number of scenarios provided by the user, and the probability assigned to each scenario is given by the attribute probas. The number of stages, assumed to be equal among all scenarios, is stored in the attribute nstages. The dimension of the variable associated with each stage is stored in the vector of index ranges stage_to_dim: if, for a fixed stage i, \(\texttt {stage\_to\_dim[i]} = \texttt {p:q}\), then the variable associated with stage i is of dimension \(q - p + 1\). Each scenario must inherit from the abstract type AbstractScenario. This abstract type does not impose any requirements on the scenarios themselves, so that the user is free to store any relevant information in them. Here is an example.

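For instance, a user-defined scenario type could be as simple as the following sketch (the type name MyScenario and its fields are purely illustrative):

```julia
# The user is free to store any scenario-specific data in the fields.
# Assumes AbstractScenario is available (from `using RPH`, or the placeholder above).
struct MyScenario <: AbstractScenario
    trajcenter::Vector{Float64}   # e.g. the realization of the random process
    bound::Float64                # e.g. a scenario-dependent constraint bound
end
```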

The function build_subpb, provided by the user, informs the solver about the objective function \(f_s\) to use for each scenario. This function is assumed to take as inputs a JuMP.Model object, a single scenario object, and an object scenarioId, an integer that identifies the scenario. build_subpb must then return the optimization variables, named y below, an expression of the objective function \(f_s\), denoted objexpr below, as well as the constraints relative to this scenario, denoted ctrref below.

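Here is an illustrative sketch of such a function for the hypothetical MyScenario type above, with a simple quadratic tracking objective; the exact signature expected by the toolbox should be checked against its documentation:

```julia
using JuMP

# Builds the subproblem of one scenario inside the JuMP model `model`:
# declares the variables and returns them with the objective expression
# and the constraint references, as described in the text.
# `s` is the user's scenario object (e.g. the MyScenario sketched above).
function build_subpb(model::JuMP.Model, s, scenarioId::Int)
    n = length(s.trajcenter)                 # total dimension over all stages
    y = @variable(model, [1:n], base_name = "y_$scenarioId")
    objexpr = @expression(model, sum((y[i] - s.trajcenter[i])^2 for i in 1:n))
    ctrref = @constraint(model, [i in 1:n], y[i] <= s.bound)
    return y, objexpr, ctrref
end
```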

Finally, the attribute scenariotree stores the tree structure of the scenarios. scenariotree must be of type ScenarioTree, a tree structure designed by the authors. One can build a scenario tree object by directly stating the shape of the tree with the help of Julia set structures.

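For illustration, the shape of a tree with four scenarios and three stages can be described stage by stage with plain Julia sets as sketched below; the actual ScenarioTree constructor may expect a different input, see the online documentation:

```julia
# Each stage is described by the partition of the scenarios into bundles
# sharing the same decision (the sets B^s_t of the paper).
stage_partitions = [
    [Set(1:4)],                                # stage 1: all scenarios share their decision
    [Set(1:2), Set(3:4)],                      # stage 2: two bundles of two scenarios
    [Set([1]), Set([2]), Set([3]), Set([4])],  # stage 3: every scenario has its own decision
]
# tree = ScenarioTree(stage_partitions)        # hypothetical call; check the RPH documentation
```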

When the tree to generate is known to be complete, one can quickly generate a scenario tree with the constructor by giving the depth of the tree and the degree of the nodes (assumed to be the same for every node in this case):

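A sketch of this shortcut is given below; the constructor name and keyword arguments are assumptions to be checked against the toolbox documentation:

```julia
depth, degree = 3, 2                  # 3 stages, binary branching at every node
nscenarios = degree^(depth - 1)       # a complete tree of this shape has 4 scenarios
# tree = ScenarioTree(; depth = depth, nbranching = degree)   # hypothetical constructor call
```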

The output of the algorithm is the final iterate, together with information on the run of the algorithm. The logs that appear on the console report the input parameters and the functional values obtained along the iterations. If the user wishes to track more information, a callback function can be defined and passed as an input. This additional information can then either be logged on the console or stored in a dictionary hist.

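An illustrative callback sketch follows; the exact signature expected by the solver is an assumption here, see the toolbox documentation for the actual interface:

```julia
# Store whatever we want to track in a dictionary `hist`, as mentioned above.
hist = Dict{Symbol,Any}(:iterates => Vector{Any}(), :times => Float64[])

function my_callback(x, k)
    push!(hist[:iterates], copy(x))   # keep a copy of the current iterate
    push!(hist[:times], time())       # record the wall-clock time of iteration k
    println("callback at iteration $k")
    return nothing
end
```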


Cite this article

Bareilles, G., Laguel, Y., Grishchenko, D. et al. Randomized Progressive Hedging methods for multi-stage stochastic programming. Ann Oper Res 295, 535–560 (2020). https://doi.org/10.1007/s10479-020-03811-5
