1 Introduction

Stochastic mixed integer programming (SMIP) models are, in essence, large-scale mixed-integer programming (MIP) models in which the uncertain nature of the input parameters is modelled by means of a finite set of discrete scenarios [1]. This general framework can model a broad class of decision problems, as attested by the wealth of publications from diverse areas of science and engineering. Important applications employing SMIP models include unit commitment [2], hydro-thermal generation scheduling [3], military operations [4], vaccination planning [5], air traffic flow management [6], forestry management and forest fire response [7], supply chain and logistics planning [8], and other applications referred to on the SIPLIB website [9]. The practical and theoretical development of stochastic programming (SP) without integer variables preceded SMIP and has influenced its development. The Progressive Hedging (PH) algorithm [10] for solving SP problems is well studied and theoretically supported for convex problems with no integer-constrained variables. Even without such theoretical support in the setting with integer-constrained variables, PH used as a heuristic is often effective, providing both upper and lower bounds [11] and, frequently, feasible solutions. Motivated by the limited theoretical support for the application of PH to SMIP and the observed success of PH heuristics for SMIP, our objective is to develop a theoretical framework and demonstrate convergence in numerical experiments.

The large scale of the deterministic equivalent of SMIP models proves to be challenging for off-the-shelf solvers that do not utilise the decomposable structure inherent in the extensive deterministic forms of SMIP models. By contrast, more promising solution methods utilise the SMIP’s decomposable structure. The PH algorithm [10] addresses the decomposable structure as a variant of the alternating direction method of multipliers (ADMM) [12] where the non-anticipativity constraint is relaxed into an augmented Lagrangian (AL) reformulation. One of the earliest detailed treatments of its convergence was based on variational analysis techniques [10], where nonsmooth analysis also provided important tools for the study of convergence with respect to the satisfaction of optimality conditions. Augmented Lagrangian duality, which plays a fundamental role in this work, also appeared in subsequent works to provide duality theorems for very general nonconvex problems, including cases encompassing integer constraints [13, Chapter 11, section K], [14]. These publications have attracted the attention of the integer programming community and resulted in a body of literature focussing specifically on the application of augmented Lagrangian duality to mixed-integer programming (MIP) [15, 16]. This in turn motivated researchers to state, analyse, and test a version of the PH method containing nonsmooth augmentations [17]. Concurrently, researchers have also explored the combination of PH with the Frank-Wolfe algorithm [18] to obtain provably convergent dual bounding methods for SMIP based on the Lagrangian relaxation of the non-anticipativity constraint [19]. Other researchers produced primal (heuristic) methods where the quadratic sub-problems of PH were replaced by mixed integer quadratic programs (MIQP) [11]. 
These approaches were shown to produce excellent solutions as long as the penalty parameter was chosen judiciously, yet they have remained an enigma, lacking any theoretical convergence result. In this paper, we show that variational analysis techniques can draw back the curtain on this enigma and explain what actually underpins the success of PH with MIQP subproblems when applied to SMIP.

Under reasonable assumptions, we analyse the convergence of PH applied to SMIPs, allowing the penalty parameters to vary in a less restricted fashion than is typically required for PH and related approaches, while applying Lagrange multiplier updates governed by a special rule, needed because the convergence/boundedness criteria underpinning the analysis are not automatically satisfied when PH is applied to SMIPs. Furthermore, our approach allows for more generality in the type of augmented Lagrangian terms. In our analysis, we view the PH method as an application of Gauss–Seidel iterations with penalty and Lagrange terms allowed to vary between iterations. In this setting, we contribute insight into when PH generates a sequence of solutions that converges to a feasible point of the SMIP. Our approach may also be viewed as interfacing some seemingly distinct solution methodologies found in proximal point methods such as PH, Gauss–Seidel (GS) methods, and (mixed-integer) augmented Lagrangian duality [15,16,17, 20]. Furthermore, a connection with feasibility pump (FP) primal heuristics is evident in the same spirit as contributed in [21].

Some of the conditions assumed for the penalty and/or augmented Lagrangian term that are required to achieve an exact penalty effect in [15, 16] (e.g., [16, Theorem 5]) require the penalty functions to be non-differentiable, which can impede the analysis of Gauss–Seidel methods [22]. Thus, in this paper, we set out to develop this theory from another direction that allows for a differentiable penalty term, in line with that typically used for analysing progressive hedging-like methods. To compensate for the loss of the exact penalty effect shown in [15, 16], we provide an analysis describing the effect of (potentially) letting the penalty coefficient go to infinity in order to achieve feasibility. In particular, we analyse a SMIP solution method inspired by the FP, PH, and Gauss–Seidel convergence analyses, denoted FPPH for short, which in practice is similar to the use of PH as a heuristic [11], except that we allow for greater generality in the updating of Lagrange multipliers, the changing of penalty coefficients, and the allowable forms of the augmented Lagrangian penalty function itself. Successful convergence of the method allows for (but is not predicated on) the unbounded increase of penalty parameters. To be clear, our analysis does not promise both primal and dual optimal convergence as is provided for PH in the convex, continuous setting. Rather, we address convergence goals similar to those of feasibility pump methods, where high-quality feasible solutions are sought and the main challenge is avoiding either non-feasible convergence or cycling.

Our experimental results will demonstrate the effectiveness of FPPH. As with all FP approaches, one needs to develop heuristics for updating the penalty parameters to encourage the methods to locate the best possible feasible solution and hence the strongest primal bound. As a general conclusion, the FPPH presents promising performance relative to Progressive Hedging in terms of quickly obtaining good feasible solutions for SMIPs with pure integer first-stage variables.

This paper is structured as follows. In Sect. 2, we set up the assumptions on the regularisation and the conceptual framework on which the analysis rests. In Sect. 3, further results are developed on how we may decompose the regularisation into its “cross-sections” where integer variables are fixed, which provides a foundation for insight into the local minima of the (whole) regularisation. Section 4 introduces the concept of persistent local minima and their relationship to feasibility for SMIP (1). The convergence analysis of the associated Gauss–Seidel algorithm is carried out in Sect. 5. In Sect. 6, we present computational results illustrating the employment of variants of FPPH to find high-quality feasible solutions to SMIP instances. In Sect. 7, we provide concluding remarks and directions for future developments.

2 Fundamental concepts and conceptual algorithmic framework

Denote \(\varvec{x}= (x_s)_{s \in S}\) where \(x_s \in \mathbb {X}_d:= \mathbb {R}^{n-q} \times \mathbb {Z}^q \subseteq \mathbb {X}:= \mathbb {R}^n\). Similarly \(\varvec{y}= (y_s)_{s \in S}\) where \(y_s \in \mathbb {Y}_d:= \mathbb {R}^{m-r} \times \mathbb {Z}^r \subseteq \mathbb {Y}:= \mathbb {R}^m\). We state the SMIP in the following split-variable deterministic formulation (see, e.g., [1])

$$\begin{aligned} \zeta ^{SMIP} =&\min _{\varvec{x}\in \mathbb {X}^{\small {\vert S\vert } }_d ,\varvec{y}\in \mathbb {Y}^{\small {\vert S \vert }}_d,z\in \mathbb {X},\varvec{w}\in \mathbb {Y}^{\small {\vert S \vert }}} \sum _{s\in S} f_s(x_s,y_s) \end{aligned}$$
(1a)
$$\begin{aligned}&\text { s.t. } \quad (z-x_s, w_s-y_s) = (0,0), \qquad (x_s , y_s) \in K_s,\; s \in S, \end{aligned}$$
(1b)

where

$$\begin{aligned} f_s\left( x_s,y_s\right)&:= p_s \left( c^{\top }x_{s}+d_{s}^{\top }y_{s}\right) ,\qquad s \in S \end{aligned}$$
(1c)
$$\begin{aligned} K_s&:= \left\{ (x,y) \in \mathbb {X}_d \times \mathbb {Y}_d \mid x\in X,\; y\in Y_s(x) \right\} ,\qquad s \in S \end{aligned}$$
(1d)
$$\begin{aligned} X&:= \left\{ x\in \mathbb {X}_d \mid Ax\le b \right\} \end{aligned}$$
(1e)
$$\begin{aligned} Y_s(x)&:= \left\{ y\in \mathbb {Y}_d \mid T_s x+ W_s y\le h_s \right\} ,\qquad s \in S. \end{aligned}$$
(1f)

Note that the constraints \(x_s \in X\) that hold only for the first-stage decision variables \(x_s\) are identical for all \(s \in S\).

We denote the extended reals by \(\mathbb {R}_{+\infty }:= \mathbb {R}\cup \{+\infty \}\). For each scenario \(s \in S\) copy of the first-stage variables \(x_s\), and separately for each scenario \(s \in S\) copy of the second-stage variables \(y_s\), we assume that the integer variable component indices (\({\mathcal {I}}\)) always follow the real variable component indices (\({\mathcal {R}}\)). That is, \(x_s:=(x_{s,{\mathcal {R}}},x_{s,{\mathcal {I}}})\) and \(y_s:=(y_{s,{\mathcal {R}}},y_{s,{\mathcal {I}}})\) for each \(s \in S\). Define the projection \({{\,\textrm{proj}\,}}_{\mathbb {X},{\mathcal {I}}}: \mathbb {X}_d \rightarrow \mathbb {Z}^q\) by \({\text {proj}}_{\mathbb {X},\mathcal {I}} \left( (x_{{\mathcal {R}}},x_{{\mathcal {I}}}) \right) = x_{{\mathcal {I}}}\) (with a similar definition for the \(y_{{\mathcal {I}}}\) projection \({{\,\textrm{proj}\,}}_{\mathbb {Y},{\mathcal {I}}}\)). As the first-stage consensus variable \(z\) components should match those for each \(x_s\), due to the non-anticipativity constraints \(z-x_s=0\), \(s \in S\), the same first-stage distinction between real and integer components \(z:=(z_{{\mathcal {R}}},z_{{\mathcal {I}}})\) applies. Corresponding distinctions for the second-stage consensus \(w_s:=(w_{s,{\mathcal {R}}},w_{s,{\mathcal {I}}})\), \(s \in S\), apply as well. Note that \(z\) is not explicitly constrained to lie within the discrete feasible set \(X\). Nor are \(w_s\), \(s \in S\), explicitly constrained to lie within \(Y_s(x_s)\) or \(Y_s(z)\). Thus, strictly speaking, \(z\in \mathbb {X}\) and \(w_s \in \mathbb {Y}\) vary freely within their respective spaces. Denote \(\varvec{w}:=(w_1,\dots ,w_{\vert S \vert })\) and similarly for \((\varvec{x},\varvec{y})\), and when needed we denote \(\varvec{z}:= (z,\dots ,z)\in \mathbb {R}^{n \times \vert S \vert }\).
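As a concrete illustration of this splitting convention, the following Python sketch (ours, with illustrative names only) extracts the real and integer components of a point of \(\mathbb {X}_d\), assuming the integer components follow the real components as stipulated above.

```python
# Illustrative sketch of the component-splitting convention: the integer
# components of x_s occupy the last q positions, following the real ones.

def proj_R(x, q):
    """Real components x_R: the first n - q entries."""
    return tuple(x[:len(x) - q])

def proj_I(x, q):
    """Integer components x_I: the last q entries (cf. proj_{X,I})."""
    return tuple(x[len(x) - q:])

# A point x_s in X_d = R^{n-q} x Z^q with n = 4 and q = 2:
x_s = (0.5, -1.25, 3, 7)
print(proj_R(x_s, 2))  # -> (0.5, -1.25)
print(proj_I(x_s, 2))  # -> (3, 7)
```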

Since the second-stage non-anticipativity variables are independent for each outcome scenario and otherwise unconstrained, the non-anticipativity constraints \(w_s - y_s = 0\), \(s \in S\), have no practical effect on the feasibility of the second-stage decisions \(y_s\) in SMIP (1). Nevertheless, this formulation aids the subsequent analysis by allowing the incorporation of all variables (regardless of stage) into regularisation terms in a symmetric fashion. Second-stage feasibility is propagated from \(\varvec{y}\) to \(\varvec{w}\) via this constraint, while \(\varvec{w}\) remains otherwise unconstrained. The formulation (1) is also conducive to generalising our results for two-stage SMIPs to multi-stage problems in which all stages except the last have active non-anticipativity constraints. In the practical application of developed algorithms to two-stage problems, the use of \(\varvec{w}\) may be suppressed, as it is in the description of the computational experiments of Sect. 6.

Throughout our developments, we assume that the following conditions hold for our SMIP (1). We explicitly assume the existence of an optimal solution, which could be replaced by the standard assumption of rationality of the data defining the problem.

Assumption 1

We make the following standard SMIP assumptions:

  1. Stochasticity of \(p_s\): for each \(s \in S\), we have \(p_s > 0\) and \(\sum _{s \in S} p_s = 1\).

  2. Non-emptiness: \(K_s\), \(s \in S\), is a non-empty set of feasible decisions constructed with linear constraints and integrality constraints on the \(x_s\) and \(y_s\) variables. (This also implies that \(K_s\) is closed.)

  3. Boundedness and Optimality: The optimal value of the SMIP (1) is bounded from below. Also, the feasible sets \(K_s\), \(s \in S\), are bounded. Furthermore, SMIP (1) is feasible and possesses an optimal solution.

  4. Relatively complete recourse: The SMIP model has relatively complete recourse; \(\forall x \in X,\, \forall s \in S\), we have \(Y_s(x) \ne \emptyset \): that is, first-stage decisions \(x\) that satisfy the first-stage specific constraints \(x\in X\) have at least one second-stage decision solution \(\left( y_s\right) _{s \in S}\) for which \((x,y_s) \in K_s\) for all scenarios \(s \in S\).
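On a toy instance with small bounded sets, the relatively complete recourse condition can be verified by enumeration; the data below are our own illustrative choices, not taken from any instance in this paper.

```python
# Enumeration check of relatively complete recourse on a toy instance with
# scalar first- and second-stage variables (all data illustrative).

X = [0, 1, 2]                       # first-stage set X = {x in Z : Ax <= b}

def Y_s(x, t_s, w_s, h_s):
    """Second-stage set {y in {0,...,3} : t_s x + w_s y <= h_s}."""
    return [y for y in range(4) if t_s * x + w_s * y <= h_s]

scenarios = [(1, 1, 5), (2, 1, 7)]  # (T_s, W_s, h_s) for each scenario s

# Relatively complete recourse: Y_s(x) is non-empty for every x in X, s in S.
rcr = all(Y_s(x, *sc) for x in X for sc in scenarios)
print(rcr)  # -> True
```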

Of interest is the dual function \(\zeta :\Lambda \times \mathbb {R}_{>0}^{|S |} \times \mathbb {R}_{>0} \rightarrow \mathbb {R}_{+\infty }\) defined by

$$\begin{aligned} \zeta (\lambda ,\pi ,\rho ) := \min \{\varphi ^{\lambda ,\rho ,\pi }\left( z, \varvec{w}\right) \mid (z,\varvec{w}) \in {\mathbb {X}}\times \mathbb {Y}^{\small {\vert S \vert }}\}, \end{aligned}$$
(2)

where

$$\begin{aligned} \varphi ^{\lambda ,\rho ,\pi }\left( z, \varvec{w}\right)&:=\sum _{s\in S} \varphi _s^{\lambda ,\rho ,\pi }{(z, w_s)} \text { and } \Phi ^{\lambda ,\rho ,\pi }(z,\varvec{w}) := \prod _{s \in S} \Phi _s^{\lambda ,\rho ,\pi }(z,w_s) \end{aligned}$$
(3a)

with, for each \(s \in S\),

$$\begin{aligned} \varphi _s^{\lambda ,\rho ,\pi }\left( z, w_s \right)&:= \min _{\left( x_{s},y_{s}\right) \in K_s } f_s(x_s,y_s) - \lambda _s^\top \, (z-x_s) + \rho \pi _s \psi \left( z-x_{s},w_{s}-y_{s} \right) \end{aligned}$$
(3b)
$$\begin{aligned} \Phi _s^{\lambda ,\rho ,\pi }\left( z, w_s \right)&:= \mathop {{{\,\textrm{argmin}\,}}}\limits _{\left( x_{s},y_{s}\right) \in K_s } f_s(x_s,y_s) - \lambda _s^\top \, (z-x_s) + \rho \pi _s \psi \left( z-x_{s},w_{s}-y_{s} \right) . \end{aligned}$$
(3c)

We assume that \(K_s\) is defined as in (1d), and that the usual dual feasibility \(\lambda \in \Lambda :=\{\lambda \mid \sum _{s \in S} \lambda _s = 0\}\) holds. For each scenario \(s \in S\), the penalty function \(\psi \) output value is scaled by a penalty scaling parameter \(\rho > 0\), and by scenario-specific penalty weighting parameters \(\pi _s >0 \) (for which \(\sum _{s \in S} \pi _s =1\)) to be specified. Note that under the assumption that \(\lambda \in \Lambda \), the summation \(\sum _{s \in S} \lambda _s^\top z\) conveniently vanishes, and so these terms may be dropped in subsequent developments.
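The vanishing of the \(z\)-terms under dual feasibility is elementary but easy to check numerically; the following sketch (ours, with arbitrary data) generates a \(\lambda \in \Lambda \) and confirms that \(\sum _{s \in S} \lambda _s^\top z = 0\) up to rounding.

```python
# Check that dual feasibility sum_s lambda_s = 0 makes sum_s lambda_s^T z
# vanish (all numbers arbitrary).
import random

random.seed(0)
n, S = 3, 4
lam = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(S - 1)]
# Choose the last multiplier so that sum_s lambda_s = 0, i.e. lambda in Lambda.
lam.append([-sum(l[i] for l in lam) for i in range(n)])

z = [random.uniform(-5, 5) for _ in range(n)]
total = sum(sum(l[i] * z[i] for i in range(n)) for l in lam)
print(abs(total) < 1e-9)  # -> True
```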

Each instance of problem (2) is a continuous optimisation problem over the space \({\mathbb {X}}\times \mathbb {Y}^{\small {\vert S \vert }}\), and for nontrivial instances of SMIP (1), \(\varphi ^{\lambda ,\rho ,\pi }\) is nonconvex with multiple isolated local minima. Under assumptions in [15, 16], we have \(\zeta ^{SMIP} = \zeta (\lambda ,\pi ,\rho )\) for sufficiently large but finite \(\rho \). Properties of locally optimal solutions to the minimisation of \(\varphi ^{\lambda ,\rho ,\pi }\), and how these local minimisers relate to the solutions to the original SMIP (1), are of special interest in this paper’s subsequent analysis.

As mentioned earlier, the nonsmoothness of penalty functions \(\psi \) that support the exact penalty properties discussed in [15, 16] precludes the convergence theory available for Gauss–Seidel approaches. For this reason, we modify the properties assumed in [15, 16] for the penalty function \(\psi \) to the conditions stated in Assumption 2. In particular, we assume that the penalty is strongly convex and differentiable from the outset (departing markedly from [15, 16]), as this is required for a Gauss–Seidel approach to be applied with desirable convergence properties (see Lemma 21).

Assumption 2

For our smooth penalty function \(\psi : \mathbb {X}\times \mathbb {Y}\rightarrow \mathbb {R}\), we make the following integer compatible regularisation function (ICRF) assumptions:

  1. \(\psi \left( u,v\right) \ge 0\) for all \((u,v)\), and \(\psi \left( u, v\right) =0\) if and only if \((u,v) = (0,0)\).

  2. If \(\gamma \in [0,1)\), then \(\psi \left( \gamma u, v\right) < \psi \left( u, v\right) \) for all \(u\ne 0\) and \(\psi \left( u, \gamma v\right) < \psi \left( u, v\right) \) for all \(v\ne 0\).

  3. Strong convexity holds with modulus \(m > 0\), i.e.,

    $$\begin{aligned} \psi (u,v) \ge \psi (u^0,v^0) + \left\langle \nabla \psi (u^0,v^0), \left[ \begin{array}{c}u- u^0 \\ v-v^0 \end{array}\right] \right\rangle + \frac{m}{2} \left\| \left[ \begin{array}{c}u- u^0 \\ v-v^0 \end{array}\right] \right\| ^2. \end{aligned}$$
    (4)

We note that Assumption 2 implies \((0,0) = \nabla \psi (0,0)\) and thus (4) implies

$$\begin{aligned} \psi (u,v) \ge \frac{m}{2} \Vert (u,v)\Vert ^2, \;\text {for all discrepancies}\;(u,v). \end{aligned}$$
(5)

Remark 1

In the theoretical development, we partition the discrepancies into \(u\) and \(v\) components to correspond to the special treatment of early-stage variables against late-stage variables. For a two-stage problem, \(u\) corresponds to first-stage discrepancies, and \(v\) corresponds to second-stage discrepancies. To allow for versatility in how the theoretical development informs algorithmic approaches, especially for application to multi-stage problems, we carry the development with the distinction between \(u\) and \(v\) discrepancies through Sect. 5.

Remark 2

In our computational developments in Sect. 6, we use a weighted squared 2-norm penalty function \(\psi (u,v) = \frac{1}{2} \left( \sum _{i=1}^n \left[ \bar{\mu }_iu_i^2\right] + \Vert v\Vert ^2\right) \) with weights \(\bar{\mu }_i>0\), \(i=1,\dots ,n\). In general, the strong convexity with modulus m is equivalent to the convexity of the function \((u,v) \mapsto \psi (u,v) - \frac{m}{2} \Vert (u,v) \Vert ^2\). For the algorithmic manifestation as presented in Sect. 6, \(v\) may furthermore be set identically to value zero.
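The weighted penalty of Remark 2 and the strong-convexity inequality (4) can be checked directly; in the sketch below (ours), the modulus \(m = \min \{\min _i \bar{\mu }_i,\, 1\}\) is the natural choice for this particular \(\psi \), since its Hessian is the constant diagonal matrix with entries \(\bar{\mu }_i\) and \(1\).

```python
# Weighted squared 2-norm penalty of Remark 2 and a sampled check of the
# strong-convexity inequality (4); mu and the sample ranges are arbitrary.
import random

mu = [2.0, 0.5, 1.5]            # weights mu_i > 0 on the u components
m = min(min(mu), 1.0)           # strong-convexity modulus for this psi

def psi(u, v):
    return 0.5 * (sum(mi * ui ** 2 for mi, ui in zip(mu, u))
                  + sum(vi ** 2 for vi in v))

def grad_psi(u, v):
    return [mi * ui for mi, ui in zip(mu, u)] + list(v)

random.seed(1)
for _ in range(100):
    u0 = [random.uniform(-2, 2) for _ in mu]; v0 = [random.uniform(-2, 2)]
    u = [random.uniform(-2, 2) for _ in mu]; v = [random.uniform(-2, 2)]
    d = [a - b for a, b in zip(u + v, u0 + v0)]
    inner = sum(g * di for g, di in zip(grad_psi(u0, v0), d))
    # Inequality (4) with modulus m (small slack for rounding):
    assert psi(u, v) >= psi(u0, v0) + inner + 0.5 * m * sum(di * di for di in d) - 1e-9
print("inequality (4) holds on all samples")
```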

2.1 Preliminary application of Gauss–Seidel iterations

We define the following notation based on the assumption that the Lagrange multipliers \(\lambda ^n\in \Lambda \) and the penalty parameters \(\rho ^n>0\), \(\pi ^n> 0\), \(\sum _{s \in S} \pi _s^n = 1\), vary with each iteration \(n\ge 0\):

$$\begin{aligned} \varphi ^n(z,\varvec{w}) := \sum _{s \in S} \varphi _s^n(z,w_s) \quad \text {where}\quad \varphi _s^n(z, w_s) := \varphi ^{\lambda ^n , \rho ^n, \pi ^n }_s(z, w_s). \end{aligned}$$
(6)

One iterative solution approach for finding locally optimal solutions for SMIP (1) starting with initial \(z^0 \in {\mathbb {X}}\) is based on Gauss–Seidel (GS) iterations \(n\ge 0\) of the form

$$\begin{aligned} w_s^{n+1}&\leftarrow \mathop {{{\,\textrm{argmin}\,}}}\limits _{w\in \mathbb {Y}} \varphi _s^n\left( z^{n}, w\right) \quad \text {for all}\quad s \in S, \end{aligned}$$
(7a)
$$\begin{aligned} z^{n+1}&\leftarrow \mathop {{{\,\textrm{argmin}\,}}}\limits _{z\in {\mathbb {X}}} \varphi ^n\left( z, \varvec{w}^{n+1} \right) . \end{aligned}$$
(7b)

The \(z\) update (7b) is not easily computable, but the \(w\) update (7a) is, as demonstrated in the following proposition.

Proposition 3

Let \((z,w) \in \mathbb {X}\times \mathbb {Y}\).

  1. For each \(s \in S\), \(w\in {{\,\textrm{argmin}\,}}_{w'} \varphi _s^{\lambda ,\rho ,\pi }( z,w' )\) implies \(w\in {{\,\textrm{proj}\,}}_{\mathbb {Y}}\left( \Phi _s^{\lambda ,\rho ,\pi }(z,w)\right) \).

  2. Moreover, given \(z^n\), \(\varvec{w}^{n+1} \in {{\,\textrm{argmin}\,}}_{w'} \varphi ^n( z^n,\varvec{w}' )\) may be computed by solving for each \(s \in S\)

    $$\begin{aligned} \hspace{-0.7cm}(x_s^{n+1},y_s^{n+1}){} & {} \in \arg \min _{ (x_{s},y_{s}) \in K_s } f_s(x_s,y_s) +(\lambda _s^n)^\top x_s +\rho ^n\, \pi _s^n\psi \left( z^{n}-x_s,0\right) \end{aligned}$$
    (8)

    and then setting \(\varvec{w}^{n+1}=\varvec{y}^{n+1}\).

Proof

We argue by contradiction. Assume for some \(s \in S\) that \(w_s\in {{\,\textrm{argmin}\,}}_{w'} \varphi _s^{\lambda ,\rho ,\pi }( z,w' )\), but that \(w_s\notin {{\,\textrm{proj}\,}}_{\mathbb {Y}} \left( \Phi _s^{\lambda ,\rho ,\pi }(z,w_s)\right) \). Then, for any \((x_s,y_s) \in \Phi _s^{\lambda ,\rho ,\pi }(z,w_s)\), we have \(w_s \ne y_s\), and so

$$\begin{aligned} \varphi _s^{\lambda ,\rho ,\pi }\left( z, w_s \right)&= f_s(x_s ,y_s ) + (\lambda _s )^\top x_s+ \rho \pi _s \psi \left( z- x_s , w_{s} - y_{s} \right) \\&> f_s(x_{s} ,y_{s} ) + (\lambda _s )^\top x_s + \rho \pi _s \psi \left( z- x_s , 0 \right) \quad \text {(due to Assumption 2(2))}\\&= f_s(x_{s} ,y_{s} ) + (\lambda _s )^\top x_s +\rho \pi _s \psi \left( z- x_s , y_s - y_s \right) \\&\ge \min _{\left( x_{s}',y_{s}'\right) \in K_s} \left\{ f_s(x_{s}',y_{s}') + (\lambda _s )^\top x_s' + \rho \pi _s \psi \left( z- x_s' , y_s - y_s' \right) \right\} \\&=\varphi _s^{\lambda ,\rho ,\pi }{\left( z, y_s \right) } \end{aligned}$$

which contradicts \(w_s \in {{\,\textrm{argmin}\,}}_{w'} \varphi _s^{\lambda ,\rho ,\pi }\left( z, w' \right) \). To show the claim of Part 2, assume that \(\varvec{w}^{n+1}\) computed from (8) does not satisfy \(w_s^{n+1} \in {{\,\textrm{argmin}\,}}_{w'} \varphi _s^n( z^n,w' )\) for at least one \(s \in S\). Let \(\acute{w}_s^{n+1} \in {{\,\textrm{argmin}\,}}_{w'} \varphi _s^n( z^n,w' )\). By Part 1, there exists \(\acute{x}_s^{n+1}\) with \((\acute{x}_s^{n+1}, \acute{w}_s^{n+1}) \in \Phi _s^n(z^n,\acute{w}_s^{n+1})\) such that

$$\begin{aligned}&f_s(\acute{x}_s^{n+1},\acute{w}_s^{n+1}) +(\lambda _s^n)^\top \acute{x}_s^{n+1} +\rho ^n\, \pi _s^n\psi \left( z^{n}-\acute{x}_s^{n+1},0\right) \\&\quad <f_s({x}_s^{n+1},y_s^{n+1}) +(\lambda _s^n)^\top {x}_s^{n+1} +\rho ^n\, \pi _s^n\psi \left( z^{n}-{x}_s^{n+1},0\right) , \end{aligned}$$

which would contradict the optimality in (8). \(\square \)
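To make the mechanics of Proposition 3 concrete, the following sketch (ours, with toy data) solves the scenario subproblem (8) by brute-force enumeration over a tiny finite \(K_s\) and then sets \(w_s^{n+1} = y_s^{n+1}\); in practice this step is a MIQP solve.

```python
# Enumeration-based sketch of the scenario subproblem (8); all data are toy
# values and psi is the squared 2-norm.
K_s = [(x, y) for x in range(3) for y in range(4) if x + y <= 4]  # toy K_s
f = lambda x, y: 1.0 * x + 2.0 * y       # p_s (c^T x + d_s^T y), folded in
lam_s, rho, pi_s = 0.3, 5.0, 0.5
psi = lambda u, v: 0.5 * (u * u + v * v)

def w_update(z_n):
    # (x_s^{n+1}, y_s^{n+1}) from (8); the second discrepancy is fixed at 0.
    x_s, y_s = min(K_s, key=lambda p:
                   f(*p) + lam_s * p[0] + rho * pi_s * psi(z_n - p[0], 0.0))
    return x_s, y_s, y_s                 # ... then set w_s^{n+1} = y_s^{n+1}

x1, y1, w1 = w_update(z_n=1.7)
print(x1, y1, w1)  # -> 1 0 0
```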

Computing the update \(z^{n+1} \in \arg \min _{z} \varphi ^n(z,\varvec{w}^{n+1})\) given fixed \(\varvec{w}^{n+1}\) corresponds to an infimal convolution of \((x_s,y_s) \mapsto f_s(x_s,y_s) + \delta _{K_s} (x_s,y_s) + (\lambda _s^{n})^\top x_s\) and \((u,v) \mapsto \rho ^n\,\pi _s^{n} \psi (u,v)\), for each \(s \in S\), where we denote the indicator function of a set \(K_s\) by \(\delta _{K_s} (x,y)\) that takes the value zero if \((x,y) \in K_s\) and \(+\infty \) otherwise. The infimal convolution is well-studied [13, Chapter 1, section H], and later we make use of certain convex “cross-sections” of this infimal convolution. However, the calculation culminating in \(z^{n+1} \in \arg \min _{z} \varphi ^n(z,\varvec{w}^{n+1})\) is still not easily computable, as it requires the solution of a MIP of comparable difficulty to the original SMIP (1). Nevertheless, this problem \(z^{n+1} \in \arg \min _{z} \varphi ^n(z,\varvec{w}^{n+1})\) is useful from a theoretical standpoint, as it links the consensus problem to the Gauss–Seidel step of the continuous regularisation.

A more practical approach to the \(z\) update takes the form of descent steps using the usual consensus update, i.e.

$$\begin{aligned} z^{n+1}\in \arg \min _{z\in {\mathbb {X}}}\sum _{s\in S} \pi _{s}^n\psi \left( z-x_{s}^{n+1},0\right) , \end{aligned}$$
(9)

where \(\varvec{w}^{n+1}-\varvec{y}^{n+1}=0\) follows from Proposition 3. From Assumption 2(3) with \(u_s^0 = z^{n+1}-x_s^{n+1}\), \(u_s= z-x_s^{n+1}\) and \(v_s^0=v_s=0\) for all \(s \in S\), and the optimality condition associated with the \(z^{n+1}\) update \(\sum _{s\in S}\pi _{s}^n\nabla _z\psi (z^{n+1} - x_s^{n+1},0) =0\) we have that

$$\begin{aligned} \sum _{s\in S}\pi _{s}^n\psi \left( z-x_{s}^{n+1},0\right) \ge \sum _{s\in S}\pi _{s}^n\psi \left( z^{n+1}-x_{s}^{n+1},0\right) + \frac{m}{2} \Vert z - z^{n+1} \Vert ^2 \end{aligned}$$
(10)

and so the \(z^{n+1}\) update (9) must be unique.
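For the unweighted choice \(\psi (u,0) = \frac{1}{2}\Vert u\Vert ^2\), the consensus update (9) reduces to the \(\pi \)-weighted average of the \(x_s^{n+1}\); the following sketch (ours, scalar toy data) confirms this closed form against a brute-force grid search.

```python
# Consensus update (9) for psi(u, 0) = 0.5 u^2: the unique minimiser is the
# pi-weighted average of the scenario copies (toy scalar data).
pi = [0.2, 0.3, 0.5]
x = [1.0, 3.0, 4.0]                           # x_s^{n+1}, one per scenario

obj = lambda z: sum(p * 0.5 * (z - xs) ** 2 for p, xs in zip(pi, x))
z_closed = sum(p * xs for p, xs in zip(pi, x))      # weighted average = 3.1

z_grid = min((i / 1000 for i in range(5001)), key=obj)   # search z in [0, 5]
print(abs(z_closed - z_grid) < 1e-3)  # -> True
```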

Using this observation, we can devise a Gauss–Seidel algorithm that is guaranteed to produce non-ascent steps while the stabilisation \((z^{n}, \varvec{w}^{n+1})=(z^{n+1}, \varvec{w}^{n+1} )\) is not achieved, which is given in Algorithm 1.

Algorithm 1

Modified block GS method for SMIP

Algorithm 1 describes a two-block Gauss–Seidel iterative approach on the two blocks \((x_s,y_s,w_s)_{s \in S}\) and \(z\), where the mixed-integer constraints only appear in the block \((x_s,y_s,w_s)_{s \in S}\) subproblem implicitly referred to in Lines 4–5 of Algorithm 1. In the following sections, we analyse the convergence properties of certain embedded subsequences of (mid-)iterations \((x^{n_k+1},y^{n_k+1},w^{n_k+1},z^{n_k})\) generated by Algorithm 1 for penalty coefficient values \(\rho ^n> 0\), penalty weights \(\pi _s^n\), \(s \in S\), and Lagrangian multipliers \(\lambda _s^n\), \(s \in S\), that vary with iteration \(n\ge 1\). (It is convenient to maintain that \(\sum _{s \in S} \lambda _s^n= 0\) for all \(n\ge 0\).) We must also assume the solution \((x^{n_k+1},y^{n_k+1},w^{n_k+1})\) is globally optimal in order to carry out our convergence analysis in Sect. 5.1. Here we assume the existence of Fréchet subdifferentials at the minimising points, which is assured at any global minimum. Furthermore, when \(\psi \) is a quadratic form, a global minimum may be computed in practice using a MIQP solver.
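A minimal, enumeration-based sketch of the two-block iteration (our own construction, with \(\varvec{w}\) suppressed and \(\lambda = 0\) held fixed) illustrates the behaviour analysed below: the iterates may stall without consensus at moderate penalties, and growing \(\rho ^n\) eventually forces the scenario copies to agree.

```python
# Toy two-block GS loop in the spirit of Algorithm 1: scenario step (8) by
# enumeration, consensus step (9) as a weighted average, growing penalty.
def run_gs(rho0=0.5, growth=2.0, iters=50):
    K = {0: [0, 1, 2], 1: [1, 2, 3]}        # toy first-stage sets K_s
    f = {0: lambda x: 1.0 * x, 1: lambda x: -0.8 * x}  # scenario objectives
    pi = {0: 0.4, 1: 0.6}
    z, rho = 0.0, rho0
    for _ in range(iters):
        # block 1: scenario subproblems (8) with psi(u, 0) = 0.5 u^2
        xs = {s: min(K[s], key=lambda x: f[s](x)
                     + rho * pi[s] * 0.5 * (z - x) ** 2) for s in K}
        # block 2: consensus update (9), here the pi-weighted average
        z = sum(pi[s] * xs[s] for s in K)
        if len(set(xs.values())) == 1:      # all copies agree: feasible point
            break
        rho *= growth                       # otherwise let the penalty grow
    return z, xs

z, xs = run_gs()
print(z, xs)  # consensus reached at x = 2
```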

Remark 3

The classical progressive hedging algorithm is realised by taking \(\pi _s^n = p_s\) and so \(\rho ^n = \sum _{s \in S} p_s \rho ^n\) (as \(\sum _{s \in S} p_s =1 \)). Then for \(\psi (\cdot ,0) = \frac{1}{2} \Vert \cdot \Vert ^2\) we have \(z^{n+1} = \sum _{s \in S} p_s x_{s}^{n+1}\) and assuming dual feasibility \(\sum _{s \in S} \lambda _s^n = 0\) one can also assert for all \(s \in S\) that

$$\begin{aligned} \left( x_{s}^{n+1},y_{s}^{n+1}\right) \in \Phi _s^n(z^n,w^{n+1}) {=} \mathop {{{\,\textrm{argmin}\,}}}\limits _{(x_{s},y_{s}) \in K_s } f_s(x_s,y_s){+}(\lambda _s^{n})^\top x_s {+} \rho ^np_s \psi \left( z^{n}-x_{s}, 0\right) \end{aligned}$$

along with the multiplier update that retains dual feasibility of the multipliers i.e. \(\lambda _s^{n+1} = \lambda _s^n - p_s \rho ^n (z^{n+1} - x_s^{n+1})\). Moreover, the dual feasibility allows one to assert that the same \(z^{n+1}\) solves the minimisation with respect to \(z\) in the full augmented Lagrangian. Penalties \(\rho ^n\) between iterations with progressive hedging are usually left unchanged or are updated in such a manner as to realise stabilisation. (See, e.g., [12, Section 3.4.1].)
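The multiplier update of Remark 3 preserves dual feasibility because \(\sum _{s \in S} p_s (z^{n+1} - x_s^{n+1}) = z^{n+1} - \sum _{s \in S} p_s x_s^{n+1} = 0\) when \(z^{n+1}\) is the expectation of the scenario copies; a quick numeric check (ours, toy data):

```python
# Check that lambda_s <- lambda_s - p_s rho (z - x_s) keeps sum_s lambda_s = 0
# when z = sum_s p_s x_s (Remark 3); all values are toy data.
p = [0.25, 0.35, 0.40]
x = [2.0, -1.0, 4.0]                   # x_s^{n+1}, scalars for simplicity
lam = [0.5, -0.2, -0.3]                # dual feasible: sums to zero
rho = 7.0

z = sum(ps * xs for ps, xs in zip(p, x))             # consensus = expectation
lam_next = [l - ps * rho * (z - xs) for l, ps, xs in zip(lam, p, x)]
print(abs(sum(lam_next)) < 1e-9)  # -> True
```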

Next, we build on that development where Algorithm 1 is viewed as an approximate two-block GS iterative approach within the continuous optimisation framework of successively minimising \(\varphi ^n\) in \(z\) (approximately) and \(w\) (globally and exactly) with Lagrange multipliers \(\lambda ^n\in \Lambda \), penalty coefficient values \(\rho ^n> 0\) and penalty weights \(\pi _s^n\), \(s \in S\), varying between iterations \(n\ge 0\) under certain assumptions.

We conclude this section by noting that the above algorithm is essentially that of [11], with alterations to the Lagrangian multiplier and penalty parameter updates. In particular, we consider what happens when Lagrange multipliers \(\lambda ^n\in \Lambda \) and penalty weights \(\pi ^n\) stop changing after a finite number of iterations, while penalty parameters \(\{\rho ^n\}\) may increase without bound. The latter feature requires us to consider the limiting behaviour of the regularisation \(\varphi ^n\) as \(\rho ^n\rightarrow \infty \). Such an analysis is facilitated by analysing the level sets of the sequence of functions, denoted by \({\text {lev}}_c \varphi ^{\lambda ,\rho ,\pi }:=\) \(\{ (z,w) \mid \varphi ^{\lambda ,\rho ,\pi }(z,w ) \le c\}= \{ (z,w) \mid \frac{1}{\rho }\varphi ^{\lambda ,\rho ,\pi }(z,w ) \le \frac{c}{\rho }\}= {\text {lev}}_{ \frac{c}{\rho }}\left( \tfrac{1}{\rho }\varphi ^{\lambda ,\rho ,\pi }\right) , \) prompting the use of epi-convergence as a tool in our analysis, as it is associated with the convergence of level sets.

3 Properties of the SMIP regularisation \(\varphi ^{\lambda ,\rho ,\pi }\)

The continuous regularisation \(\varphi ^{\lambda ,\rho ,\pi }\) of SMIP (1) has properties that allow for feasible points of SMIP (1) to be associated with certain local minima of \(\varphi ^{\lambda ,\rho ,\pi }\). To gain insight into these properties of \(\varphi ^{\lambda ,\rho ,\pi }\), we first note some additional properties of \(\psi \) that follow from the properties listed in Assumption 2.

Proposition 4

Assume \(\psi \) satisfies Assumption 2. Then, for all \((z,\varvec{w}) \in {\mathbb {X}}\times \mathbb {Y}^{\small {\vert S \vert }}\), \(\rho > 0\), \(\pi _s > 0\), \(s \in S\), and \(\lambda \in \Lambda \), the following properties hold for each \(s \in S\):

  1. The set of solutions for problem (3c), that is, \(\Phi _s^{\lambda ,\rho ,\pi }(z,w_s)\), is non-empty.

  2. The function \(\rho \mapsto \varphi _s^{\lambda ,\rho ,\pi }\left( z, w_s \right) \) is non-decreasing.

  3. If in addition \((z,w_s) \in K_s\), then \(\varphi _s^{\lambda ,\rho ,\pi }\left( z, w_s \right) \le f_s\left( z,w_s \right) . \) If \((\varvec{z},\varvec{w}) \in K:= \Pi _{s \in S} K_s\) (with \(\varvec{z}:=(z,z,\dots ,z)\)) then

    $$\begin{aligned} \varphi ^{\lambda ,\rho ,\pi }\left( z,\varvec{w}\right) = \sum _{s\in S} \varphi _s^{\lambda ,\rho ,\pi }{(z,w_s)} \le \sum _{s\in S} f_s\left( z,w_s \right) < +\infty . \end{aligned}$$
    (11)
  4. The function \((z,w_s) \mapsto \varphi _s^{\lambda ,\rho ,\pi }(z,w_s)\) is locally Lipschitz continuous over

    $$\begin{aligned} K_s^\Delta :=\{ (z,w_s) \in {\mathbb {X}}\times {\mathbb {Y}}_s \; : \; (z,w_s) \in {{\,\textrm{conv}\,}}(K_s) + B_\Delta (0,0) \}, \end{aligned}$$
    (12)

    with modulus \(\pi _s \rho L_s^\Delta \), where \(L_s^\Delta \) depends on the diameter of \({{\,\textrm{conv}\,}}( K_s ) + B_\Delta (0,0)\). Taking \(L^\Delta := \max \{ L_s^\Delta \}\), we also have \(\varphi ^{\lambda ,\rho ,\pi }\) is Lipschitz continuous with modulus \(\rho L^\Delta \).

Proof

See Appendix A. \(\square \)

Definition 1

Denote by \({{\,\textrm{proj}\,}}_{{\mathcal {I}}}(\cdot ):= {\text {proj}}_{\mathbb {X},\mathcal {I}} \left( \cdot \right) \times {\text {proj}}_{\mathbb {Y},\mathcal {I}} \left( \cdot \right) \) the integer-component projection and, for each \((\bar{\varvec{x}}_{\mathcal {I}},\bar{\varvec{y}}_{\mathcal {I}}) \in {{{\,\textrm{proj}\,}}}_{{\mathcal {I}}}(K):= \Pi _{s \in S} {{{\,\textrm{proj}\,}}}_{{\mathcal {I}}}(K_s)\), denote

$$\begin{aligned} K_s^{(\bar{\varvec{x}}_{\mathcal {I}},\bar{\varvec{y}}_{\mathcal {I}})}:= \{(x_s,y_s) \in K_s \mid (\bar{x}_{s,{\mathcal {I}}},\bar{y}_{s,{\mathcal {I}}}) = {{\,\textrm{proj}\,}}{}_{{\mathcal {I}}} (x_s, y_s) \}. \end{aligned}$$

Note that this corresponds to a polyhedral subset of \(K_s\) once we have removed the integrality constraint by fixing the integer variables at a specific integer value. We now consider the behaviour of \(\varphi ^{\lambda ,\rho ,\pi }\) within neighbourhoods having progressively additional structure imposed. In preparation, we introduce notation that facilitates the view of \(\varphi ^{\lambda ,\rho ,\pi }\) in terms of its finitely many cross-sections over \((\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}) \in {{\,\textrm{proj}\,}}_{{\mathcal {I}}}(K)\).
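As a toy illustration (ours) of Definition 1, fixing the integer component of a small mixed set leaves finitely many continuous (here, interval) cross-sections:

```python
# Integer cross-sections of a toy mixed set
# K_s = {(x_R, x_I) : x_I in {0, 1, 2}, 0 <= x_R <= 3 - x_I}.
def cross_section(x_I):
    """Continuous piece of K_s with the integer component fixed at x_I."""
    return (0.0, 3.0 - x_I)            # the interval [lo, hi] for x_R

sections = {x_I: cross_section(x_I) for x_I in (0, 1, 2)}
print(sections)  # -> {0: (0.0, 3.0), 1: (0.0, 2.0), 2: (0.0, 1.0)}
```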

Definition 1 induces the following notation for proximal cross-sections for each \((\varvec{x}_{{\mathcal {I}}}^0,\varvec{y}_{{\mathcal {I}}}^0) \in \text {proj}_{{\mathcal {I}}}(K)\).

Definition 2

For \((\lambda , \rho , \pi ) \in \Lambda \times \mathbb {R}_{>0} \times \mathbb {R}_{>0}^{|S |} \), the proximal cross-sectional values are defined by

$$\begin{aligned} \varphi _s^{\lambda ,\rho ,\pi }\left( z, w_s \mid \varvec{x}_{{\mathcal {I}}}^0,\varvec{y}_{{\mathcal {I}}}^0\right) :=&\inf _{(x_s,y_s)} \{ {f_s(x_s,y_s) + \lambda _s^\top x_s} + \delta _{K_s^{ ({\textbf{x}}^{0}_{{\mathcal {I}}},{\textbf{y}}^{0}_{{\mathcal {I}}})} } (x_s,y_s) \nonumber \\&+ \rho \pi _s\psi (z-x_s,w_s-y_s) \}. \end{aligned}$$
(13a)
$$\begin{aligned} \varphi ^{\lambda ,\rho ,\pi }\left( z, \varvec{w} \mid \varvec{x}_{\mathcal {I}}^0,\varvec{y}_{\mathcal {I}}^0\right)&:= \sum _{s\in S} \varphi _s^{\lambda ,\rho ,\pi }\left( z, w_s \mid \varvec{x}_{{\mathcal {I}}}^0,\varvec{y}_{{\mathcal {I}}}^0\right) , \end{aligned}$$
(13b)

and the set of arguments realising the proximal cross-sectional values is defined by

$$\begin{aligned} \Phi _s^{\lambda ,\rho ,\pi }\left( z, w_s \mid \varvec{x}_{{\mathcal {I}}}^0,\varvec{y}_{{\mathcal {I}}}^0\right)&:= \mathop {{{\,\textrm{argmin}\,}}}\limits _{(x_s,y_s)} \left\{ f_s(x_s,y_s) + \lambda _s^\top x_s +\delta _{K_s^{ ({\textbf{x}}^{0}_{{\mathcal {I}}},{\textbf{y}}^{0}_{{\mathcal {I}}})} } (x_s,y_s) \right. \nonumber \\&\left. \quad + \rho \pi _s\psi (z-x_s,w_s-y_s) \right\} . \end{aligned}$$
(14a)
$$\begin{aligned} \Phi ^{\lambda ,\rho ,\pi }\left( z, \varvec{w} \mid \varvec{x}_{\mathcal {I}}^0,\varvec{y}_{\mathcal {I}}^0\right)&:=\prod _{s\in S} \Phi _s^{\lambda ,\rho ,\pi }\left( z, w_s \mid \varvec{x}_{{\mathcal {I}}}^0,\varvec{y}_{{\mathcal {I}}}^0\right) . \end{aligned}$$
(14b)

For each \(s \in S\) and \((x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}}) \in {{\,\textrm{proj}\,}}_{\mathcal {I}}(K_s)\), the properties of Assumption 2 for \(\psi \) allow for the following properties of the cross-sections to be established.

Lemma 5

For \((x_{{\mathcal {I}}}^0,y_{{\mathcal {I}}}^0) \in {{\,\textrm{proj}\,}}_{{\mathcal {I}}}(K)\) the mapping \((z,w_s) \mapsto \varphi _s^{\lambda ,\rho ,\pi }\left( z,w_s \mid \varvec{x}_{{\mathcal {I}}}^0,\varvec{y}_{{\mathcal {I}}}^0\right) \) is convex over \(\mathbb {R}^{n}\times \mathbb {R}^{m}\) for each \(s \in S\).

Proof

This function can be represented as the infimal convolution of two closed, convex functions

$$\begin{aligned} \left( x_s,y_s\right)\mapsto & {} f_s (x_s,y_s) +\lambda _s^\top x_s + \delta _{K_s^{ ({\textbf{x}}^{0}_{{\mathcal {I}}},{\textbf{y}}^{0}_{{\mathcal {I}}})} } (x_s,y_s) \nonumber \\ \left( u_s,v_s \right)\mapsto & {} \rho \pi _s \psi \left( u_{s},v_{s}\right) \end{aligned}$$

with \((z,w_s) = (x_s,y_s) + (u_s,v_s)\). The compactness of \(K_s\) ensures that of \(K_s^{ ({\textbf{x}}^{0}_{{\mathcal {I}}},{\textbf{y}}^{0}_{{\mathcal {I}}})} \), which in turn ensures that the infimal convolution is bounded away from \(-\infty \). As the strict epigraph of an infimal convolution equals the sum of the strict epigraphs of the constituent functions, convexity follows [13, Exercise 1.28]. \(\square \)

Note that \(\varphi _s^{\lambda ,\rho ,\pi }(z,w_s) = \min _{(x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}}) \in {{\,\textrm{proj}\,}}_{\mathcal {I}}(K_s)} \varphi _s^{\lambda ,\rho ,\pi }\left( z,w_s \mid x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}}\right) \), for each \(s \in S\), is a minimum of a finite number of convex functions, but \(\varphi ^{\lambda ,\rho ,\pi }\) itself is not guaranteed to be convex or differentiable on its entire domain \(\mathbb {X}\times \mathbb {Y}^{\small {\vert S \vert }}\). Nevertheless, \(\varphi _s^{\lambda ,\rho ,\pi }\) is locally convex and differentiable on open neighbourhoods N where, for all \((z,w_s) \in N\), \(\varphi _s^{\lambda ,\rho ,\pi }(z,w_s)= \varphi _s^{\lambda ,\rho ,\pi }\left( z,w_s \mid x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}}\right) \) holds for exactly one \((x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}}) \in {{\,\textrm{proj}\,}}_{\mathcal {I}}(K_s)\).
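The structure just described can be made concrete with a toy one-dimensional sketch (hypothetical data, not from the paper): each cross-section, obtained by fixing the integer block, is convex, but their pointwise minimum fails the midpoint convexity inequality at a crossover point.

```python
# Two hypothetical integer values x_I in {0, 1}, each giving a convex
# cross-section: a fixed offset plus a quadratic proximal term.
rho = 4.0
cross_sections = {0: lambda z: 0.0 + 0.5 * rho * (z - 0.0) ** 2,
                  1: lambda z: 0.3 + 0.5 * rho * (z - 1.0) ** 2}

def phi(z):
    # phi is the pointwise minimum over the finitely many cross-sections.
    return min(f(z) for f in cross_sections.values())

# Convexity fails at the crossover: the midpoint value exceeds the chord.
assert phi(0.5) > 0.5 * (phi(0.0) + phi(1.0))
```

Here \(\varphi(0) = 0\), \(\varphi(1) = 0.3\), yet \(\varphi(0.5) = 0.5\), so the chord inequality is violated, mirroring the nonconvexity of \(\varphi^{\lambda,\rho,\pi}\) across cross-section boundaries.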

Lemma 6

Assume \(\psi \) satisfies Assumption 2, with parameter m as in Assumption 2(3). For each fixed \(D > 0\), there exists a \(\tilde{\delta }>0\) such that if a discrepancy \((u^0,v^0)\) satisfies \(\Vert (u^0,v^0)\Vert < \tilde{\delta }\), then \(\psi (u^0,v^0) < \psi (u,v)\) for all discrepancies \((u,v)\) satisfying \(\Vert (u-u^0,v-v^0)\Vert > D\).

Proof

See Appendix A. \(\square \)

For \(\psi \) defined by \(\psi (u,v) = \frac{1}{2}\left\| (u,v) \right\| ^2\) and any fixed \(D > 0\), we may identify \(\tilde{\delta } = \frac{1}{2} D\) (since \(m=1\) and \(\nabla \psi (u,v) = (u,v)\)). For example, if \(D=1/2\), then for all \((u^0,v^0)\) with \(\Vert (u^0,v^0)\Vert < \frac{1}{4}\), we have \(\psi (u^0,v^0) < \psi (u,v)\) for all \((u,v)\) with \(\Vert (u-u^0,v-v^0)\Vert > \frac{1}{2}\). This observation will have practical value in separating values of different cross-sections of \(\varphi \).
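This separation property for the quadratic \(\psi\) can be checked numerically; a minimal sketch over randomly sampled points, assuming the identifications \(D = 1/2\), \(\tilde\delta = 1/4\) above:

```python
# Check of Lemma 6 for psi(u, v) = 0.5 * ||(u, v)||^2 with D = 1/2 and
# delta_tilde = D/2 = 1/4: any point of norm < 1/4 has a strictly smaller
# psi-value than any point more than 1/2 away from it.
import random
random.seed(0)

def psi(p):
    return 0.5 * sum(c * c for c in p)

def norm(p):
    return sum(c * c for c in p) ** 0.5

for _ in range(1000):
    p0 = [random.uniform(-1, 1) for _ in range(2)]
    if norm(p0) >= 0.25:          # keep only base points with ||p0|| < 1/4
        continue
    d = [random.uniform(-2, 2) for _ in range(2)]
    if norm(d) <= 0.5:            # keep only displacements with ||d|| > 1/2
        continue
    p = [a + b for a, b in zip(p0, d)]
    assert psi(p0) < psi(p)       # separation holds
```

The triangle inequality explains why no counterexample can occur: \(\Vert p\Vert \ge \Vert p - p^0\Vert - \Vert p^0\Vert > 1/2 - 1/4 = 1/4 > \Vert p^0\Vert\).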

Proposition 7

Assume \(\psi \) satisfies Assumption 2 and \((z^{0},w_{s}^{0}) \in K_{s}\) for all \(s \in S\). If there is at least one scenario \(s \in S\) such that \((x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}})\in {{\,\textrm{proj}\,}}_{\mathcal {I}}(K_{s})\) with \((z_{{\mathcal {I}}}^0,w_{s,{\mathcal {I}}}^0) \ne (x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}})\), then there exist a finite threshold penalty coefficient \(\tilde{\rho }>0\) and a threshold \(\tilde{\delta }>0\) such that for all \(\rho >\tilde{\rho }\) and \(0<\delta <\tilde{\delta }\), the strict inequality \(\varphi _s^{\lambda ,\rho ,\pi }(z,w_{s})<\varphi _s^{\lambda ,\rho ,\pi }\left( z,w_{s} \mid x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}}\right) \) holds for all \((z,w_{s})\in B_{\delta }(z^{0},w_{s}^{0})\).

Proof

Assuming for some \(s\in S\) that we have \((z_{{\mathcal {I}}}^0,w_{s,{\mathcal {I}}}^0) \ne (x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}})\), then identifying \((u_s,v_s)=(z-x_s,w_s-y_s)\) and \((u_s^0,v_s^0)=(z-z^0,w_s-w_s^0)\) for each \(s \in S\), we have \((u_s-u_s^0,v_s-v_s^0) = (z^0-x_s,w_s^0-y_s)\) and \(\left\| (u_s-u_s^0,v_s-v_s^0) \right\| > \frac{1}{2}\) for all \((x_s,y_s) \in K_s^{(x_{\mathcal {I}},y_{\mathcal {I}})}\). Thus, using Lemma 6 with \(D = \frac{1}{2}\), we have a \(\tilde{\delta } > 0\) for which \((z,w_s)\) satisfying \(\left\| (z-z^0,w_s-w_s^0) \right\| < \tilde{\delta }\) implies that

$$\begin{aligned} \psi (z-z^0,w_s-w_s^0) < \psi (z-x_s, w_s-y_s) \end{aligned}$$
(15)

for all \((x_s,y_s) \in K_s^{(x_{\mathcal {I}},y_{\mathcal {I}})}\) given that \((z_{{\mathcal {I}}}^0,w_{s,{\mathcal {I}}}^0) \ne (x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}})\).

Due to the compactness of \(K_s^{(x_{\mathcal {I}},y_{\mathcal {I}})}\), defining for fixed \((z^0,w_s^0)\), \(s \in S\),

$$\begin{aligned} \Delta _s(z,w_s):=\min _{(x_s,y_s)}\left\{ \psi (z-x_s, w_s-y_s)-\psi (z-z^0,w_s-w_s^0) \mid (x_s,y_s) \in K_s^{(x_{\mathcal {I}},y_{\mathcal {I}})} \right\} , \end{aligned}$$

we have \(\Delta _s(z,w_s) > 0\). It follows for each \(s \in S':=\left\{ s \mid (z_{{\mathcal {I}}}^0,w_{s,{\mathcal {I}}}^0) \ne (x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}})\right\} \) that \(\psi (z-z^0,w_s-w_s^0) + \frac{\Delta _s(z,w_s)}{2} < \psi (z-x_s, w_s-y_s)\) for all \((x_s,y_s) \in K_s^{(x_{\mathcal {I}},y_{\mathcal {I}})}\).

Again due to the compactness of \(K_s^{(x_{\mathcal {I}},y_{\mathcal {I}})}\), we have still for each \(s \in S'\) that

$$\begin{aligned} \frac{f_s(z^0,w_s^0) + \lambda _s^\top z^0}{\rho } + \pi _s\psi (z-z^0,w_s-w_s^0) + \frac{\pi _s\Delta _s(z,w_s)}{2} < \frac{f_s(x_s,y_s) + \lambda _s^\top x_s}{\rho } +\pi _s\psi (z-x_s, w_s-y_s) \end{aligned}$$

for all \((x_s,y_s) \in K_s^{(x_{\mathcal {I}},y_{\mathcal {I}})}\), for \(\tilde{\rho } > 0\) sufficiently large and all \(\rho > \tilde{\rho }\). Thus, for \(\left\| (z-z^0,w_s-w_s^0) \right\| <\tilde{\delta }\) and \(\rho > \tilde{\rho }\), we have

$$\begin{aligned} \varphi _s^{\lambda ,\rho ,\pi }(z,w_{s}) < \varphi _s^{\lambda ,\rho ,\pi }\left( z,w_{s} \mid \varvec{x}_{{\mathcal {I}}},\varvec{y}_{{\mathcal {I}}}\right) \end{aligned}$$
(16)

for all \(\rho >\tilde{\rho }\) and \(\left( z,w_s\right) \in B_{\tilde{\delta }}(z^{0},w_{s}^{0})\) whenever \(s \in S'\). (Otherwise, for \(s \in S \backslash S'\), \(\varphi _s^{\lambda ,\rho ,\pi }(z,w_{s}) = \varphi _s^{\lambda ,\rho ,\pi }\left( z,w_{s} \mid \varvec{x}_{{\mathcal {I}}},\varvec{y}_{{\mathcal {I}}}\right) \) holds.) Summing over \(s \in S\), the same holds then for \( \varphi ^{\lambda ,\rho ,\pi }(z,\varvec{w}) < \varphi ^{\lambda ,\rho ,\pi }\left( z,\varvec{w} \mid \varvec{x}_{{\mathcal {I}}},\varvec{y}_{{\mathcal {I}}}\right) . \) \(\square \)

We note that for each \((z,w_s) \in B_{\tilde{\delta }} (z^0, w^0)\), with \(s \in S'\), we have \(\Delta _s(z,w_s)> 0\), and so the gap between the left- and right-hand sides of the inequality in (16) can only grow with increasing \(\rho \). Recall that the set of feasible (non-anticipative) solutions, whose elements we seek, is given by \(F:= \{ (z,\varvec{w}) \mid (z,w_s) \in K_s \text { for all } s \in S \}\), which is distinct from the set K. The next result follows immediately.

Corollary 8

Assume \(\psi \) satisfies Assumption 2. If \((z^{0},w^{0})\in F\), then there exists a \(\tilde{\rho }>0\) and a \(\tilde{\delta }>0\) such that for \(\rho \ge \tilde{\rho }\) and \(0<\delta <\tilde{\delta }\) we have

$$\begin{aligned} \varphi ^{\lambda ,\rho ,\pi }(z,\varvec{w})=\varphi ^{\lambda ,\rho ,\pi }\left( z,\varvec{w} \mid \varvec{z}_{{\mathcal {I}}}^{0},\varvec{w}_{{\mathcal {I}}}^{0}\right) \quad \text {for all }(z,\varvec{w})\in B_{\delta }(z^{0},\varvec{w}^{0}). \end{aligned}$$

Hence, for \((z,w_{s})\in B_{\delta }(z^{0},w_{s}^{0})\), \(s\in S\), with \(0<\delta <\tilde{\delta }\), the function \(\varphi _s^{\lambda ,\rho ,\pi }\) coincides with a convex function for all \(\rho \ge \tilde{\rho }\).

4 The theory of persistent local minima

For iteration indices \(n\ge 0\), let \((\lambda ^n,\pi ^n,\rho ^n)\in \Lambda \times \mathbb {R}_{>0}^{|S |} \times \mathbb {R}_{>0} \) and define for \((\lambda ^n,\pi ^n,\rho ^n) \rightarrow _{n\rightarrow \infty } (\lambda , \pi , \infty )\):

$$\begin{aligned} f^n(\varvec{x},\varvec{y},\varvec{w},z)&:= \sum _{s\in S} \left[ f_s(x_s,y_s) + (\lambda _s^n)^\top x_s+ \rho ^n\pi _s^n\psi (z- x_s , w_s - y_s) \right] , \end{aligned}$$
(17a)
$$\begin{aligned} \varphi _s^n&:=\varphi _s^{\lambda ^n,\pi ^n,\rho ^n}; \, \varphi ^n:= \varphi ^{\lambda ^n,\pi ^n,\rho ^n}; \, \Phi _s^n:= \Phi _s^{\lambda ^n,\pi ^n,\rho ^n}; \, \Phi ^n:= \Phi ^{\lambda ^n,\pi ^n,\rho ^n}, \end{aligned}$$
(17b)
$$\begin{aligned} \Phi ^{\infty } \left( z, \varvec{w}\right)&:= \prod _{s \in S} \arg \min _{\left( x_{s},y_{s}\right) } \left\{ \pi _s \psi \left( z-x_{s},w_{s}-y_{s} \right) \mid (x_s,y_s) \in K_s \right\} . \end{aligned}$$
(17c)

In this section, we consider sequences \(\{(\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{\varvec{w}}^n,\widetilde{z}^n)\}_{n=0}^\infty \) where we have \((\widetilde{\varvec{x}}^n, \widetilde{\varvec{y}}^n)\in \Phi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\). Furthermore, we assume \(\lim _{n\rightarrow \infty } \rho ^n= \infty \) and we single out a specific class of sequences of local minima \(\{(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\}_{n=0}^\infty \) for \(\varphi ^n\), which we call persistent, and which we show to be closely related to the feasible points of the underlying SMIP (1). We assume the following.

Assumption 9

Solution Sequence Assumptions (SSA) on \(\{(\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{\varvec{w}}^n,\widetilde{z}^n)\}_{n=0}^\infty \) and \(\{(\lambda ^n,\pi ^n,\rho ^n)\}_{n=0}^\infty \), indexed by integers \(n \ge 0\):

  1. 1.

    Penalty coefficients are non-decreasing \(\rho ^{n+1} \ge \rho ^n> 0\) for \(n\ge 0\) and increase without bound \(\lim _{n\rightarrow \infty } \rho ^n= \infty \).

  2. 2.

    Dual feasibility \(\lambda ^n\in \Lambda \) is satisfied, and boundedness \(\limsup _{n\rightarrow \infty }\Vert \lambda ^n\Vert <\infty \) holds.

  3. 3.

    Each \(\widetilde{z}^n\) is a local minimum of the function \(z \mapsto \inf _w\varphi ^n(z, w)\).

  4. 4.

    The extracted sequence \(\{\widetilde{z}^n\}_{n=0}^\infty \) converges to \(\overline{z}\).

  5. 5.

    Each \(\widetilde{w}_s^n\), \(s \in S\), is globally optimal: \(\widetilde{w}_s^n\in \arg \min _{w_s} \varphi _s^n(\widetilde{z}^n,w_s)\); thus \((\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{\varvec{w}}^n) \in \arg \min _{\varvec{w}, (\varvec{x},\varvec{y})\in K} \sum _{s \in S} f^n_s(x_s,y_s,w_s,\widetilde{z}^n)\) is globally optimal with \(z=\widetilde{z}^n\) fixed.

We also consider the following assumption on penalty weights separately.

Assumption 10

Penalty Weight Assumptions (PWA): We assume that \(\sum _{s \in S} \pi _s^n= 1\) and \(\pi _s^n> 0\), \(s \in S\). We assume in addition that the applied update rule for generating penalty weights over iterations \(n\ge 0\) ensures that we have \(\pi _s^n \ge \xi \), for some fixed \(\xi > 0\), for all but a finite number of iterations \(n\ge 0\), and for each \(s \in S\) such that \(\widetilde{x}^n_{s,{\mathcal {I}}} \ne \widetilde{z}_{\mathcal {I}}^n\) holds infinitely often in n. Furthermore, for \(n\ge 0\) for which the set \(S_n:=\{s \in S \mid \widetilde{x}^n_{s,{\mathcal {I}}} \ne \widetilde{z}_{\mathcal {I}}^n \}\) is empty, we assume the penalty weight update rule also ensures that \(\pi _s^{n+1} = \pi _s^n\) for all \(s \in S\).

If \(S_n\) is empty for all but a finite number of iterations \(n\), then consensus \(\widetilde{\varvec{x}}_{s,{\mathcal {I}}}^n=\widetilde{z}_{\mathcal {I}}^n\) has been reached and the above assumption is trivially satisfied. When \(S_n= \emptyset \) occurs, then \(\widetilde{z}^n\in X\) and by relatively complete recourse there exists \(y_s \in Y_s (\widetilde{z}^n)\) for all \(s \in S\) so that \((\widetilde{z}^n,\varvec{y})\) is feasible for SMIP (1).
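For illustration, one possible weight-update rule consistent with the PWA can be sketched as follows; the function name, the boosting-by-flooring rule, and the default \(\xi\) are all hypothetical, not the update used in the paper. It keeps the weights positive and summing to one, floors disagreeing scenarios at \(\xi\), and freezes the weights whenever \(S_n = \emptyset\).

```python
def update_weights(pi, disagree, xi=0.05):
    """Sketch of a PWA-compatible weight update (hypothetical rule).

    pi: dict scenario -> weight (positive, summing to 1)
    disagree: the set S_n of scenarios with x_{s,I} != z_I at this iterate
    xi: fixed floor applied to disagreeing scenarios
    """
    if not disagree:
        return dict(pi)                        # S_n empty: weights frozen
    assert len(pi) * xi <= 1.0                 # xi must be small enough
    if len(disagree) == len(pi):
        return {s: 1.0 / len(pi) for s in pi}  # uniform; each >= xi
    # Floor the disagreeing scenarios at xi ...
    new = {s: max(pi[s], xi) for s in disagree}
    leftover = 1.0 - sum(new.values())
    assert leftover > 0, "agreeing scenarios must retain positive mass"
    # ... and rescale the agreeing scenarios to absorb the leftover mass.
    rest_total = sum(pi[s] for s in pi if s not in disagree)
    for s in pi:
        if s not in disagree:
            new[s] = leftover * pi[s] / rest_total
    return new
```

For example, with weights \(\{0.5, 0.3, 0.2\}\), one disagreeing scenario, and \(\xi = 0.25\), the disagreeing weight is lifted to \(0.25\) and the rest are rescaled proportionally so the total remains one.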

In the context of Assumptions 9 and 10, we examine when it holds that \(\overline{z}\in X\). Under Assumptions 9 and 10, local minimisers \(\widetilde{z}^n\) of \(z\mapsto \inf _{\varvec{w}} \varphi ^n(z,\varvec{w})\) can be peculiar in the sense that \(\inf _{\varvec{w}} \varphi ^n(\widetilde{z}^n,\varvec{w})\) can increase without bound as \(\rho ^n \rightarrow \infty \), while the maximal neighbourhoods of the local minimiser \(\overline{z}\) verifying local optimality of \(\widetilde{z}^n\) for \(z\mapsto \inf _w\varphi ^n(z,\varvec{w})\) vanish in measure as \(n\rightarrow \infty \). The local minimisers \(\widetilde{z}^n\) that do not suffer from these issues are those that we wish to isolate, in that \(\inf _w\varphi ^n(\widetilde{z}^n,\varvec{w})\) remains bounded at the local minimum \(\widetilde{z}^n\) despite \(\rho ^n \rightarrow \infty \).

Definition 3

Let \(\{(\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{\varvec{w}}^n,\widetilde{z}^n)\}_{n=0}^\infty \) and \(\{(\lambda ^n,\pi ^n,\rho ^n)\}_{n=0}^\infty \) satisfy Assumption 9.

  1. 1.

    The sequence \(\{(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\}_{n=0}^\infty \) is persistent if \(\limsup _{n\rightarrow \infty } \varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n) < \infty \).

  2. 2.

    If \(\lim _{n\rightarrow \infty } (\widetilde{z}^n,\widetilde{\varvec{w}}^n) = (\overline{z},\overline{\varvec{w}})\), we say that \((\overline{z},\overline{\varvec{w}})\) is a persistent limit.

Remark 4

Clearly, when we have a convergent sequence of local minima \(\widetilde{z}^n \rightarrow \overline{z}\) for \(z\mapsto \inf _{\varvec{w}} \varphi ^n(z, \varvec{w})\), \(n\ge 0\), then for any \(\widetilde{\varvec{w}}^n\) with \(\inf _{\varvec{w}} \varphi ^n(\widetilde{z}^n, \varvec{w})= \varphi ^n(\widetilde{z}^n, \widetilde{\varvec{w}}^n)\), \(n\ge 0\), every convergent subsequence \(\{(\widetilde{z}^{n_k}, \widetilde{\varvec{w}}^{n_k})\}_{k=0}^\infty \) converges to a persistent limit \((\overline{z},\overline{\varvec{w}})\).

The subdifferential analysis of \(\varphi ^{\lambda ,\rho ,\pi }\) requires addressing its nonconvexity and non-differentiability. A notion of differentiation suitable for this purpose is Fréchet subdifferentiability as defined in [13].

Definition 4

The function \(\varphi : {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R} \cup \{\infty \}\) is Fréchet subdifferentiable at \((z^0,w^0)\) if there exists a Fréchet subderivative \((\zeta ,\omega )\) such that

$$\begin{aligned} \liminf _{(z-z^0,w-w^0) \rightarrow 0} \frac{\varphi (z,w)-\varphi (z^0,w^0)-\langle (\zeta ,\omega ), (z-z^0,w-w^0) \rangle }{\left\| (z-z^0,w-w^0) \right\| } \ge 0. \end{aligned}$$

We denote the collection of all such subderivatives by \(\widehat{\partial }\varphi (z^0,w^0)\), the Fréchet subdifferential of \(\varphi \) at \((z^0,w^0)\). A point \((z^0,w^0)\) is a Fréchet stationary point of \(\varphi \) if \((0,0) \in \widehat{\partial }\varphi (z^0,w^0)\). The limiting (or Mordukhovich) subdifferential of \(\varphi \) is denoted \(\partial \varphi (\overline{z},\overline{w})\), where \((\bar{\zeta },\bar{\omega }) \in \partial \varphi (\overline{z},\overline{w})\) if there exists a sequence \(\{(\zeta ^n,\omega ^n) \in \widehat{\partial }\varphi (z^n,w^n)\}_{n=0}^\infty \) such that \((z^n,w^n)\rightarrow (\overline{z},\overline{w})\) and \((\zeta ^n,\omega ^n) \rightarrow (\bar{\zeta },\bar{\omega })\).

The first part of the following Lemma is a modest restatement of the cited result which we shall use to deduce differentiability whenever the Fréchet subdifferential is non-empty. In the second part we obtain local minimality from stationarity for structured functions.

Lemma 11

Let \( \varphi : {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R} _{+\infty }\) be a function defined by \( \varphi (\overline{z},\overline{w}):= \min _{i \in I} \varphi _i(\overline{z},\overline{w})\), where \( \{\varphi _i: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\}_{i \in I}\) is a finite family of proper, convex, lower semicontinuous functions.

  1. 1.

    Then

    $$\begin{aligned} \widehat{\partial }\varphi (\overline{z},\overline{w}) = \bigcap _{i \in I(\overline{z},\overline{w})} \widehat{\partial }\varphi _i(\overline{z},\overline{w}) \end{aligned}$$

    where \(I(\overline{z},\overline{w}):= \{ i \in I \mid i \in \arg \min _{i \in I} \varphi _i(\overline{z},\overline{w})\}\). If each function \(\varphi _i\), \(i \in I\), is differentiable then \(\widehat{\partial }\varphi (\overline{z},\overline{w}) \ne \emptyset \) implies \(\widehat{\partial }\varphi (\overline{z},\overline{w}) = \{\nabla \varphi (\overline{z},\overline{w})\}\).

  2. 2.

    If, in particular, Fréchet stationarity \(0 \in \widehat{\partial }\varphi _i(\overline{z},\overline{w})\) is satisfied for all \(i \in I(\overline{z},\overline{w})\), then \((\overline{z},\overline{w})\) is a local minimum of \(\varphi \) with \((0,0) \in \widehat{\partial }\varphi (\overline{z},\overline{w})\). Moreover, if at least one of the \(\varphi _i\), \(i \in I(\overline{z},\overline{w})\), is differentiable, then we also have \(\widehat{\partial }\varphi (\overline{z},\overline{w}) = \{\nabla \varphi (\overline{z},\overline{w})\} = \{(0,0)\}\).

Proof

Part 1: Follows due to [23, Theorem 1 via Theorem 10].

Part 2: Due to the convexity of \(\varphi _i\), \(i \in I\), we have \((0,0) \in \widehat{\partial }\varphi _i(\overline{z},\overline{w})\) being a subgradient in both the Fréchet and classical sense for \(i \in I(\overline{z},\overline{w})\), and furthermore, \((\overline{z},\overline{w})\) is a globally optimal solution for each \(\varphi _i\), \(i \in I(\overline{z},\overline{w})\). The membership \((0,0) \in \widehat{\partial }\varphi (\overline{z},\overline{w})\) then follows immediately from Part 1. We claim that there exists \(\delta >0\) such that \(\varphi (\overline{z},\overline{w}) \le \varphi (z,w)\) for all \((z,w) \in B_\delta (\overline{z},\overline{w})\). Otherwise, for some \(i' \notin I(\overline{z},\overline{w})\), we would have for \((z,w)\) arbitrarily close to \((\overline{z},\overline{w})\) that \(\varphi (\overline{z},\overline{w}) > \varphi _{i'}(z,w)\). But since \(i' \notin I(\overline{z},\overline{w})\), the inequality \(\varphi (\overline{z},\overline{w}) < \varphi _{i'}(\overline{z},\overline{w})\) holds, and so the lower semicontinuity of \(\varphi _{i'}\) is contradicted. Thus, we have the local minimality \(\varphi (z,w) \ge \varphi (\overline{z},\overline{w})\) for all \((z,w) \in B_{\delta }(\overline{z},\overline{w})\). Moreover, if \((0,0) \in \widehat{\partial }\varphi (\overline{z},\overline{w})\) and for at least one \(i \in I(\overline{z},\overline{w})\) we have \(\widehat{\partial }\varphi _i(\overline{z},\overline{w}) =\{(0,0)\}\), then by Part 1, we must also have \(\widehat{\partial }\varphi (\overline{z},\overline{w}) = \{(0,0)\}\), and so \( \nabla \varphi (\overline{z},\overline{w})=(0,0)\) exists. \(\square \)

The following motivates our set of assumptions on the penalty parameter update.

Proposition 12

Assume that SMIP (1) satisfies Assumption 1. Let \(\psi \) satisfy Assumption 2. Suppose we have a persistent sequence of local minima \((\widetilde{z}^{n},\widetilde{\varvec{w}}^{n} ) \rightarrow (\overline{z},\overline{\varvec{w}})\) for \(\rho ^n \rightarrow \infty \) (and hence Fréchet stationarity \(0 \in \widehat{\partial }\varphi ^n(\widetilde{z}^{n},\widetilde{\varvec{w}}^{n})\) for each \(n\)). If the PWA Assumption 10 holds, then for \(n\) sufficiently large and for all \((\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n) \in \Phi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\), we have \(\widetilde{z}_{{\mathcal {I}}}^{n} = \widetilde{x}_{s,{\mathcal {I}}}^n\) for all \(s \in S\), i.e., consensus holds in the integral components at a fixed value \(\widetilde{z}_{{\mathcal {I}}}^{n}=\overline{z}_{{\mathcal {I}}}\); furthermore, the Fréchet stationarity \(0 \in \widehat{\partial }f^n(\widetilde{\varvec{x}}^{n},\widetilde{\varvec{y}}^{n},\widetilde{\varvec{w}}^{n},\widetilde{z}^{n})\) holds.

Proof

The Fréchet stationarity \(0 \in \widehat{\partial }\varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\), \(n \ge 0\), is a strong notion in that it allows us to deduce the Fréchet stationarity \(0 \in \widehat{\partial }f^n(\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{\varvec{w}}^n,\widetilde{z}^n)\) via standard subderivative inclusions for marginal mappings (see [13, Theorem 10.13], Lemma 11 and elsewhere). Hence the Fréchet stationarity \(0 \in \widehat{\partial }\varphi ^n(\widetilde{z}^{n},\widetilde{\varvec{w}}^{n} )\) implies the cross-sectional Fréchet stationarity \(0 \in \widehat{\partial }\varphi ^{n}\left( \widetilde{z}^{n},\widetilde{\varvec{w}}^{n} \mid \varvec{x}_{{\mathcal {I}}}, \varvec{y}_{{\mathcal {I}}}\right) \) for all optimal cross-sections \((\varvec{x}_{{\mathcal {I}}}, \varvec{y}_{{\mathcal {I}}}) \in \text {proj}_{{\mathcal {I}}}(\Phi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n)) \). Identifying, in terms of Lemma 11 (Part 2), \(\varphi \) with \(\varphi ^n\), the \(\varphi _i\), \(i \in I\), with the finite number of cross-sections \(\varphi ^{n}\left( \cdot ,\cdot \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \) for \((\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}) \in {{\,\textrm{proj}\,}}_{\mathcal {I}}(K)\), and \((\overline{z},\overline{\varvec{w}})\) with \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\), we obtain a local minimum of \(\varphi ^n\) at \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\), and by the definition of \(\varphi ^n\) we have the Fréchet stationarity \(0 \in \widehat{\partial }f^n(\widetilde{\varvec{x}}^{n},\widetilde{\varvec{y}}^{n},\widetilde{\varvec{w}}^{n}, \widetilde{z}^{n})\). From this, we show that all such solutions have a common set of integral values for \(n\) sufficiently large.
As \(\{(\widetilde{z}^{n},\widetilde{\varvec{w}}^{n} )\}_{n=0}^\infty \) is persistent there exists \(\kappa >0\) such that \( \varphi ^{n} (\widetilde{z}^{n},\widetilde{\varvec{w}}^{n} ) \le \kappa \) for all \(n\) sufficiently large. Hence for each optimal cross-section \((\widetilde{\varvec{x}}_{{\mathcal {I}}}^n, \widetilde{\varvec{y}}_{{\mathcal {I}}}^n) \in \text {proj}_{{\mathcal {I}}}(\Phi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n))\) we have, using (5),

$$\begin{aligned} \frac{1}{\rho ^n}&[ \kappa + (\Vert c\Vert + \Vert d\Vert + \limsup _{n' \rightarrow \infty } \Vert \lambda ^{n'}\Vert ) (\sup _{(x,y) \in K}\max \{ \Vert x\Vert , \Vert y\Vert \}) ] \nonumber \\&\quad \ge \sum _{s \in S} \pi _s^n \psi (\widetilde{z}^{n} - \widetilde{x}_s^{n}, 0 ) \ge \frac{m}{2} \sum _{s \in S} \pi _s^n \Vert \widetilde{z}_{{\mathcal {I}}}^{n} - \widetilde{x}_{s, {\mathcal {I}}}^{n}\Vert ^2 \end{aligned}$$
(18)

The left-hand side of (18) tends to zero as \(\rho ^n \rightarrow \infty \), while \(\pi _s^n \ge \xi \) for all \(s \in S_{n}\). Hence, taking \(n\) large enough that the left-hand side of (18) is smaller than \(\frac{m\xi }{8}\), we conclude that \(\Vert \widetilde{x}_{s, {\mathcal {I}}}^{n} - \widetilde{z}_{{\mathcal {I}}}^{n} \Vert < \frac{1}{2}\) for all \(s \in S_{n}\), and so \(\widetilde{x}_{s', {\mathcal {I}}}^{n} = \widetilde{x}_{s, {\mathcal {I}}}^{n}= \widetilde{z}_{{\mathcal {I}}}^{n}\) for all \(s,s' \in S_{n}\). As \(\widetilde{z}_{{\mathcal {I}}}^{n} = \widetilde{x}_{s, {\mathcal {I}}}^{n}\) for all \(s \notin S_{n}\) already, we have equality for all \(s \in S\), and \(\widetilde{x}_{s, {\mathcal {I}}}^{n} = \overline{x}_{s, {\mathcal {I}}} = \overline{z}_{\mathcal {I}} \) is fixed independently of \(\rho ^n\) for \(n\) sufficiently large. \(\square \)
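With the quadratic penalty (\(m = 1\)), the forcing argument behind (18) can be checked arithmetically. Writing \(\bar\kappa\) for the bracketed numerator (a hypothetical value below), (18) gives \(\Vert \widetilde{z}^n_{\mathcal I} - \widetilde{x}^n_{s,\mathcal I}\Vert^2 \le 2\bar\kappa/(\rho^n\xi)\) for each \(s \in S_n\), which drops below \(1/4\), and so forces integer consensus, once \(\rho^n > 8\bar\kappa/\xi\):

```python
# Hypothetical numbers illustrating the bound (18) with quadratic psi (m = 1):
# kappa_bar stands for the bracketed numerator, xi for the PWA weight floor.
kappa_bar, xi = 10.0, 0.1
rho_threshold = 8 * kappa_bar / xi   # rho beyond which consensus is forced

for rho in (2 * rho_threshold, 100 * rho_threshold):
    dist_sq_bound = 2 * kappa_bar / (rho * xi)
    # squared distance below 1/4, i.e. distance below 1/2: distinct integer
    # blocks (which would be at distance >= 1) are ruled out
    assert dist_sq_bound < 0.25
```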

Feasibility may also be shown to hold, as stated in Lemma 13. Furthermore, in Proposition 14 we state the relationships between persistency and feasibility.

Lemma 13

Let the problem SMIP (1) satisfy the SMIP Assumption 1, and let the penalty function \(\psi \) satisfy Assumption 2. If a sequence \(\{(\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{\varvec{w}}^n,\widetilde{z}^n)\}_{n=0}^\infty \) given \(\{(\lambda ^n,\pi ^n,\rho ^n)\}_{n=0}^\infty \) with integer index \(n \ge 0\) satisfies Assumption 9, then \(\widetilde{y}^n_s = \widetilde{w}^n_{s}\in Y_s (\widetilde{x}^n_s)\), \(s \in S\), and \(\widetilde{z}^n \in \arg \min _{z} \sum _{s \in S} \pi _s^n \psi (z-\widetilde{x}_s^n, 0)\). Furthermore, for each \(n \ge 0\) for which \(\widetilde{z}^n \in X\) holds, we have \(\varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n) = \inf _{\varvec{w}} \varphi ^n(\widetilde{z}^n,\varvec{w}) \) bounded from above independently of the specific value of \(\rho ^n\).

Proof

It follows from Lemma 4 that for all \(n \ge 0\), we have the existence of \((\widetilde{\varvec{x}}^n, \widetilde{\varvec{y}}^n)\in K\) that attains the infimum in the definition of \(\varphi ^n ( \widetilde{z}^n,\widetilde{\varvec{w}}^n)\). Because \(\widetilde{w}_s^n\), \(s \in S\), is a global optimum for \(w_s \mapsto \varphi _s^n(\widetilde{z}^n,w_s)\), the claim \(\widetilde{y}_s^n=\widetilde{w}_s^n\), \(s \in S\), follows readily from Proposition 3.

To establish that \(\widetilde{z}^n \in \arg \min _{z} \sum _{s \in S} \pi _s^n \psi (z-\widetilde{x}_s^n, 0)\), assume to the contrary that \(\widetilde{z}^n \notin \arg \min _{z} \sum _{s \in S} \pi _s^n \psi (z-\widetilde{x}_s^n, 0)\). For any \(\overline{z}^n\in \arg \min _{z} \sum _{s \in S} \pi _s^n \psi (z-\widetilde{x}_s^n, 0)\), using the convexity of \(\psi \) and \(\eta \in (0,1)\), we have

$$\begin{aligned}&\sum _{s \in S} \inf _{w} \varphi _s^n(\eta \overline{z}^n + (1-\eta )\widetilde{z}^n,w_s) \\&\quad \le \sum _{s\in S} p_{s}\left[ c^{\top } \widetilde{x}_s^n +d_{s}^{\top } \widetilde{w}_s^n \right] +(\lambda _s^n)^{\top } \widetilde{x}_{s}^n +\rho ^n \pi _s^n \psi \left( [\eta \overline{z}^n + (1-\eta ) \widetilde{z}^n] - \widetilde{x}_{s}^n, 0 \right) \\&\quad \le \eta \sum _{s\in S} p_{s}\left[ c^{\top } \widetilde{x}_s^n +d_{s}^{\top } \widetilde{w}_s^n \right] +(\lambda _s^n)^{\top } \widetilde{x}_{s}^n +\rho ^n \pi _s^n \psi \left( \overline{z}^n - \widetilde{x}_{s}^n, 0 \right) \\&\qquad + (1-\eta ) \sum _{s\in S} p_{s}\left[ c^{\top } \widetilde{x}_s^n +d_{s}^{\top } \widetilde{w}_s^n \right] + (\lambda _s^n)^{\top } \widetilde{\varvec{x}}_{s}^n +\rho ^n \pi _s^n \psi \left( \widetilde{z}^n - \widetilde{x}_{s}^n, 0 \right) \\&\quad < \varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n), \end{aligned}$$

which would contradict the local optimality of \(\widetilde{z}^n\) for \(z\mapsto \inf _{\varvec{w}} \varphi ^n(z,\varvec{w})\). Thus, \(\widetilde{z}^n \in \arg \min _{z} \sum _{s \in S} \pi _s^n \psi (z-\widetilde{x}_s^n, 0)\) for all \(n \ge 0\).
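For the quadratic penalty, the condition \(\widetilde{z}^n \in \arg\min_{z} \sum_{s \in S} \pi_s^n \psi(z-\widetilde{x}_s^n, 0)\) pins \(\widetilde{z}^n\) down explicitly as the weighted average of the \(\widetilde{x}_s^n\) (weights summing to one); a small numerical sketch with made-up data:

```python
# For psi(u, 0) = 0.5 * ||u||^2 and weights summing to one, the minimiser of
# z -> sum_s pi_s * psi(z - x_s, 0) is the weighted average sum_s pi_s * x_s,
# since the gradient sum_s pi_s * (z - x_s) vanishes exactly there.
pi = [0.5, 0.3, 0.2]
xs = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]   # hypothetical scenario iterates
z_star = [sum(p * x[j] for p, x in zip(pi, xs)) for j in range(2)]

def obj(z):
    return sum(p * 0.5 * sum((zj - xj) ** 2 for zj, xj in zip(z, x))
               for p, x in zip(pi, xs))

# Perturbations in several directions do not decrease the objective.
for dz in ([0.1, 0.0], [-0.2, 0.3], [0.05, -0.05]):
    assert obj(z_star) <= obj([a + b for a, b in zip(z_star, dz)])
```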

Furthermore, it also follows that, when \(\widetilde{z}^n \in X\) (a compact set):

$$\begin{aligned} \varphi ^n(\widetilde{z}^n, \widetilde{\varvec{w}}^n )&= \sum _{s\in S}p_{s}\left[ c^{\top } \widetilde{x}_{s}^n +d_{s}^{\top } \widetilde{y}_s^n \right] +(\lambda _s^n)^\top \widetilde{x}_s^n+\rho ^n \pi _s^n \psi \left( \widetilde{z}^n - \widetilde{x}_{s}^n , \widetilde{w}_s^n-\widetilde{y}_s^n \right) \\&\le \sup _n\inf _{\widetilde{\varvec{w}} \in Y(\widetilde{z}^n)} \sum _{s\in S}p_{s} [ c^{\top } \widetilde{z}^n +d_{s}^{\top } \widetilde{w}_s ] +(\lambda _s^n)^\top \widetilde{z}^n\le \Gamma < \infty . \end{aligned}$$

where, after noting that \(\sum _{s \in S} (\lambda _s^n)^\top \widetilde{z}^n= 0\) vanishes due to \(\lambda ^n\in \Lambda \), we have that \(\Gamma < \infty \) can be chosen to hold regardless of the specific realisations of \(\rho ^n > 0\) and \(\widetilde{z}^n \in X\), due to the boundedness properties of SMIP Assumption 1; the finiteness of \(\Gamma \) also requires the assumed relatively complete recourse. \(\square \)

Proposition 14

Assume \(\psi \) satisfies Assumption 2. If \((\overline{z},\overline{\varvec{w}})\) is a persistent limit for a persistent sequence \(\{(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\}_{n=0}^\infty \) then

  1. 1.

    \((\overline{z}, \overline{\varvec{w}}) \in F\); namely \(\overline{z}\in X\) and \(\overline{w}_s \in Y_s (\overline{z})\);

  2. 2.

    there is a fixed neighbourhood \(B_{\delta }\left( \overline{z},\overline{\varvec{w}}\right) \) with \(\delta > 0\) on which \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) is locally optimal for \(\varphi ^n\) for all n large enough (i.e., for all \(\rho ^n\) larger than some threshold \(\tilde{\rho }\)).

If we furthermore assume that the PWA Assumption 10 holds, then we have \(\widetilde{z}_{\mathcal {I}}^n=\widetilde{x}_{s,{\mathcal {I}}}^n\) for all \(s \in S\) for all \((\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n) \in \Phi ^{n} (\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) for \(n\) large enough.

Proof

To prove (1), suppose \((\overline{z}, \overline{\varvec{w}}) \notin F\). Then there exists \(\delta >0\) such that \(\inf _{(x_s,y_s) \in K_s} \Vert (\overline{z}-x_s, \overline{w}_s-y_s) \Vert ^2 \ge 2\delta \) for at least one scenario \(s \in S\). As \((\widetilde{z}^n, \widetilde{\varvec{w}}^n) \rightarrow (\overline{z}, \overline{\varvec{w}})\) we have, for \(n\) (and thus \(\rho ^n\)) large enough, that \(\inf _{(x_s,y_s) \in K_s} \Vert (\widetilde{z}^n-x_s, \widetilde{w}_s^n-y_s) \Vert ^2 \ge \delta \) for some \(s \in S_n\) (in which case \(\widetilde{z}^n \ne \widetilde{x}^n_s\), since by Proposition 3 we have \(\widetilde{\varvec{w}}^n = \widetilde{\varvec{y}}^n\)). We now use Assumption 2 (3) to bound the penalty values from below. By the differentiability assumed in Assumption 2, we apply the inequality (5) for each \(s \in S\) to get \( \psi \left( \widetilde{z}^n-x_s, \widetilde{w}_s^n - y_s \right) \ge \frac{m}{2} \Vert (\widetilde{z}^n-x_s, \widetilde{w}_s^n-y_s) \Vert ^2. \) Since \(\lim _{n \rightarrow \infty } \rho ^n = \infty \) and \(\liminf _{n} \pi ^n_s \ge \xi \), it follows that

$$\begin{aligned} \min _{(x_s, y_s) \in K_s} \Big \{ \rho ^n\sum _{s\in S}\pi _{s}^n \psi \left( \widetilde{z}^n-x_s, \widetilde{w}_s^n-y_s \right) \Big \}&\ge \frac{m\rho ^n\xi }{2} \inf _{(x_s, y_s) \in K_s} \left\{ \Vert (\widetilde{z}^n-x_s, \widetilde{w}_s^n-y_s) \Vert ^2 \right\} \\&\ge \frac{m \rho ^n \xi }{2} \delta \end{aligned}$$

which is unbounded as \(n \rightarrow \infty \). This contradicts the assumption that \(\left\{ \varphi ^n\left( \widetilde{z}^n,\widetilde{\varvec{w}}^n \right) \right\} _{n=0}^\infty \) is bounded above, as required by the persistency of \(\{(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\}_{n=0}^\infty \). Thus \( (\overline{z}, \overline{\varvec{w}})\in F\).

Having shown \((\overline{z}, \overline{\varvec{w}})\in F\), it follows from the definitions that \(\overline{w}_s \in Y_s(\overline{z})\), and so claim (2) follows readily from Corollary 8, using the same critical \(\tilde{\rho }\) and \(\tilde{\delta }\) that apply regardless of the choice of \(\overline{z}\in X\); thus \((z,\varvec{w}) \mapsto \varphi ^n(z,\varvec{w})\) is convex over \(B_\delta (\overline{z},\overline{\varvec{w}})\) for all \(\rho ^n> \tilde{\rho }\) with \(n\) sufficiently large. By Remark 4, the neighbourhood associated with the local minimum at \((\overline{z},\overline{\varvec{w}})\) is also associated with a persistent local minimum, and thus \(B_\delta (\overline{z},\overline{\varvec{w}})\) serves as the fixed neighbourhood verifying the local optimality of \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) for \((z,\varvec{w}) \mapsto \varphi ^n(z,\varvec{w})\) for each n large enough. The last claim follows from Proposition 12. \(\square \)

The following contains a version of the strong duality result for the augmented Lagrangian. Notice that this result is more general than those in [15], in that it allows for the consideration of an inexact penalty that may be differentiable everywhere. When both stages have pure integer variables, all feasible solutions are persistent and we obtain a stronger form of duality.

Theorem 15

Suppose problem SMIP (1) satisfies the SMIP Assumption 1 and \(\psi \) is an ICRF.

  1.

    If problem SMIP (1) has pure integer variables in both stages and a feasible point \((\overline{z}, \overline{\varvec{w}}) \in F\) satisfies \(\limsup _n \varphi ^n(\overline{z}, \overline{\varvec{w}})<+\infty \) for \(\lim _{n\rightarrow \infty } \rho ^n= \infty \), then \((\overline{z}, \overline{\varvec{w}})\) is a local minimum of \(\varphi ^n\) for \(n\) large enough.

  2.

    Let \(\{(\widetilde{z}^n, \widetilde{\varvec{w}}^n)\}_{n=0}^\infty \) be any sequence of global minimisers of \(\varphi ^n\) with \(\lim _{n\rightarrow \infty } \rho ^n = \infty \), with \(\lambda ^n\), \(n\ge 0\), satisfying Assumption 9(2), and \(\pi _s > 0\), \(s \in S\). Then its limit points \((\overline{z},\overline{\varvec{w}})\) are globally optimal solutions to SMIP (1); in particular, there exists at least one globally optimal solution \((\overline{z},\overline{\varvec{w}})\) to SMIP (1) that is a persistent limit. Moreover, for any \(\{ \rho ^n\}_{n=0}^\infty \) with \(\rho ^n \rightarrow \infty \) there exists a persistent local minimum sequence \(\{(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\}_{n=0}^\infty \) for which \(\lim _{n\rightarrow \infty } \varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n)=\zeta ^{SMIP}\).

  3.

    We have for any \(\lambda \in \Lambda \), \(\pi _s > 0\), \(s \in S\),

    $$\begin{aligned} \sup _{\rho >0} \min _{\left( {z},{\varvec{w}}\right) \in \mathbb {X}\times \mathbb {Y}^{\small {\vert S \vert }}} \varphi ^{\lambda ,\rho ,\pi }\left( {z},{\varvec{w}} \right) = \zeta ^{SMIP}. \end{aligned}$$
    (19)

    Moreover, for a pure integer SMIP a finite value of \(\bar{\rho }>0\) exists for which \(\min _{\left( {z},{\varvec{w}}\right) \in \mathbb {X}\times \mathbb {Y}^{\small {\vert S \vert }}} \varphi ^{\lambda ,\rho ,\pi }\left( {z},{\varvec{w}} \right) = \zeta ^{SMIP} \) for \(\rho \ge \bar{\rho }\).

Proof

1): Suppose \((\overline{z}, \overline{\varvec{w}}) \in F\). Using Lemma 7 and Corollary 8 we have a locally convex function

$$\begin{aligned} (z,\varvec{w}) \mapsto \varphi ^n(z, \varvec{w}) = \varphi ^n(z, \varvec{w}\mid \overline{z}_{\mathcal {I}}, \overline{\varvec{w}}_{\mathcal {I}}) \end{aligned}$$

for all \((z, \varvec{w}) \in B_{\delta } (\overline{z},\overline{\varvec{w}})\) for some fixed \(\delta> \tilde{\delta } > 0\) and \(\rho ^n> \tilde{\rho } >0\) with \(n\) large enough. Moreover for all \((z^\prime , \varvec{w}^\prime ) \in F\) with \((\overline{z}_{\mathcal {I}}, \overline{\varvec{w}}_{\mathcal {I}}) \ne (z_{\mathcal {I}}^\prime , \varvec{w}_{\mathcal {I}}^\prime ) \) we have \(\varphi ^n(z, \varvec{w}\mid z_{\mathcal {I}}^\prime , \varvec{w}_{\mathcal {I}}^\prime ) > \varphi ^n(z, \varvec{w}\mid \overline{z}_{\mathcal {I}}, \overline{\varvec{w}}_{\mathcal {I}})\) for \((z,\varvec{w}) \in B_{\delta } (\overline{z},\overline{\varvec{w}})\) for some fixed \(\delta> \tilde{\delta } > 0\) and \(\rho ^n> \tilde{\rho } >0\) with \(n\) large enough. Supposing \((\overline{z}, \overline{\varvec{w}}) \in F\) is pure integer, then we have \((\overline{z}_{{\mathcal {I}}}, \overline{\varvec{w}}_{{\mathcal {I}}}) = (\overline{z}, \overline{\varvec{w}})\) and hence \((\overline{z}, \overline{\varvec{w}})\) is a local minimum of \(\varphi ^n\) with \(\varphi ^n(\overline{z}, \overline{\varvec{w}}) \le \sum _{s \in S} c^{\top } \overline{z}+ d_s^{\top } \overline{w}_s < +\infty \), due to the boundedness assumptions for the SMIP (and dual feasibility of any sequence \(\{\lambda ^n\}_{n=0}^\infty \)).

2): Let \(\{(\widetilde{z}^n, \widetilde{\varvec{w}}^n)\}_{n=0}^\infty \) be a sequence where each \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\), \(n\ge 0\), is a global minimiser of \(\varphi ^n\); this implies \(\varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n) \le \zeta ^{SMIP}\), so that \(\{(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\}_{n=0}^\infty \) is a persistent sequence. Its limit points \((\overline{z},\overline{\varvec{w}})\) thus satisfy \((\overline{z},\overline{\varvec{w}}) \in F\) by Proposition 14. Furthermore, \((\widetilde{x}_s^n,\widetilde{y}^n_s) \in \Phi ^n_s(\widetilde{z}^n,\widetilde{w}_s^n)\), and as \(\Phi ^{\infty } (\overline{z}, \overline{\varvec{w}}) = \{ (\overline{z}, \overline{\varvec{w}})\}\) we have (after passing to a subsequence) \(\lim _{n\rightarrow \infty } (\widetilde{x}^n_s,\widetilde{y}^n_s) = (\overline{z},\overline{w}_s) \in K_s\). (For if not, the boundedness of \(\{\lambda ^n\}_{n=0}^\infty \), \(\rho ^n\rightarrow \infty \), \(\pi _s > 0\), \(s \in S\), and the minorisation (5) would imply that \(\limsup _{n\rightarrow \infty } \varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n) =\infty \).) Furthermore, since \(\varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n) \le \zeta ^{SMIP}\), we must also have \(\sum _{s \in S} f_s(\widetilde{x}_s^n,\widetilde{y}_s^n) + (\lambda _s^n)^{\top } \widetilde{x}_s^n\le \zeta ^{SMIP}\), and so \(\sum _{s \in S} f_s(\overline{z},\overline{w}_s) = \zeta ^{SMIP}\) (by dual feasibility \(\lambda ^n \in \Lambda \), boundedness, and \(\widetilde{x}_s^n\rightarrow \overline{z}\) for all \(s \in S\), we have \(\limsup _{n\rightarrow \infty } \sum _{s \in S} (\lambda _s^n)^{\top } \widetilde{x}_s^n=0\)); thus \((\overline{z},\overline{\varvec{w}})\) must be optimal for the original SMIP (1).

3). Denote

$$\begin{aligned} \xi ^{SMIP}_{\rho }:= \min \{ \varphi ^{\lambda ,\rho ',\pi }\left( {z}^{\rho '},{\varvec{w}}^{\rho '} \right) \mid \{(z^{\rho '},\varvec{w}^{\rho '})\} \text { are persistent for } \rho ' \ge \rho \}. \end{aligned}$$

If \(\left( {\overline{z}},{\overline{\varvec{w}}} \right) \) is a persistent limit, Proposition 14 implies \(\left( {\overline{z}},{\overline{\varvec{w}}} \right) \in F\), and by Proposition 11, \( \varphi ^{\lambda ,\rho ,\pi }\left( {\overline{z}},{\overline{\varvec{w}}} \right) \le \sum _{s\in S}p_{s}\left[ c^{\top }\overline{z}+d_s^{\top }\overline{w}_{s}\right] . \) It follows that:

$$\begin{aligned} \lim _{\rho \rightarrow \infty } \xi ^{SMIP}_{\rho } \le \min _{(\overline{z},\overline{\varvec{w}})}\left\{ \sum _{s\in S}p_{s}\left[ c^{\top }\overline{z}+d_{s}^{\top }\overline{w}_{s}\right] \mid (\overline{z},\overline{\varvec{w}}) \text { are persistent limits} \right\} \le \zeta ^{SMIP} , \end{aligned}$$

where the last inequality follows from the existence of global solutions that are limits of persistent local minima. Let \(\rho ^n \rightarrow \infty \) and \(\{(\widetilde{z}^n, \widetilde{\varvec{w}}^n) \}_{n=0}^\infty \) be a sequence of persistent local minima, globally minimising \(\varphi ^n\) with \(\varphi ^n(\widetilde{z}^n, \widetilde{\varvec{w}}^n) \rightarrow \zeta ^{SMIP} \). By global optimality, \(\varphi ^n(\widetilde{z}^n, \widetilde{\varvec{w}}^n) \le \xi ^{SMIP}_{\rho ^n} \), from which it follows that \(\sup _{\rho >0} \xi ^{SMIP}_{\rho } = \zeta ^{SMIP}\). As all global minima are eventually persistent, we are finished.

When we have a pure integer SMIP then by Part 1, we have the existence of a \(\bar{\rho } >0\) such that \(\varphi ^{\lambda ,\rho ,\pi }({z}^{\rho },{\varvec{w}}^{\rho }) = \varphi ^{\lambda ,\rho ,\pi }({\overline{z}},{\overline{\varvec{w}}})\) for all \(\rho \ge \bar{\rho }\), where \(({\overline{z}},{\overline{\varvec{w}}})\) is a global minimum of the SMIP. Hence, a global minimum is achieved for a finite \(\rho \). \(\square \)

We now investigate the role of the fixed neighbourhoods verifying the local minima for \(\varphi ^n\), \(n\ge 0\). Indeed, for limit points that are not persistent, we show that such a neighbourhood does not exist.

Assumption 16

The sequence \(\{(\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{\varvec{w}}^n,\widetilde{z}^n), (\lambda ^n, \pi ^n, \rho ^n)\}_{n=0}^\infty \) satisfies the joint PH assumptions (joint PHA) when:

  1.

    The problem SMIP (1) satisfies the SMIP Assumption 1,

  2.

    The penalty function \(\psi \) meets the Integer Compatibility Regularisation Function Assumptions (ICRF) given in Assumption 2, and

  3.

    the sequences \(\{(\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{\varvec{w}}^n,\widetilde{z}^n)\}_{n=0}^\infty \) and \(\{(\lambda ^n, \pi ^n, \rho ^n)\}_{n=0}^\infty \), with integer index \(n \ge 0\), satisfy the Solution Sequence Assumptions (SSA) given in Assumption 9 and the Penalty Weighting Assumption (PWA) given in Assumption 10.

Proposition 17

Assume \(\{(\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{z}^n,\widetilde{\varvec{w}}^n), (\lambda ^n,\pi ^n, \rho ^n)\}_{n=0}^\infty \) satisfies the joint PHA Assumption 16. If the radii \(\delta _n\), \(n \ge 0\), on which \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) are locally optimal for \(\varphi ^n\) satisfy \(\liminf _{n \rightarrow \infty } \delta _n = \bar{\delta }\) for some \(\bar{\delta } > 0\), then

  1.

    \(\lim _{n\rightarrow \infty } \Vert \widetilde{z}^n-\widetilde{x}_s^n\Vert = 0\) for all \(s \in S\) for which \(\limsup _{n\rightarrow \infty } \pi _s^n > 0\). Thus \(\overline{z}\in X\), \(\sum _{s \in S} \pi _s^n \widetilde{x}_s^n \rightarrow \overline{z}\), and for n sufficiently large we have \(\widetilde{z}_{\mathcal {I}}^n = \widetilde{x}_{s,{\mathcal {I}}}^n\) for all \(s \in S\) and \(\sum _{s \in S} \pi _s \widetilde{x}_s^n \in X\). (When \(\psi = \frac{1}{2} \Vert \cdot \Vert ^2\) we have \(\widetilde{z}^n = \sum _{s \in S} \pi _s \widetilde{x}_s^n \in X\) for all \(n\ge 0\).)

  2.

    For all n we have \(\widetilde{w}_s^{n} = \widetilde{y}^{n}_s \in Y_s (\widetilde{x}^{n}_s)\), and the limit points \(\overline{w}_s\) of \(\{\widetilde{w}_s^n\}_{n=0}^\infty \) satisfy \(\overline{w}_s \in Y_s(\overline{z})\) for all \(s \in S\) with \(\limsup _{n \rightarrow \infty } \pi _s^n >0\).

Proof

See Appendix A. \(\square \)

The previous analysis allows us to pose the following result, which confirms that the “basin of attraction” of non-persistent local minima has no interior in the limit. The next result follows immediately as the contrapositive of Proposition 17.

Corollary 18

Let \(\{(\varvec{x}^n,\varvec{y}^n,\varvec{w}^n,z^n),(\lambda ^n,\pi ^n, \rho ^n)\}_{n=0}^\infty \) satisfy the joint PHA Assumption 16. If any one of the following holds:

  1.

    \(\overline{z}\notin X\),

  2.

    There exist arbitrarily large n such that \(z_{\mathcal {I}}^n \ne x_{s,{\mathcal {I}}}^n\) for at least one \(s\in S\),

  3.

    \(\lim _{n \rightarrow \infty } x_s^n \ne \overline{z}\) or \(\lim _{n \rightarrow \infty } x_s^n\) does not exist for at least one \(s \in S\) for which \(\limsup _{n\rightarrow \infty } \pi _s^n >0\), or

  4.

    when \(\psi = \frac{1}{2} \Vert \cdot \Vert ^2\), \(z^n \not \rightarrow \overline{z}\),

then \(\lim _{n \rightarrow \infty } \delta ^n = 0\) for the radii \(\delta ^n > 0\), \(n\ge 0\), on which the local optimality of each \(z^n\) for \(z\mapsto \inf _w\varphi ^n(z,w)\) is verified.

Example 1

Consider an augmented Lagrangian reformulation of a simple split variable extensive form of a two-stage SMIP

$$\begin{aligned} \min _{x,y,z,w} \left\{ \begin{array}{c} \sum _{s=1}^2 \left[ c^\top x_s + d_s^\top y_s + \rho \pi _s\psi (z-x_s,w_s-y_s) \right] \mid \, (x_1,y_1) \in K_1,\, (x_2,y_2) \in K_2, \\ x_1=z,\, x_2=z,\, y_1=w_1,\, y_2=w_2, \, x_1,x_2 \in \{0,1\},\, y_1,y_2 \in [0,1] \end{array} \right\} \end{aligned}$$
(20)

with penalty coefficient \(\rho > 0\), where \(K_1 = \{(x,y) \in \{0,1\}\times [0,1] \mid x\le y \} \supset \{(0,0),(1,1)\}\), \(K_2=\{(x,y) \in \{0,1\}\times [0,1] \mid 1-x\le y\} \supset \{(0,1),(1,0)\}\), \(c=0\), \(d_1=d_2=1\), and \(\psi (u,v)= \Vert (u,v)\Vert ^2\). We assume \(p_1=p_2=\pi _1=\pi _2=\frac{1}{2}\) and \(\lambda =0\) throughout this example. For any \(\{\rho ^n\}_{n=0}^\infty \) with \(\rho ^n> 0\), \(n\ge 0\), one may verify that the local minimisers \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) and local optimal values, as parameterised by \(\rho ^n\), \(n\ge 0\), are as follows.

The locally optimal solutions \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) are the same for all \(\rho > 0\), so that \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)=(\overline{z},\overline{\varvec{w}})\) for \(n\ge 0\) for each locally optimal \((\overline{z},\overline{\varvec{w}})\). Here we see that the two globally optimal solutions for \(\varphi ^{\lambda ,\rho ,\pi }\) are the persistent solutions with either \(\overline{z}=\overline{x}_1=\overline{x}_2=0\) or \(\overline{z}=\overline{x}_1=\overline{x}_2=1\), which both satisfy non-anticipativity. The non-persistent solution has \(\overline{z}=0.5\) with \(0=\overline{x}_1 \ne \overline{x}_2=1\); it only stays optimal over an ever shrinking neighbourhood \(B_\delta (\overline{z},\overline{\varvec{w}})\) with radius \(\delta = 1 / \rho ^n\) vanishing as \(\rho ^n\rightarrow \infty \).

| Solution \((\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) | Value \(\varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) | Locally optimal \((\overline{z},\overline{\varvec{w}})\) | Radius \(\delta \) of \(B_\delta (\overline{z},\overline{\varvec{w}})\) | Persistent? |
| --- | --- | --- | --- | --- |
| \(\left( \left[ \begin{array}{c} 0 \\ 0 \end{array}\right] ,\left[ \begin{array}{c} 0 \\ 1 \end{array}\right] ,0,\left[ \begin{array}{c} 0 \\ 1 \end{array}\right] \right) \) | 1 | \(\left( 0,\left[ \begin{array}{c} 0 \\ 1 \end{array}\right] \right) \) | \(\frac{1}{2}-\frac{1}{\rho ^n}\) | Yes |
| \(\left( \left[ \begin{array}{c} 0 \\ 1 \end{array}\right] ,\left[ \begin{array}{c} 0 \\ 0 \end{array}\right] ,\frac{1}{2},\left[ \begin{array}{c} 0 \\ 0 \end{array}\right] \right) \) | \(\frac{\rho ^n}{4}\) | \(\left( \frac{1}{2},\left[ \begin{array}{c} 0 \\ 0 \end{array}\right] \right) \) | \(\frac{1}{\rho ^n}\) | No |
| \(\left( \left[ \begin{array}{c} 1 \\ 1 \end{array}\right] ,\left[ \begin{array}{c} 1 \\ 0 \end{array}\right] ,1,\left[ \begin{array}{c} 1 \\ 0 \end{array}\right] \right) \) | 1 | \(\left( 1,\left[ \begin{array}{c} 1 \\ 0 \end{array}\right] \right) \) | \(\frac{1}{2}-\frac{1}{\rho ^n}\) | Yes |
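The tabulated objective values can be checked by direct evaluation. Below is a minimal sketch (the function name `al_objective` is ours) evaluating the split-variable augmented Lagrangian objective of (20) at the three listed solutions, with \(c=0\), \(d_1=d_2=1\), \(\lambda =0\), \(\pi _1=\pi _2=\frac{1}{2}\), and \(\psi (u,v)=\Vert (u,v)\Vert ^2\):

```python
def al_objective(x, y, z, w, rho, pi=(0.5, 0.5), d=(1.0, 1.0)):
    """Objective of (20) with c = 0, lambda = 0, psi(u, v) = u**2 + v**2."""
    val = 0.0
    for s in range(2):
        val += d[s] * y[s] + rho * pi[s] * ((z - x[s]) ** 2 + (w[s] - y[s]) ** 2)
    return val

rho = 8.0
# Persistent local minima: z = x_1 = x_2 in {0, 1}; objective value 1.
v0 = al_objective(x=(0, 0), y=(0, 1), z=0.0, w=(0, 1), rho=rho)
v1 = al_objective(x=(1, 1), y=(1, 0), z=1.0, w=(1, 0), rho=rho)
# Non-persistent local minimum: z = 1/2 sits between x_1 = 0 and x_2 = 1,
# so both squared penalties contribute (1/2)**2, giving value rho/4.
vhalf = al_objective(x=(0, 1), y=(0, 0), z=0.5, w=(0, 0), rho=rho)
print(v0, v1, vhalf)  # 1.0 1.0 2.0
```

Note the non-persistent value \(\rho /4\) grows without bound as \(\rho ^n \rightarrow \infty \), consistent with the persistency discussion above.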

5 Analysis of the block Gauss–Seidel sequence

Block Gauss–Seidel iterations are most easily analysed for differentiable optimisation problems. However, we need to perform Gauss–Seidel iterations on nonsmooth functions with varying parameterisations and, hence, we develop the necessary theory to facilitate this analysis. We start with elementary definitions and properties of Gauss–Seidel iterations that apply under general assumptions on the functions and their domains.

Definition 5

Let \(G: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\) be a continuous function over a closed subset of \({\mathbb {X}}\times {\mathbb {Y}}\). A solution \(({z}^*,w^*) \in {\mathbb {X}}\times {\mathbb {Y}}\) is a partial minimum of G if

$$\begin{aligned} G\left( {z}^{*},w\right)&\ge G\left( {z}^{*},w^{*}\right) \quad \text {for all } w\in {\mathbb {Y}}\quad \text {and} \end{aligned}$$
(21a)
$$\begin{aligned} G\left( {z},w^{*}\right)&\ge G\left( {z}^{*},w^{*}\right) \quad \text {for all } {z}\in {\mathbb {X}}. \end{aligned}$$
(21b)

For general non-smooth G, partial minimality does not imply (joint) minimality. Under suitable assumptions of convexity and (additive) separability of the non-smoothness in G, we may recover joint minimality, as described in Lemma 21.
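A one-dimensional illustration of why the separability matters: consider the (hypothetical, chosen for this sketch) function \(G(z,w)=|z-w|+0.2(z+w)\), whose nonsmooth term couples the two blocks and so violates the split \(G=Q+h\) required below. The origin is a partial minimum in the sense of Definition 5 but not a joint minimum, as a quick numerical check confirms:

```python
# Hypothetical coupled nonsmooth function: the |z - w| term cannot be
# written as h(w) alone, so the separability assumed in Lemma 21 fails.
def G(z, w):
    return abs(z - w) + 0.2 * (z + w)

grid = [i / 100.0 - 1.0 for i in range(201)]  # grid on [-1, 1]

# (0, 0) is a partial minimum: neither block alone can improve it ...
assert all(G(t, 0.0) >= G(0.0, 0.0) for t in grid)
assert all(G(0.0, t) >= G(0.0, 0.0) for t in grid)

# ... yet it is not a joint minimum: moving both blocks together descends.
assert G(-0.5, -0.5) < G(0.0, 0.0)
```

A block Gauss–Seidel scheme started at the origin therefore stalls at a non-stationary point for this G, which is exactly the pathology the separability assumption excludes.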

Assumption 19

Separability and Convexity Assumptions (SCA) on \(G: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\):

  1.

    G is bounded from below and its level sets are bounded.

  2.

    G has the form \(G({z},w) = Q({z},w) + h(w)\) where

    (a)

      \(Q: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}\) is convex and continuously differentiable over \({\mathbb {X}}\times {\mathbb {Y}}\);

    (b)

      \(h: {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\) is proper, lower semicontinuous, and convex.

The following properties follow immediately from Assumption 19.

Lemma 20

Let \(G: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\) satisfy SCA given in Assumption 19.

  1.

    G, Q, and h are regular functions (due to the assumed convexity). Thus, \(\widehat{\partial }G = \partial G\) exist; and likewise with \(\widehat{\partial }Q=\partial Q = \nabla Q\), \(\widehat{\partial }h = \partial h\).

  2.

    Calculus rules (e.g., [13, Exercise 8.8(c)]) imply that for any \(({z},w)\)

    $$\begin{aligned} \partial G({z},w)&= \{ \nabla _{z}Q({z},w)\} \times \{ \nabla _wQ({z},w) + \partial _wh(w) \} \ne \emptyset , \end{aligned}$$
    (22a)
    $$\begin{aligned} \widehat{\partial }G({z},w)&= \{ \nabla _{z}Q({z},w)\} \times \{ \nabla _wQ({z},w) + \widehat{\partial }_wh(w) \} \ne \emptyset . \end{aligned}$$
    (22b)

Lemma 21

Assume that G satisfies Assumption 19. If \(\left( {z}^{*},w^{*}\right) \in {\mathbb {X}}\times {\mathbb {Y}}\) is a partial minimum of G as in Definition 5 (so that \(\nabla _{{z}} G({z}^{*}, w^{*}) =0\) and \(0 \in \widehat{\partial }_{w} G\left( {z}^{*},w^{*}\right) \)), then we have the Fréchet stationarity \((0,0) \in \widehat{\partial }G\left( {z}^{*},w^{*}\right) \).

Proof

Follows as an application of [22, Theorem 4.1]. \(\square \)

5.1 On the stationarity of Gauss–Seidel limit points

Two ways of framing the Gauss–Seidel step of Sect. 2.1 are now apparent: 1) via continuous block \(z\) and block \(w\) partial minimisation updates of the continuous “regularisation function” \(\varphi ^{\lambda ,\rho ,\pi }\), and 2) via continuous consensus block \(z\) minimisation updates and mixed-integer block \((x,y,w)\) minimisation updates applied directly to augmented Lagrangian reformulations of SMIP (1). The former approach still requires an analysis of the \((x,y)\) update, but in a hidden form. The latter, on the other hand, relies on the fact that the iterates eventually fall into a region where the integer variables become fixed in value, so that the subproblem optimisations are local and associated with continuous (and convex) parts of the problem.

Motivated by the use of Lagrangian- and penalty-based solution approaches, we furthermore assume that \(G=G^{k}\) varies across iterations \(k\ge 0\) subject to the following assumptions.

Assumption 22

Structural Assumptions (SA): Given \(\{G^{k}: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\}_{k=0}^\infty \), let Assumption 19 hold for each \(G^k\), \(k\ge 0\). We assume that \(\{G^k\}_{k=0}^\infty \) epi-converges to \(\overline{G}: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\), \(\{Q^k\}_{k=0}^\infty \) epi-converges to \(\overline{Q}: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}\) and that \(\{h^k\}_{k=0}^\infty \) epi-converges to \(\overline{h}: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\).

In Sect. 5.2, we identify the sequences \(\{G^k, ({z}^k,w^k)\}_{k=0}^\infty \) with a subsequence of GS (mid-)iterations associated with the application of Algorithm 1. For now, we deliberately detach the analysis of \(\{G^k, ({z}^k,w^k)\}_{k=0}^\infty \) from its intended algorithmic identification. The convexity of \(Q^k\) and \(h^k\) in Assumptions 19 and 22 allows for \(\{\partial G^k\}_{k=0}^\infty \) to converge in graph [13, Theorem 12.35]. This assumption will not prove restrictive in the integration of the present analysis with the convergence properties of Algorithm 1 even though the underlying problem has mixed-integer constraints.

Lemma 23

Let \(\{G^{k}: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\}_{k=0}^\infty \), epi-converging to \(\overline{G}: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\), satisfy Assumption 22. For \(\{({z}^{k},w^{k})\}_{k=0}^\infty \rightarrow (\overline{z},\overline{w})\) we have \(0 \notin \widehat{\partial }\overline{G}(\overline{z},\overline{w})\) if and only if \(\liminf _{k\rightarrow \infty } \inf _{(\zeta ,\omega ) \in \partial G^{k}({z}^{k},w^{k})} \Vert (\zeta ,\omega )\Vert =\gamma > 0\).

Proof

Given that \(0 \not \in \partial \overline{G}(\bar{{z}},\bar{w})\), it follows from [13, Theorem 12.35(b)] that the sequence \(\{\widehat{\partial }G^{k}({z}^{k},w^{k})\}_{k=0}^\infty \) must be strictly bounded away from zero in that \(\liminf _{k\rightarrow \infty } \inf _{(\zeta ,\omega ) \in \partial G^{k}({z}^{k},w^{k})} \Vert (\zeta ,\omega )\Vert > 0\). \(\square \)

Assumption 24

Stationarity of \(w\) (\(w\)-stat) on the (sub)sequence indexed by \(k\): For each \(k\ge 1\), \(0 \in \widehat{\partial }_{w} G^{k}({z}^{k},w^{k})\).

Lemma 25

Assume \(\{G^{k}\}_{k=0}^\infty \) and \(\{({z}^{k},w^{k})\}_{k=0}^\infty \rightarrow (\overline{z},\overline{w})\) satisfy the SA Assumption 22. If \(0 \notin \widehat{\partial }\overline{G}(\overline{z},\overline{w})\) and \(w\)-stat (Assumption 24) holds, then \(\Vert \nabla _{{z}} \overline{Q}(\overline{z},\overline{w})\Vert \ne 0\) and \(\liminf _{k\rightarrow \infty } \Vert \nabla _{z}Q^{k}({z}^{k},w^{k})\Vert = \gamma > 0\).

Proof

Under Assumption 24 (\(w\)-stat), we must have for the \(w\) subgradient components \( 0 \in \{ \nabla _wQ^{k}({z}^{k},w^{k}) + \widehat{\partial }_wh^{k}(w^{k}) \} \ne \emptyset \) for \(k\ge 0\), and so the hypothesis \(0 \notin \widehat{\partial }\overline{G}(\overline{z},\overline{w})\), the calculus rules of Lemma 20, and Lemma 23 imply the intended result. \(\square \)

For the following results, we introduce an Armijo descent step rule for the \({z}\) step to aid in the convergence analysis.

Algorithm 2

Computing an Armijo rule step length \(\alpha > 0\).

Preconditions: \(\beta ,\sigma \in (0,1)\); G satisfies Assumption 19; d satisfies \(\nabla _{z}G({z},w)\,d< 0\).

Assumption 26

\({z}\)-Descent Assumption (\({z}\)-DA): Given \(\beta ,\sigma \in (0,1)\) and given subsequences \(\{d^{k}\}_{k=0}^\infty \), \(\{G^{k}\}_{k=0}^\infty \), and \(\{({z}^{k},w^{k})\}_{k=0}^\infty \) such that \(\nabla _{z}Q^{k}({z}^{k},w^{k}) d^{k} < 0\) for \(k\ge 1\), \({z}\)-DA is satisfied if

$$\begin{aligned} \lim _{k\rightarrow \infty } G^{k}({z}^{k} + \alpha ^{k}d^{k},w^{k}) - G^{k}({z}^{k},w^{k}) = 0 \end{aligned}$$

where \(\alpha ^k\) is computed with Algorithm 2 given \({z}={z}^{k}\) and \(w=w^{k}\), \(k\ge 1\).

The \({z}\)-DA Assumption 26 itself makes no assumption on how the sequence \(\{({z}^{k},w^{k}), d^{k}\}_{k=0}^\infty \) is constructed. Subsequently stated identifications of the sequence \(\{({z}^{k},w^{k})\}_{k=0}^\infty \) with subsequences generated by Algorithm 1 will guarantee the satisfaction of \({z}\)-DA Assumption 26 under mild assumptions on the implementation of Algorithm 1. The Armijo Step of Algorithm 2 is not actually used in our implementation of Algorithm 1. Rather, it is merely a theoretical tool in what follows.
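For concreteness, a standard backtracking implementation consistent with the acceptance test used in the proof of Lemma 27 might look as follows. This is a sketch only (Algorithm 2 is a theoretical device here, and the function names are ours); it returns a step length \(\alpha \) satisfying the Armijo condition \(G(z+\alpha d,w)-G(z,w)\le \alpha \sigma \nabla _z Q(z,w)d\):

```python
def armijo_step(G, grad_zQ, z, w, d, beta=0.5, sigma=0.25, alpha0=1.0, max_iter=50):
    """Backtracking (Armijo) line search for the z-block.

    Shrinks alpha by beta until the sufficient-decrease test holds,
    assuming d is a descent direction: <grad_zQ(z, w), d> < 0.
    """
    slope = sum(g * di for g, di in zip(grad_zQ(z, w), d))
    assert slope < 0, "d must be a descent direction"
    alpha, base = alpha0, G(z, w)
    for _ in range(max_iter):
        z_trial = [zi + alpha * di for zi, di in zip(z, d)]
        if G(z_trial, w) - base <= alpha * sigma * slope:
            return alpha
        alpha *= beta  # the failed value alpha is the "penultimate" alpha/beta
    return alpha

# Usage: Q(z) = 0.5 * ||z||^2 with h = 0, steepest-descent direction d = -grad.
alpha = armijo_step(lambda z, w: 0.5 * sum(zi * zi for zi in z),
                    lambda z, w: z, z=[1.0], w=None, d=[-1.0])
```

The quantity \(\bar{\alpha }^k=\alpha ^k/\beta \) appearing in the proof of Lemma 27 corresponds to the last value of `alpha` that failed the `if` test before acceptance.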

The proof of the following lemma is based on ideas from [24, Proposition 1.2.1], [25, Proposition 3.2], and [26, Technical Lemmas, Appendix A].

Lemma 27

Assume that 1) \(\{G^{k}:{\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\}_{k=0}^\infty \) satisfies SA (Assumption 22) and epi-converges to \(\overline{G}:{\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\); 2) \(\{(w^k,{z}^k)\}_{k=0}^\infty \) converges to \((\overline{w},\overline{z})\) and satisfies \(w\)-stat (Assumption 24); and 3) \(\nabla _{z}Q^k({z}^k,w^k) \ne 0\) for each \(k\ge 1\).

If, for some \(\beta ,\sigma \in (0,1)\), the \({z}\)-DA (Assumption 26) holds for \(d^k= -\nabla _{z}Q^k({z}^k,w^k)\), then \(0 \in \partial \overline{G}(\bar{{z}},\bar{w})\). Moreover, \(\overline{G}\) is regular at \((\bar{{z}},\bar{w})\) and so we have also \(0 \in \widehat{\partial }\overline{G}(\bar{{z}},\bar{w})\).

Proof

From (22), the satisfaction of \(w\)-stat Assumption 24, and Lemma 21, we only need to show that \(\lim _{k \rightarrow \infty } \Vert \nabla _{z}Q^k({z}^k,w^k) \Vert = 0\). Due to SA Assumption 22 and \({z}\)-DA Assumption 26, we have

$$\begin{aligned} 0 = \lim _{k\rightarrow \infty } G^{k}({z}^{k} + \alpha ^{k}d^{k},w^{k}) - G^{k}({z}^{k},w^{k})&= \lim _{k\rightarrow \infty } Q^{k}({z}^{k} + \alpha ^{k}d^{k},w^{k}) - Q^{k}({z}^{k},w^{k})\\&\le \lim _{k\rightarrow \infty } \alpha ^{k} \sigma \nabla _{{z}} Q^{k}({z}^{k},w^{k}) d^k\le 0. \end{aligned}$$

Thus, \( \lim _{k\rightarrow \infty } \alpha ^{k} \sigma \nabla _{{z}} Q^{k}({z}^{k},w^{k}) d^k= 0. \)

We consider two cases: 1) \(\limsup _{k\rightarrow \infty } \alpha ^{k} > 0\), and 2) \(\limsup _{k\rightarrow \infty } \alpha ^{k} = 0\). Due to the assumed continuity of \(\nabla _{{z}} Q^{k}\), and given that \(d^k= -\nabla _{{z}} Q^{k}({z}^k,w^k)\), the first case implies that

$$\begin{aligned} \lim _{k\rightarrow \infty } \nabla _{{z}} Q^{k}({z}^{k},w^{k}) d^k= \lim _{k\rightarrow \infty } -\Vert \nabla _{{z}} Q^{k}({z}^{k},w^{k}) \Vert ^2 = 0. \end{aligned}$$

and so \( \lim _{k\rightarrow \infty } \nabla _{{z}} Q^{k}({z}^{k},w^{k}) = 0. \) Otherwise, if \(\limsup _{k \rightarrow \infty } \alpha ^{k} = \lim _{k\rightarrow \infty } \alpha ^{k} = 0\), then for all \(k\) larger than some \(\bar{k} \ge 0\) we have \(\alpha ^k\le \beta < 1\), and so \(\bar{\alpha }^k=\alpha ^k/\beta \le 1\) (that is, the value of \(\alpha ^k\) at the penultimate iteration of Algorithm 2), for which it holds that

$$\begin{aligned} G^{k}({z}^{k} + \bar{\alpha }^kd^k,w^{k}) - G^{k}({z}^{k},w^{k}) > \bar{\alpha }^k\sigma \nabla _{{z}} Q^{k}({z}^k,w^k) d^k, \end{aligned}$$

which implies

$$\begin{aligned} \frac{Q^{k}({z}^{k} + \bar{\alpha }^{k}d^k,w^{k}) - Q^{k}({z}^{k},w^{k})}{\bar{\alpha }^{k}} > \sigma \nabla _{{z}} Q^{k}({z}^k,w^k) d^k. \end{aligned}$$

Applying the Mean Value Theorem at each \(k\), we have

$$\begin{aligned} \nabla _{z}Q^{k}({z}^{k} + \tilde{\alpha }^kd^k,w^{k})d^k> \sigma \nabla _{{z}} Q^{k}({z}^k,w^k) d^k. \end{aligned}$$

for some \(\tilde{\alpha }^k\in [0,\bar{\alpha }^k]\). Using the continuity of \(\nabla _{{z}} Q^{k}\) (and the Cauchy–Schwarz inequality), we have for arbitrarily small \(\epsilon > 0\) that there exists \(\delta > 0\) such that, for large enough k, \(\tilde{\alpha }^k\le \bar{\alpha }^k< \delta \), so that

$$\begin{aligned} \epsilon \Vert d^k\Vert + \nabla _{z}Q^{k}({z}^{k},w^{k}) d^k>\nabla _{z}Q^{k}({z}^{k} + \tilde{\alpha }^kd^k,w^{k})d^k> \sigma \nabla _{{z}} Q^{k}({z}^{k},w^{k}) d^{k} \end{aligned}$$

holds for sufficiently large \(k\).

Recalling that \(d^{k} = -\nabla _{z} Q^{k}({z}^{k},w^{k}) \ne 0\), we then have

$$\begin{aligned} \epsilon \Vert d^{k}\Vert > (\sigma -1) \nabla _{z} Q^{k}({z}^{k},w^{k}) d^{k} = (1-\sigma ) \Vert \nabla _{z}Q^{k}({z}^{k},w^{k}) \Vert _2^2 \end{aligned}$$

and so \( \epsilon > (1-\sigma ) \Vert \nabla _{z}Q^{k}({z}^{k},w^{k}) \Vert _2 \) holds for sufficiently large \(k\). In the limit, we have \( 0 \ge (1-\sigma ) \Vert \nabla _{{z}} \overline{Q}(\bar{{z}},\bar{w}) \Vert _2, \) which is a contradiction since \((1-\sigma ) > 0\) and \(\Vert \nabla _{{z}} \overline{Q}(\bar{{z}},\bar{w}) \Vert _2 > 0\) as established from Lemma 25 and the SA Assumption 22. Thus, \(0 \in \partial \overline{G}(\bar{{z}},\bar{w})\), and since \(\overline{G}\) is regular, then \(0 \in \widehat{\partial }\overline{G}(\bar{{z}},\bar{w})\) holds also. \(\square \)

In order to apply Lemma 27 to a convergence analysis of Algorithm 1, we need to establish the satisfaction of the SA Assumption 22, \(w\)-stat Assumption 24, and especially the \({z}\)-DA Assumption 26 requiring \( \lim _{k\rightarrow \infty } G^{k}({z}^{k} + \alpha ^{k}d^{k},w^{k}) - G^{k}({z}^{k},w^{k}) = 0 \) given an appropriate identification with Algorithm 1 subsequence iterations \(\{n_k\}_{k=0}^\infty \).

5.2 Interleaving analysis and algorithm

We analyse subsequences \(\{(x^{n_k+1}, y^{n_k+1}, w^{n_k+1}, z^{n_k})\}_{k=0}^\infty \) of the iterations generated by Algorithm 1 applied to problem (1) that converge to \((\overline{x},\overline{y},\overline{w},\overline{z})\). (Such limit points of the entire sequence in \(n\) exist due to the inf-compactness of (1), as will be demonstrated in this subsection.) This analysis depends on establishing that the Sect. 5.1 assumptions hold under the appropriate identifications with Algorithm 1 (i.e., the SA Assumption 22, the \(w\)-stat Assumption 24, and the \({z}\)-DA Assumption 26).

Given the assumed subsequence convergence, we may take the subsequence \(\{(x^{n_k+1}, y^{n_k+1}, w^{n_k+1}, z^{n_k})\}_{k=0}^\infty \) so that the integer component values \(x_{\mathcal {I}}=\overline{x}_{\mathcal {I}}\) and \(y_{\mathcal {I}}=\overline{y}_{\mathcal {I}}\) are fixed. With respect to Algorithm 1, we apply the following identifications, with GS iterations indexed by \(n\ge 1\) and subsequence iterations indexed by \(n_k\), \(k\ge 1\):

Assumption 28

Algorithm 1 Identifications:

  1.

    Variables: \({z}^k\leftarrow z^{n_k}\) and \(w^k\leftarrow (x_s^{n_k+1},y_s^{n_k+1},w_s^{n_k+1})_{s \in S}\)

  2.

    Expressions: \(Q^k({z}^k,w^k) \leftarrow \sum _{s \in S} \pi _s^{n_k} \psi (z^{n_k}-x_s^{n_k+1},w_s^{n_k+1}-y_s^{n_k+1})\) and

    $$\begin{aligned} h^k(w^k) \leftarrow \sum _{s \in S} \left[ \frac{1}{\rho ^{n_k}} \left( f_s\left( x_s^{n_k+1}, y_s^{n_k+1} \right) - (\lambda _s^{n_k})^\top x_s^{n_k+1} \right) + \delta _{K_s^{ (\overline{{\textbf{x}}}_{{\mathcal {I}}},\overline{{\textbf{y}}}_{{\mathcal {I}}})} } (x_s^{n_k+1},y_s^{n_k+1}) \right] \end{aligned}$$

Recalling \(f^n(\varvec{x},\varvec{y},\varvec{w},z) = \sum _{s\in S} f_s(x_s,y_s) + (\lambda _s^n)^\top x_s + \rho ^n\pi _s^n\psi (z- x_s, w_s - y_s )\) defined in (17), define for each \(n\ge 0\)

$$\begin{aligned} g^{n}(\varvec{x},\varvec{y},\varvec{w},z) := \frac{1}{\rho ^{n}} f^n(\varvec{x},\varvec{y},\varvec{w},z). \end{aligned}$$
(23)

We also have

$$\begin{aligned} G^{k}({z}^k,w^k)&\leftarrow g^{n_k}(\varvec{x}^{n_k+1},\varvec{y}^{n_k+1},\varvec{w}^{n_k+1},z^{n_k}) +\sum _{s \in S} \delta _{K_s^{ (\overline{{\textbf{x}}}_{{\mathcal {I}}},\overline{{\textbf{y}}}_{{\mathcal {I}}})} } (x_s^{n_k+1},y_s^{n_k+1})\\&= g^{n_k}(\varvec{x}^{n_k+1},\varvec{y}^{n_k+1},\varvec{w}^{n_k+1},z^{n_k}) \end{aligned}$$

where the last equality is by construction (fixed integral values) of the subsequence.

Furthermore, to guarantee that the SA, \(w\)-Stat, and \({z}\)-DA assumptions hold, we assume the following of the GS (sub)sequences:

Assumption 29

Algorithm Assumptions: In the application of Algorithm 1 to problem (1), the following hold:

  1.

    SMIP assumptions: Assumption 1 holds for problem (1).

  2.

    Penalty function assumptions: \(\psi \) satisfies Assumption 2. Furthermore, we subsequently note special implications that hold when \(\psi \) takes the weighted squared 2-norm form with weights \(\bar{\mu }_i > 0\) such that

    $$\begin{aligned} \psi (z-x_s,0) = \frac{1}{2} \sum _{i=1}^n \bar{\mu }_i(z_i-x_{s,i})^2. \end{aligned}$$
    (24)
  3.

    Global optimality: Each \((\varvec{x}^{n+1},\varvec{y}^{n+1},\varvec{w}^{n+1})\) is globally optimal given fixed \(z^{n}\) in that

    $$\begin{aligned} (\varvec{x}^{n+1},\varvec{y}^{n+1},\varvec{w}^{n+1}) \in \arg \min _{x,y,w} f^n(\varvec{x},\varvec{y},\varvec{w},z^n) \end{aligned}$$

    (hence limit points \((\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}})\) are globally optimal given the fixed limit point \(\overline{z}\) by known results [27, Propositions 1.3.5 and 1.3.6]). Also, \(z^{n+1} \in {{\,\textrm{argmin}\,}}_{z} \sum _{s \in S} \pi _s^n\psi (z-x_s^{n+1},0)\) is globally optimal for each \(n\ge 1\). Furthermore, under the additional assumption that \(\psi \) is of the weighted squared 2-norm form (24), we have (independent of the weights \(\bar{\mu }_i > 0\), \(i=1,\dots ,n\))

    $$\begin{aligned} z^{n+1}&\leftarrow \sum _{s \in S} \pi _s^nx_s^{n+1} \in \arg \min _{z} \sum _{s \in S} \pi _s^n\psi (z-x_s^{n+1},0). \end{aligned}$$
  4.

    Generation of Lagrange multipliers: \(\lambda _s^0=0\) for \(s \in S\). If \(\psi \) is not of the weighted squared 2-norm form (24), then we assume \(\lambda _s^n\equiv 0\), \(n\ge 1\), identically. Otherwise, if \(\psi \) is of the weighted squared 2-norm form (24), then for each subsequent iteration \(n\), either \(\lambda _s^n\) is left unchanged via \(\lambda _s^{n+1} \leftarrow \lambda _s^{n}\) for all \(s \in S\), or

    $$\begin{aligned} \lambda _{s,i}^{n+1}&\leftarrow \lambda _{s,i}^{n} - \rho ^n\pi _s^n\bar{\mu }_i\left( z_i^{n+1} -x_{s,i}^{n+1} \right) \quad \text {for all}\; s \in S. \end{aligned}$$

    Under these assumptions on \(\lambda \), it follows that \(\sum _{s \in S} \lambda _s^n= 0\) for all \(n\ge 0\) (i.e., dual feasibility is maintained, which justifies the absence of the \(\sum _{s \in S}\lambda _s^nz\) terms in the Lagrangian). Non-trivial \(\lambda \) updates between iterations are suppressed as necessary to ensure that \(\sum _{n=1}^\infty \Vert \lambda _s^n-\lambda _s^{n+1}\Vert < \infty \) holds in the limit. (In practice this usually entails only a finite number of non-trivial updates.)

  5.

    Update of penalty parameters: We assume the following.

    (a)

      Penalty coefficients are nondecreasing \(\rho ^{n+1} \ge \rho ^{n} > 0\), \(n\ge 0\).

    (b)

      \( 0 < \pi _s^{n}\rho ^{n} \le \pi _s^{n+1}\rho ^{n+1}\), \(s \in S\), \(n\ge 0\). (\(\pi _s^{n} \le \pi _s^{n+1}\) does not hold in general.)

    (c)

      For each \(n \ge 0\) and \(s \in S\), we have \(\pi _s^n> 0\) and \(\sum _{s \in S} \pi _s^n= 1\), and \(\{\pi _s^{n}\}_{n=0}^\infty \) converges to \(\pi _s > 0\) for all \(s \in S\) such that \(\sum _{n=1}^{\infty } |\pi _s^{n+1}-\pi _s^{n} |< \infty \). Initially, \(\pi _s^0 \leftarrow p_s\).

The algorithm need not adjust the \(\pi _s^n\) and \(\rho ^n\) parameters separately. Instead, it may apply penalty updates in a scenario-specific manner to \(\rho _s^n:=\pi _s^n\rho ^n\) with \(\rho ^n = \sum _{s \in S} \rho _s^n\). For each \(s \in S_n:= \{s \in S \mid z_{{\mathcal {I}}} \ne x_{s,{\mathcal {I}}} \}\), we set \(\rho ^{n+1}_s = \gamma _n \rho ^n_s\) with \(\gamma _n >1\).
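As a numerical illustration of items 3 and 4 above, the following sketch (with hypothetical random data; not from the paper) checks that, for the weighted squared 2-norm penalty (24), the probability-weighted average minimises \(\sum _{s \in S} \pi _s^n\psi (z-x_s^{n+1},0)\) independently of the weights \(\bar{\mu }_i\), and that the multiplier update preserves \(\sum _{s \in S} \lambda _s = 0\):

```python
import numpy as np

rng = np.random.default_rng(0)
S, dim = 4, 3                       # hypothetical: 4 scenarios, 3 first-stage variables
pi = rng.random(S); pi /= pi.sum()  # scenario weights pi_s^n summing to 1
mu = rng.uniform(0.5, 2.0, dim)     # penalty weights mu_bar_i > 0
x = rng.random((S, dim))            # scenario first-stage iterates x_s^{n+1}

def obj(z):
    # sum_s pi_s * psi(z - x_s, 0) with psi the weighted squared 2-norm (24)
    return sum(pi[s] * 0.5 * np.dot(mu, (z - x[s])**2) for s in range(S))

z_star = pi @ x                     # candidate minimiser: probability-weighted average

# the weighted average beats random perturbations, for any weights mu
for _ in range(100):
    assert obj(z_star) <= obj(z_star + rng.normal(scale=0.1, size=dim)) + 1e-12

# the lambda update preserves sum_s lambda_s = 0 when z is the pi-weighted average
rho = 2.5
lam = np.zeros((S, dim))            # lambda_s^0 = 0
lam_next = lam - rho * pi[:, None] * mu[None, :] * (z_star - x)
assert np.allclose(lam_next.sum(axis=0), 0.0)
```

The dual-feasibility check works because \(\sum _s \pi _s (z^*-x_s) = z^* - \sum _s \pi _s x_s = 0\) by construction of \(z^*\).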

Lemma 30

Under the Algorithmic Identifications 28 with \(G^k= h^k+ Q^k\) and Assumption 29, we have

  1.

    \(\{ G^k\}_{k=0}^\infty \) epi-converges to \(\overline{G}= \overline{h}+ \overline{Q}\) and satisfies the SA Assumption 22.

  2.

    \(0 \in \partial _wG^k({z}^k,w^k)\) and \(0 \in \partial _w\overline{G}(\overline{z},\overline{w})\).

Proof

By regularity of \(\overline{G}\) due to its convexity, the (limiting) Mordukhovich and Fréchet subdifferentials coincide.

We first argue that \(\{ G^k\}_{k=0}^\infty \) epi-converges to \(\overline{G}\) whenever \(\{ (\pi _s^{n_k})_{s \in S} \}_{k=0}^\infty \) converges to \(\{\pi _s\}_{s \in S}\). As

$$\begin{aligned} h^k( w):= \frac{ 1}{\rho ^{n_k}} \left( \sum _{s \in S} p_s \{c^{\top }x_s + d^{\top }_s y_s \} - (\lambda _s^{n_k} )^{\top } x_s \right) + \delta _{K^{ (\overline{{\textbf{x}}}_{{\mathcal {I}}},\overline{{\textbf{y}}}_{{\mathcal {I}}})} } (x,y), \end{aligned}$$

we have \(h^k\) convex and converging both monotonically point-wise and uniformly to \(\delta _{K^{(\overline{x}_{\mathcal I},\overline{y}_{\mathcal I}) }}\). This is because \( \frac{ 1}{\rho ^{n_k}} \left( \sum _{s \in S} p_s \{c^{\top }x_s + d^{\top }_s y_s \} - (\lambda _s^{n_k} )^{\top } x_s \right) \) converges uniformly to zero on the compact and convex polyhedral set \(K^{(\overline{x}_{\mathcal I},\overline{y}_{\mathcal I})}\). Thus \(\{h^k\}_{k=0}^\infty \) epi-converges to \(\delta _{ K^{(\overline{x}_{\mathcal I},\overline{y}_{\mathcal I}) }}\). Whenever \(\{ (\pi _s^{n_k})_{s \in S} \}_{k=0}^\infty \) converges to \(\pi _s\) for each \(s \in S\), we have a family of convex functions \(\{Q^{k}:= \sum _{s \in S} \pi _s^{n_k} \psi \}_{k=0}^\infty \) converging uniformly on compact sets and hence also epi-converging to \(\overline{Q}= \sum _{s \in S} \pi _s \psi (\cdot , \cdot )\). Applying [27, Theorem 7.1.5] or [13, Theorem 7.46], the sum of a uniformly convergent sequence and an epi-convergent sequence epi-converges, so \(\{ G^k:= h^k+ Q^k\}_{k=0}^\infty \) epi-converges to \(\overline{G}= \overline{h}+ \overline{Q}\). Now we may apply [13, Theorem 12.35] to deduce that the convex subdifferentials \(\{\partial _wG^k\}_{k=0}^\infty \) converge in graph to \( \partial _w\overline{G}\). Since \(w^k\) is a global minimiser of \(w\mapsto G^k({z}^k,w)\), we have by definition that \(0 \in \partial _wG^k({z}^k,w^k)\) for each \(k\). Thus by graphical convergence \(0 \in \partial _w\overline{G}(\overline{z},\overline{w})\). \(\square \)
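The uniform-convergence step in the proof can be illustrated numerically: on a compact set, scaling any fixed linear function by \(1/\rho ^{n_k}\) drives its sup-norm to zero as the penalty grows. A small sketch with hypothetical cost coefficients and a grid stand-in for the compact polyhedron:

```python
import numpy as np

rng = np.random.default_rng(1)
c = rng.normal(size=2)                         # hypothetical linear cost coefficients
# K approximated by a grid over [0,1]^2 (a compact polyhedron)
grid = np.stack(np.meshgrid(np.linspace(0, 1, 50),
                            np.linspace(0, 1, 50)), axis=-1).reshape(-1, 2)

# sup-norm of the scaled linear term for growing penalty rho
sups = [np.abs(grid @ c / rho).max() for rho in (1.0, 10.0, 100.0, 1000.0)]

assert all(a > b for a, b in zip(sups, sups[1:]))  # strictly shrinking like 1/rho
assert sups[-1] < 1e-2                             # essentially vanished
```

Uniform convergence of this term to zero on \(K\) is exactly what lets the smooth part of \(h^k\) disappear into the indicator in the epi-limit.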

Definition 6

Let intervening GS iterations between \(n_k\) and \(n_{k+1}\) be denoted \(n_k+1,n_k+2,\dots ,n_{k+1}-2,n_{k+1}-1\).

Lemma 31

If \((\varvec{x}^n,\varvec{y}^n,\varvec{w}^n,z^n)\), \(n\ge 1\), are computed with the GS iterations of Algorithm 1, then for each fixed \(n\ge 1\) and positive integer j, we have

$$\begin{aligned}&g^{n+j}(\varvec{x}^{n+j+1}, \varvec{y}^{n+j+1}, \varvec{w}^{n+j+1}, z^{n+j}) - g^{n}(\varvec{x}^{n+1}, \varvec{y}^{n+1}, \varvec{w}^{n+1}, z^{n}) \\&\qquad +\sum _{i=0}^{j-1} \left[ g^{n+i}(\varvec{x}^{n+i+1},\varvec{y}^{n+i+1},\varvec{w}^{n+i+1}, z^{n+i+1})\right. \\&\qquad \left. -g^{n+i+1}(\varvec{x}^{n+i+1},\varvec{y}^{n+i+1},\varvec{w}^{n+i+1}, z^{n+i+1}) \right] \\&\quad \le g^{n}(\varvec{x}^{n+1}, \varvec{y}^{n+1}, \varvec{w}^{n+1}, z^{n}+\alpha ^nd^n) - g^{n}(\varvec{x}^{n+1}, \varvec{y}^{n+1}, \varvec{w}^{n+1}, z^{n}) \\ \end{aligned}$$

where \(z^{n+i+1} \in \arg \min _z g^{n+i}(\varvec{x}^{n+i+1},\varvec{y}^{n+i+1}, \varvec{w}^{n+i+1}, z)\) for \(i=0,\dots ,j-1\), as is consistent with an iteration of Algorithm 1.

Proof

See Appendix A. \(\square \)

Corollary 32

Given \(n_k\), \(k\ge 0\), a subsequence index, and \(j_k\) the positive integer such that \(n_k+j_k= n_{k+1}\), if \((\varvec{x}^{n+1},\varvec{y}^{n+1},\varvec{w}^{n+1},z^n)\), \(n\ge 1\), are computed with GS iterations, then

$$\begin{aligned} \sum _{i=0}^{j_{k}-1}&\left[ g^{n_k+i}(\varvec{x}^{n_k+i+1},\varvec{y}^{n_k+i+1},\varvec{w}^{n_k+i+1},z^{n_k+i+1}) \right. \\&\left. -g^{n_k+i+1}(\varvec{x}^{n_k+i+1},\varvec{y}^{n_k+i+1},\varvec{w}^{n_k+i+1},z^{n_k+i+1}) \right] \\&+G^{k+1}({z}^{k+1}, w^{k+1}) -G^{k}({z}^{k}, w^{k}) \le \;G^{k}({z}^k+\alpha ^kd^k,w^k)-G^{k}({z}^k, w^k)&\end{aligned}$$

Proof

By the construction of the subsequence, \(\delta _{K_s^{ (\overline{{\textbf{x}}}_{s,{\mathcal {I}}},\overline{{\textbf{y}}}_{s,{\mathcal {I}}})} } (x_s^{n_k+1},y_s^{n_k+1}) = 0\) and so any potential discrepancy between \(g^{n_k}(\varvec{x}^{n_k+1},\varvec{y}^{n_k+1},\varvec{w}^{n_k+1},z^{n_k})\) and \(G^{k}({z}^k,w^k)\) is avoided. \(\square \)

Lemma 33

Under Assumption 28 and Assumption 29, we have

$$\begin{aligned} \lim _{k\rightarrow \infty }&\sum _{i=0}^{j_{k}-1} \left[ g^{n_k+i}(\varvec{x}^{n_k+i+1},\varvec{y}^{n_k+i+1},\varvec{w}^{n_k+i+1},z^{n_k+i+1}) \right. \\&\left. \quad -g^{n_k+i+1}(\varvec{x}^{n_k+i+1},\varvec{y}^{n_k+i+1},\varvec{w}^{n_k+i+1},z^{n_k+i+1}) \right] = 0. \end{aligned}$$

Proof

See Appendix A. \(\square \)

Corollary 34

Assume the Algorithm Identifications 28. If the Algorithm Assumptions 29 hold, then \({z}\)-DA Assumption 26 holds under any allowable realisation of its assumptions on \(\beta ,\sigma \), \(d^k\), etc. (Thus, the intended \({z}\)-DA condition will hold for any convergent subsequence \(\{({z}^k,w^k)\}_{k=0}^\infty =\{(\varvec{x}^{n_k+1},\varvec{y}^{n_k+1},\varvec{w}^{n_k+1},z^{n_k})\}_{k=0}^{\infty }\) with \(d^k = -\nabla _{{z}} Q^k({z}^{k},w^{k})\), and \(\alpha ^k\) computed with the Armijo rule for any \(\beta ,\sigma \in (0,1)\).)

Proof

Given that \(G^{k}({z}^k+\alpha ^kd^k,w^k)-G^{k}({z}^k, w^k) \le 0\) already holds per the Armijo step, the satisfaction of \({z}\)-DA Assumption 26 follows from Lemma 33 and Corollary 32 once it is noted that \(\lim _{k\rightarrow \infty } G^{k+1}({z}^{k+1}, w^{k+1}) -G^{k}({z}^{k}, w^{k}) = 0 \) follows from the SA epi-convergence of Assumption 22 [13, Theorem 12.35]. \(\square \)
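For concreteness, here is a generic backtracking sketch of the Armijo rule referenced above, with a hypothetical smooth quadratic \(Q\) standing in for \(Q^k\) (the parameters \(\beta ,\sigma \in (0,1)\) play the roles named in the \({z}\)-DA assumption):

```python
import numpy as np

def armijo_step(Q, grad_Q, z, beta=0.5, sigma=1e-1, alpha0=1.0, max_halvings=50):
    """Backtracking (Armijo) line search along d = -grad Q(z): find alpha with
    Q(z + alpha d) - Q(z) <= sigma * alpha * grad_Q(z).d"""
    d = -grad_Q(z)
    alpha = alpha0
    for _ in range(max_halvings):
        if Q(z + alpha * d) - Q(z) <= sigma * alpha * np.dot(grad_Q(z), d):
            return alpha, d
        alpha *= beta
    return alpha, d

# toy smooth Q (hypothetical stand-in for sum_s pi_s psi(z - x_s, 0))
x = np.array([[0.0, 0.0], [1.0, 1.0]])
pi = np.array([0.3, 0.7])
Q = lambda z: sum(p * 0.5 * np.sum((z - xs)**2) for p, xs in zip(pi, x))
gQ = lambda z: sum(p * (z - xs) for p, xs in zip(pi, x))

z = np.array([2.0, -1.0])
alpha, d = armijo_step(Q, gQ, z)
assert Q(z + alpha * d) < Q(z)        # descent achieved
```

Any step accepted by this test automatically satisfies \(G^{k}({z}^k+\alpha ^kd^k,w^k)-G^{k}({z}^k, w^k) \le 0\), which is the property used in the proof above.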

Definition 7

Under the epi-convergence of SA Assumption 22, we define the limiting regularisation

$$\begin{aligned} \phi ^{\infty }(z,\varvec{w}) := \lim _{\rho \rightarrow \infty } \frac{1}{\rho }\varphi ^{\lambda ,\rho ,\pi }(z,\varvec{w}) = \min _{x,y} \left\{ \sum _{s \in S} \pi _s \psi (z-x_s,w_s-y_s) \mid (x_s,y_s) \in K_s\right\} \end{aligned}$$

We now state one of our main results. Before doing so, we introduce the following notation.

Definition 8

To accommodate both possibilities \(\lim _{n\rightarrow \infty } \rho ^n= \bar{\rho } < \infty \) or \(\lim _{n\rightarrow \infty } \rho ^n= \infty \) disjunctively, we define

$$\begin{aligned} g^*(\varvec{x},\varvec{y},\varvec{w},z)&:= \lim _{n\rightarrow \infty } g^{n}(\varvec{x},\varvec{y},\varvec{w},z)\nonumber \\&= \lim _{n\rightarrow \infty } \sum _{s \in S} \left[ \frac{1}{\rho ^{n}} \left( f_s(x_s, y_s ) - (\lambda _s^{n})^\top x_s \right) + \pi _s^{n} \psi (z-x_s,w_s-y_s) \right] \end{aligned}$$
(25)

(recalling the definition (23)). From (25), we define the limiting regularisation

$$\begin{aligned} \phi ^{*}(z,\varvec{w})&:= \lim _{n\rightarrow \infty } \frac{1}{\rho ^n} \varphi ^n(z,\varvec{w}) = \min _{\varvec{x},\varvec{y}} g^*(\varvec{x},\varvec{y},\varvec{w},z)\\ \phi ^{*}\left( z,\varvec{w} \mid \overline{\varvec{x}}_{{\mathcal {I}}},\overline{\varvec{y}}_{{\mathcal {I}}}\right)&:= \min _{\varvec{x},\varvec{y}} g^*(\varvec{x},\varvec{y},\varvec{w},z)+ \delta _{K^{ (\overline{{\textbf{x}}}_{{\mathcal {I}}},\overline{{\textbf{y}}}_{{\mathcal {I}}})} } (\varvec{x},\varvec{y}) \end{aligned}$$

The corresponding set of solutions \((\varvec{x},\varvec{y})\) realising these values given \((z,\varvec{w})\) is denoted

$$\begin{aligned} \Phi ^{*}(z,\varvec{w})&:= \arg \min _{\varvec{x},\varvec{y}} g^*(\varvec{x},\varvec{y},\varvec{w},z)\\ \Phi ^{*}\left( z,\varvec{w} \mid \overline{\varvec{x}}_{{\mathcal {I}}},\overline{\varvec{y}}_{{\mathcal {I}}}\right)&:= \arg \min _{\varvec{x},\varvec{y}} g^*(\varvec{x},\varvec{y},\varvec{w},z)+ \delta _{K^{ (\overline{{\textbf{x}}}_{{\mathcal {I}}},\overline{{\textbf{y}}}_{{\mathcal {I}}})} } (\varvec{x},\varvec{y}). \end{aligned}$$

Proposition 35

Let \((\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) satisfy \((\overline{\varvec{x}},\overline{\varvec{y}}) \in K\). The following implications hold.

  1.

    If the Fréchet stationarity \(0 \in \widehat{\partial }g^*(\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) holds, then \((0,0) \in \widehat{\partial }\phi ^{*}\left( \overline{z},\overline{\varvec{w}} \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) \) so that \((\overline{z},\overline{\varvec{w}})\) is a minimum of \(\phi ^{*}\left( \overline{z},\overline{\varvec{w}} \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) \).

  2.

    \((\overline{z},\overline{\varvec{w}})\) is a local minimum of \(\phi ^{*}\) with \((0,0) \in \widehat{\partial }\phi ^{*}(\overline{z},\overline{\varvec{w}}) =\nabla \phi ^{*}(\overline{z},\overline{\varvec{w}})\) if and only if \((\overline{z},\overline{\varvec{w}})\) is a local minimum of \(\phi ^{*}\left( \cdot ,\cdot \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \) with \((0,0) \in \widehat{\partial }\phi ^{*}\left( \overline{z},\overline{\varvec{w}} \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) = \partial \phi ^{*}\left( \overline{z},\overline{\varvec{w}} \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \) for all \((\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}) \in {{\,\textrm{proj}\,}}_{{\mathcal {I}}}(\Phi ^{*}(\overline{z},\overline{\varvec{w}}))\).

  3.

    If \(\overline{z}= \overline{x}_s\) for all \(s \in S\), then \((0,0) \in \widehat{\partial }\phi ^{*}(\overline{z},\overline{\varvec{w}})\) and so \((0,0) = \nabla \phi ^{*}(\overline{z},\overline{\varvec{w}})\). Furthermore, \((\overline{z},\overline{\varvec{w}})\) is also a (persistent) local minimum for \(\phi ^{*}\).

  4.

    In the specific case where \(\lim _{n\rightarrow \infty } \rho ^n= \infty \) (so that \(\phi ^{*}=\phi ^{\infty }\)), the reverse of the previous implication also holds, where \((\overline{z},\overline{\varvec{w}})\) being a local minimum for \(\phi ^{*}\) with \((0,0) \in \widehat{\partial }\phi ^{*}(\overline{z},\overline{\varvec{w}})\) implies that \(\overline{z}= \overline{x}_s\), \(s \in S\).

Proof

Part 1: Given \(0 \in \widehat{\partial }g^*(\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) and the structure of \(g^*\) as the sum of a linear function and an indicator function of a polyhedral set with integer cross-sections, we have that \(g^*(\varvec{x},\varvec{y},\varvec{w},z) \ge g^*(\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) for all \((\varvec{x},\varvec{y}) \in K^{(\overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}})}\) and \((z,\varvec{w}) \in {\mathbb {X}}\times {\mathbb {Y}}\), and more particularly, \(g^*(\varvec{x},\varvec{y},\overline{\varvec{w}},\overline{z}) \ge g^*(\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) for all \((\varvec{x},\varvec{y}) \in K^{(\overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}})}\). Thus,

$$\begin{aligned} g^*(\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z}) = \phi ^{*}\left( \overline{z},\overline{\varvec{w}} \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) . \end{aligned}$$

Furthermore, since \(g^*(\varvec{x},\varvec{y},\varvec{w},z) \ge g^*(\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) for all \((\varvec{x},\varvec{y}) \in K^{(\overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}})}\) and \((z,\varvec{w}) \in {\mathbb {X}}\times {\mathbb {Y}}\), we have

$$\begin{aligned} \phi ^{*}\left( z,\varvec{w} \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) \ge \phi ^{*}\left( \overline{z},\overline{\varvec{w}} \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) \end{aligned}$$

and so by the convexity of \(\phi ^{*}\left( \cdot ,\cdot \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) \), we have also that \((0,0) \in \widehat{\partial }\phi ^{*}\left( \overline{z},\overline{\varvec{w}} \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) \).

Part 2: We have \((0,0) \in \widehat{\partial }\phi ^{*} (\overline{z},\overline{\varvec{w}}\mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}})\) (in both the Fréchet and classical sense) because \((\overline{z},\overline{\varvec{w}}) \in \arg \min _{z,\varvec{w}} \phi ^{*}(z,\varvec{w}\mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}) \). Since \(\phi ^{*}(z,\varvec{w}) = \min _{(\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}) \in \text {proj}_{{\mathcal {I}}}(K)} \phi ^{*}({z,\varvec{w}}\mid {\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}})\), where each \((z,\varvec{w}) \mapsto \phi ^{*}({z,\varvec{w}}\mid {\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}})\) is convex, we may invoke Lemma 11 Part 2 to obtain both directions of the implication after identifying \(\varphi \) with \(\phi ^{*}\) and \(\varphi _i\), \(i \in I\), with \(\phi ^{*}\left( \cdot ,\cdot \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \) for \((\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}) \in {{\,\textrm{proj}\,}}_{\mathcal {I}}(\Phi ^{*}(z,\varvec{w}))\).

Part 3: The fact that \(\overline{z}= \overline{x}_s\) for all \(s \in S\) implies by Corollary 8 that \(\phi ^{*}(\overline{z},\overline{\varvec{w}}) =\phi ^{*}\left( \overline{z},\overline{\varvec{w}} \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \) for exactly one choice \((\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}) =(\overline{\varvec{z}}_{\mathcal {I}},\overline{\varvec{w}}_{\mathcal {I}})\), and so the claim follows. The persistency follows from the fact that inequality (11) implies a bound independent of \(\rho \).

Part 4: Knowing that \((\overline{z},\overline{\varvec{w}})\) is a local minimum for \(\phi ^{\infty }\), we form the cleared instance of the SMIP (1) by clearing first- and second-stage coefficients \(c=d=0\), and for all \(n\ge 0\), clearing \(\lambda ^n=0\) and setting \(\pi ^n= \pi \). Thus, for all \(\rho > 0\), we have that \(\frac{1}{\rho }\varphi ^{\lambda ,\rho ,\pi }\equiv \phi ^{\infty }\) and so \((\widetilde{z}^n,\widetilde{\varvec{w}}^n) \equiv (\overline{z},\overline{\varvec{w}})\), \(n\ge 0\), forms a sequence with limit \((\overline{z},\overline{\varvec{w}})\equiv (\widetilde{z}^n,\widetilde{\varvec{w}}^n)\), \(n\ge 0\). Each \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\equiv (\overline{z},\overline{\varvec{w}})\) is a local minimum for \(\varphi ^n\) over a fixed neighbourhood \(B_\delta (\overline{z},\overline{\varvec{w}})\) for some fixed \(\delta = \bar{\delta } > 0\), and since the SSA Assumption 9 and the PWA Assumption 10 therefore hold, by Proposition 17 applied to this sequence associated with this cleared instance of SMIP (1), we have \(\overline{z}\in X\) and \(\overline{\varvec{w}}\in Y(\overline{z})\), which applies with respect to the original (non-cleared) instance of SMIP (1) also. \(\square \)

Theorem 36

Assume that problem (1) satisfies the SMIP Assumption 1, to which Algorithm 1 is applied to generate a sequence \(\{(\varvec{x}^n,\varvec{y}^n,\varvec{w}^n,z^n)\}_{n=0}^\infty \). If the Algorithm Assumption 29 is satisfied, then there exists a limit point \((\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) of the mid-iteration sequence \(\{(\varvec{x}^{n+1},\varvec{y}^{n+1},\varvec{w}^{n+1}, z^{n}) \}_{n=0}^\infty \), and each such limit point \((\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) is a Fréchet stationary point for the problem

$$\begin{aligned} \min _{z,\varvec{x},\varvec{y},\varvec{w}} g^*(\varvec{x},\varvec{y},\varvec{w},z) \end{aligned}$$
(26)

and in either limiting case, the cross-sectional optimality \((\overline{z},\overline{\varvec{w}}) \in \arg \min _{z,\varvec{w}} \phi ^{*}\left( z,\varvec{w} \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) \) holds. Thus, the following implications hold:

  1.

    \((\overline{z},\overline{\varvec{w}})\) is a local minimum of \(\phi ^{*}\) if and only if \((\overline{z},\overline{\varvec{w}})\) is a local minimum of \(\phi ^{*}\left( \cdot ,\cdot \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \) for all \((\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}) \in {{\,\textrm{proj}\,}}_{\mathcal {I}}(\Phi ^{*}(\overline{z},\overline{\varvec{w}}))\).

  2.

    If \(\overline{z}= \overline{x}_s\) for all \(s \in S\) (so that \((\overline{\varvec{x}},\overline{\varvec{y}})\) is feasible and locally optimal for SMIP (1)), then \((\overline{z},\overline{\varvec{w}})\) is a (persistent) local minimum of \(\phi ^{*}\).

  3.

    In the specific case where \(\lim _{n\rightarrow \infty } \rho ^n= \infty \) (so that \(\phi ^{*}=\phi ^{\infty }\)), the reverse of the previous implication also holds, where \((\overline{z},\overline{\varvec{w}})\) being a local minimum of \(\phi ^{*}\) implies that \(\overline{z}= \overline{x}_s\), \(s \in S\), so that \((\overline{\varvec{x}},\overline{\varvec{y}})\) is feasible and locally optimal for SMIP (1).

Proof

Under the SMIP Assumption 1 that \(K_s\), \(s \in S\), are compact and the penalty \(\psi \) satisfies Assumption 2, it follows that the level sets of \(g^n\) are compact, and so the sequence \(\{(\varvec{x}^{n+1},\varvec{y}^{n+1},\varvec{w}^{n+1},z^n)\}_{n=0}^\infty \) has limit points \((\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) to which an associated subsequence \(\{(\varvec{x}^{n_k+1},\varvec{y}^{n_k+1},\varvec{w}^{n_k+1},z^{n_k})\}_{k=0}^\infty \) converges. For \(k\) large enough, the integer components \((\varvec{x}_{{\mathcal {I}}}^{n_k+1},\varvec{y}_{{\mathcal {I}}}^{n_k+1},\varvec{w}_{{\mathcal {I}}}^{n_k+1}) = (\overline{\varvec{x}}_{{\mathcal {I}}},\overline{\varvec{y}}_{{\mathcal {I}}},\overline{\varvec{w}}_{{\mathcal {I}}})\) are fixed. Therefore, only \(z^{n_k}\) and the real-valued components \((\varvec{x}_{{\mathcal {R}}}^{n_k+1},\varvec{y}_{{\mathcal {R}}}^{n_k+1},\varvec{w}_{\mathcal {R}}^{n_k+1} )\) are still changing in the (sub)sequence tail. After passing to a convergent subsequence with integer components fixed, the required SA Assumption 22 and \(w\)-stat Assumption 24 apply to the Assumption 28 identification \(\{(G^k, {z}^k,w^k)\}_{k=0}^\infty \) by Lemma 30. Under the same assumptions, the \({z}\)-DA Assumption 26 is satisfied due to Corollary 34. Thus, Lemma 27 may be applied to the Assumption 28 identification sequence \(\{(G^k, {z}^k,w^k)\}_{k=0}^\infty \) to establish the stationarity properties which, after dereferencing the identifications back to the Algorithm 1 context, yield the intended results.

The cross-sectional optimality \((\overline{z},\overline{\varvec{w}}) \in \arg \min _{z,\varvec{w}} \phi ^{*}\left( z,\varvec{w} \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) \) holds by Proposition 35 Part 1. The proof of the three implications follows, respectively, from implications 2–4 of Proposition 35. \(\square \)

From Theorem 36 we know that the GS limit points \((\overline{z},\overline{\varvec{w}})\) will be optimal for at least one cross-section \(\varphi ^{\lambda ,\rho ,\pi }\left( \cdot ,\cdot \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \) or \(\phi ^{\infty }\left( \cdot ,\cdot \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \).

A simple example demonstrates the possibility that the above GS procedure produces a limit point \((\overline{z},\overline{\varvec{w}})\) for which \(\widehat{\partial }\phi ^{\infty }(\overline{z},\overline{\varvec{w}})\) is empty.

Example 2

We revisit a rescaled version of the augmented Lagrangian problem of (20) defined for Example 1, where the objective function is rescaled by a factor of \(\frac{1}{\rho }\).

 

| Cross-section \(\varvec{x}_{\mathcal {I}}\) | \(\frac{1}{\rho ^n}\inf _w\varphi ^{n}\left( z,\varvec{w} \mid \varvec{x}_{\mathcal {I}}\right) \) | Gradient w.r.t. \(z\): \(\widehat{\partial }_z\varphi ^n(z,\varvec{w})\) | Value \(\frac{1}{\rho }\varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) | Loc. opt. over \(B_\delta (\overline{z},\overline{\varvec{w}})\): \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) | \(\delta \) | \(\Phi ^{*}\): \((z^*,\varvec{w}^*)\) |
| --- | --- | --- | --- | --- | --- | --- |
| \([0,0]^\top \) | \(\frac{1}{\rho ^n}+\Vert z\Vert ^2\) | \(2z\) | \(\frac{1}{\rho ^n}\) | \(\left( 0,[0,1]^\top \right) \) | \(\frac{1}{2}-\frac{1}{\rho ^n}\) | \(\{0\} \times \mathbb {R}^2\) |
| \([1,0]^\top \) | \(\frac{2}{\rho ^n}+\frac{1}{2}\left( \Vert z\Vert ^2 + \Vert z-1\Vert ^2\right) \) | \(2z-1\) | \(\frac{2}{\rho ^n}+\frac{1}{4}\) | \(\left( \frac{1}{2},[1,1]^\top \right) \) | \(0\) | \(\left\{ \frac{1}{2}\right\} \times \mathbb {R}^2\) |
| \([0,1]^\top \) | \(\frac{1}{2}\left( \Vert z\Vert ^2 + \Vert z-1\Vert ^2\right) \) | \(2z-1\) | \(\frac{1}{4}\) | \(\left( \frac{1}{2},[0,0]^\top \right) \) | \(\frac{1}{\rho ^n}\) | \(\left\{ \frac{1}{2}\right\} \times \mathbb {R}^2\) |
| \([1,1]^\top \) | \(\frac{1}{\rho ^n}+\Vert z-1\Vert ^2\) | \(2z-2\) | \(\frac{1}{\rho ^n}\) | \(\left( 1,[1,0]^\top \right) \) | \(\frac{1}{2}-\frac{1}{\rho ^n}\) | \(\left\{ 1\right\} \times \mathbb {R}^2\) |

Of note is the locally optimal solution \((\widetilde{z}^n,\widetilde{\varvec{w}}^n) = \left( \frac{1}{2}, [0,0]^T\right) \), which for \(0< \rho ^n< \infty \) is clearly a local minimum for \(\varphi ^n\) over \(|\overline{z}-z|< 1/\rho ^n\). Furthermore for \(\rho ^n< \infty \), \(\widehat{\partial }\varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n) =\left\{ \left( 0,[0,0]^T \right) \right\} \) is non-empty. However, in the limit as \(\rho \rightarrow \infty \), we have clearly that \((\overline{z},\overline{\varvec{w}}) = \left( \frac{1}{2},[0,0]^T\right) \) realises the value \(\phi ^{\infty }(\overline{z},\overline{\varvec{w}}) = \frac{1}{4}\) over all cross-sections, but \(\widehat{\partial }_z\phi ^{\infty }\left( \overline{z},\overline{\varvec{w}} \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \not \supset \{0\}\) for two of the four cross sections, and furthermore their intersection \(\bigcap _{(\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}})} \widehat{\partial }_z\phi ^{\infty }\left( \overline{z},\overline{\varvec{w}} \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) =\emptyset \) is empty and so \(\widehat{\partial }\phi ^{\infty }(\overline{z},\overline{\varvec{w}}) = \emptyset \). Also note that while this \((\overline{z},\overline{\varvec{w}})\) is a partial minimum for \(\phi ^{\infty }\), it is not (even) a local minimum over \((z,\varvec{w})\) jointly.
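The limiting (\(\rho \rightarrow \infty \)) cross-section values and gradients in the table can be checked numerically at \(\overline{z}= \frac{1}{2}\); the sketch below confirms that all four cross-sections attain the common value \(\frac{1}{4}\) while only two of them are stationary there:

```python
z = 0.5
# limiting cross-section objectives and gradients from the table (rho -> infinity)
sections = {
    "(0,0)": (lambda z: z**2,                  lambda z: 2*z),
    "(1,0)": (lambda z: 0.5*(z**2 + (z-1)**2), lambda z: 2*z - 1),
    "(0,1)": (lambda z: 0.5*(z**2 + (z-1)**2), lambda z: 2*z - 1),
    "(1,1)": (lambda z: (z-1)**2,              lambda z: 2*z - 2),
}

values = {k: f(z) for k, (f, _) in sections.items()}
grads  = {k: g(z) for k, (_, g) in sections.items()}

# all four cross-sections attain the same value 1/4 at z = 1/2 ...
assert all(abs(v - 0.25) < 1e-12 for v in values.values())
# ... but only two of them are stationary there, so the Frechet
# subdifferential of the pointwise minimum is empty at z = 1/2
assert grads["(1,0)"] == 0 and grads["(0,1)"] == 0
assert grads["(0,0)"] == 1.0 and grads["(1,1)"] == -1.0
```

This is precisely the mechanism by which \(\widehat{\partial }\phi ^{\infty }(\overline{z},\overline{\varvec{w}})\) empties out: the minimum is attained over several active cross-sections whose gradients do not share a common zero.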

We have demonstrated through an example (though never observed in our experiments) a pathological case where a partial minimum is encountered in the limit, but local optimality or feasibility for SMIP is not achieved. This lack of local optimality is due to a partial minimum being found for \(\phi ^{*}\) where Fréchet subdifferentiability fails, a problem foreshadowed in Lemma 21 (indeed, a non-empty subdifferential ensures stationarity, from which local optimality follows, Lemma 11). This failure of subdifferentiability occurs only when the solution minimises some (but not all) of the active cross-sections defining \(\phi ^{*}\), see Lemma 11. Furthermore, for any solution to problem (26) that satisfies consensus, this lack of Fréchet subdifferentiability is ruled out (see Theorem 36 Part 2), and we are then assured of obtaining a persistent local minimum. A partial converse may be found in Proposition 12, where integer consensus is ensured for a persistent minimum. Such pathological limit points are unstable in the sense that they are mere partial minima and not even locally minimal jointly in \((z,\varvec{w})\) for \(\phi ^{*}\). Consequently, an apt minor perturbation of \((\overline{z},\overline{\varvec{w}})\) (suggested by Corollary 18) may be employed to get the iterative FPPH approach unstuck.

6 Computational results

6.1 Algorithm

Algorithm 3 presents a modified version of Algorithm 1, in which we explicitly consider the initialisation steps and the rules for updating Lagrange and penalty parameters between successive iterations. We use an algorithm formulation consistent with the typical presentation of Progressive Hedging but allowing for differences in how (and when) Lagrange and penalty parameters are updated. Also, since the second-stage discrepancies are always zero in the context of a two-stage SMIP, we omit the second component of the penalty function (the \(v\) component in Assumption 2).

Algorithm 3: FPPH algorithm for SMIP

The properties of the penalty function required by the ICRF Assumption 2 give us flexibility in choosing \(\psi \). As described next, we compute weights for a weighted squared 2-norm form of the penalty function \(\psi \) during the initialisation with the aim of accelerating convergence to a reasonably high-quality feasible solution. Subsequently, we describe the update schemes and the termination condition used in our computational experiments.

6.1.1 Initialisation

The weights determining the weighted squared 2-norm penalty function \(\psi \) are denoted \(\bar{\mu }_i\), \(i=1,\dots ,n\); they do not change between iterations. The iteration-\(n\) penalty coefficients \(\rho _s^n=\pi _s^n\rho ^n\) denote the weighting of the iteration-\(n\) penalty magnitude \(\rho ^n\) by penalty weight \(\pi _s^n\). Initially, \(\rho _s^0 = \pi _s^0 = p_s\). Once the PenaltyUpdateCondition is satisfied, the \(\rho _s\) terms are increased by the PenaltyUpdate function to modify the penalty applied to each scenario. These functions are defined in Sects. 6.1.2 and 6.1.3. The initial \(z^0\) of Algorithm 3 Line 3 may be computed by \(z^{0}_i =\sum _{s \in S} p_s x^{0}_{s,i}\text { for all }i \in 1,\dots ,n.\) Having \(z^0\), the values \(\mu _i\), \(i=1,\dots ,n\), required to form the penalty weights initialising the penalty function \(\psi \) are computed as given in [11] for applying Progressive Hedging to SMIPs:

$$\begin{aligned} \mu _{i} = \frac{c_i}{\max \left\{ \sum _{s' \in S} p_{s'} \left|x^0_{{s'},i} - z^0_{i} \right|, 1\right\} } \quad \text {for each }i \in 1,\dots ,n \end{aligned}$$

when variable i is continuous, and

$$\begin{aligned} \mu _{i} = \frac{c_i}{\left( \max _{s' \in S} x^0_{{s'},i}\right) - \left( \min _{s' \in S} x^0_{{s'},i}\right) + 1} \quad \text {for each }i \in 1,\dots ,n \end{aligned}$$

when variable i is discrete.

We employ a slightly modified version of this scheme, where penalty parameters that would be set to zero by this rule are instead set to the smallest non-zero value \(\mu '\) among all other penalty parameters \(\mu _i\), \(i=1,\dots ,n\). We denote the modified penalty function weights as \(\bar{\mu }_i:= \max \{\mu _i,\mu '\}\) for each \(i=1,\dots ,n\). This modification did not materially affect the performance of Progressive Hedging and provides a guarantee that in the penalty-updating algorithms, all penalties can grow to be arbitrarily large, as required by Assumption 10.
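The weight computation above, including our zero-replacement modification, can be sketched as follows (hypothetical data; `penalty_weights` is an illustrative helper name, not from [11]):

```python
import numpy as np

def penalty_weights(c, x0, p, is_discrete):
    """Sketch of the initial weight computation for the weighted squared
    2-norm penalty, following the scheme described above."""
    S, n = x0.shape
    z0 = p @ x0                              # z0_i = sum_s p_s x0_{s,i}
    mu = np.empty(n)
    for i in range(n):
        if is_discrete[i]:
            spread = x0[:, i].max() - x0[:, i].min() + 1.0
        else:
            spread = max(p @ np.abs(x0[:, i] - z0[i]), 1.0)
        mu[i] = c[i] / spread
    # modification: replace zero weights by the smallest non-zero weight mu'
    nz = mu[mu > 0]
    mu_prime = nz.min() if nz.size else 1.0  # fallback if all c_i = 0
    return np.maximum(mu, mu_prime)

c = np.array([2.0, 0.0, 4.0])                # c_2 = 0 would give a zero weight
p = np.array([0.5, 0.5])
x0 = np.array([[0.0, 1.0, 3.0],
               [2.0, 1.0, 1.0]])
mu_bar = penalty_weights(c, x0, p, is_discrete=[True, True, False])
assert np.all(mu_bar > 0)                    # all weights strictly positive
```

Strict positivity of every \(\bar{\mu }_i\) is what allows all penalties to grow arbitrarily large under the penalty-updating schemes, as required by Assumption 10.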

This choice of penalty initialisation has been made to allow as direct a comparison as possible between Progressive Hedging (using a set of parameters established to be reasonable for that algorithm) and the penalty-updating variations of Algorithm 3. Having computed \(\bar{\mu }_i\) for all \(i=1,\dots ,n\), we set the penalty function for our computational experiments as \(\psi (u) = \frac{1}{2}\sum _{i=1}^n \bar{\mu }_iu_i^2\) and initialise \(\rho _s^0 = \pi _s^0 = p_s\) for all \(s \in S\). (Note that \(\rho _s^n:= \pi _s^n\rho ^n\) for \(n\ge 0\).) With the definition of \(\psi \), the z update step on Line 14 of Algorithm 3 can be written in the form

$$\begin{aligned} z^{n}_i \leftarrow \sum _{s \in S} \pi _s^{n-1} x^{n}_{s,i} \quad \text {for all }i \in 1,\dots ,n, \end{aligned}$$

where \(\pi _s^{n} = \frac{\rho _s^n}{\sum _{s' \in S} \rho _{s'}^n}\) for each \(s \in S\), \(n\ge 0\). (Note that \(\bar{\mu }_i\) does not influence this update step.) Furthermore, the dual multiplier update based on this definition of \(\psi \) is given at each iteration \(n\ge 0\) as

$$\begin{aligned} \lambda _{s,i}^{n} \leftarrow \lambda _{s,i}^{n-1} - \rho _s^{n-1}\bar{\mu }_i(z_i^n-x_{s,i}^n) \quad \text {for all}\; s \in S\;\text {and}\;i=1,\dots ,n. \end{aligned}$$

One may verify that this dual multiplier update maintains the feasibility condition \(\sum _{s \in S} \lambda _s^n= 0\) for each \(n\ge 0\).
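With \(\psi \) fixed as above, the \(z\) update reduces to a \(\pi \)-weighted average of the scenario solutions and the dual update to a coordinate-wise scaled correction. A minimal sketch (variable names ours) assuming NumPy; when the initial multipliers sum to zero across scenarios, the update preserves \(\sum _{s\in S}\lambda _s = 0\) because \(\sum _s \rho _s (z - x_s) = 0\) by the definition of \(z\):

```python
import numpy as np

def z_and_dual_update(x, lam, rho, mu_bar):
    """One z update and dual multiplier update.
    x, lam: |S| x n arrays; rho: length-|S| penalty coefficients."""
    pi = rho / rho.sum()              # pi_s = rho_s / sum_{s'} rho_{s'}
    z = pi @ x                        # z_i = sum_s pi_s x_{s,i}
    # lambda_{s,i} <- lambda_{s,i} - rho_s * mu_bar_i * (z_i - x_{s,i})
    lam_new = lam - rho[:, None] * mu_bar * (z - x)
    return z, lam_new
```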

6.1.2 Penalty update condition

We consider three update-type conditions:

  1. PenaltyUpdateCondition always returns False, meaning that the algorithm performs dual updates at every iteration and never increases the penalty parameters. This is equivalent to the Progressive Hedging algorithm for SMIP. This update condition does not satisfy Assumption 10, since the penalty parameters do not become arbitrarily large.

  2. PenaltyUpdateCondition always returns True, meaning that the algorithm does not update the dual variables again after the initialisation step and instead increases the penalty parameters. This is designated as the Penalty Only variant of FPPH.

  3. Track the degree of change in the dual variables

    $$\begin{aligned} \Delta _k = \sum _{s \in S} \sum _{i=1}^{n} |\lambda _{s,i}^{k-1} - \lambda _{s,i}^k|\end{aligned}$$

    at each iteration k. If in the current iteration \(k'\) the condition

    $$\begin{aligned} \Delta _{k'} < \beta \frac{\Delta _1 + \max _k \Delta _{k}}{2} - \gamma \end{aligned}$$
    (27)

    is satisfied, PenaltyUpdateCondition returns True so that no further dual updates are performed; otherwise it returns False. We set the parameters \(\beta \) and \(\gamma \) to 0.5 and \(10^{-3}\) respectively. This is designated as the Dual Step Length variant of FPPH.

As a simple guarantee that the Dual Step Length method satisfies Assumption 10, we could impose a maximum number of iterations after which PenaltyUpdateCondition must return True. However, in our computational tests with this update condition, either (27) or integer-variable consensus was always satisfied after a reasonably small number of iterations.
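Condition (27) can be sketched as a short predicate over the history of dual-step sizes \(\Delta _1,\dots ,\Delta _{k'}\) (function and parameter names ours, for illustration):

```python
def dual_step_length_condition(deltas, beta=0.5, gamma=1e-3):
    """Condition (27): True once the latest dual-step size falls below
    beta * (Delta_1 + max_k Delta_k) / 2 - gamma.
    `deltas` holds Delta_1, ..., Delta_{k'} and is assumed non-empty."""
    threshold = beta * (deltas[0] + max(deltas)) / 2.0 - gamma
    return deltas[-1] < threshold
```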

6.1.3 Penalty update scheme

We gradually increase the penalty parameter for the scenario whose first-stage variables are furthest from consensus with the following method. For each scenario \(s\in S\), we calculate its distance from consensus \(D_s^n= \left\| z^n- x_s^n \right\| _2\). Then, update the penalty multipliers as follows:

$$\begin{aligned} \rho _{s}^{n} \leftarrow \left( 1 + \alpha |S |\frac{D_s^n}{\sum _{s' \in S} D_{s'}^n} \right) \rho _{s}^{n-1} \quad \text {for all }s\in S. \end{aligned}$$

We set the parameter \(\alpha \) to 0.1. This rule is intended to prioritise increasing the penalty parameters corresponding to the scenarios whose first-stage variables are furthest from consensus. Assuming that PenaltyUpdateCondition returns True after a finite number of iterations, this update scheme satisfies Assumption 10.
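This penalty update can be sketched as follows (names ours; NumPy assumed), guarding against the full-consensus case \(\sum _s D_s^n = 0\), in which no update is needed:

```python
import numpy as np

def penalty_update(rho, x, z, alpha=0.1):
    """Increase rho_s in proportion to scenario s's distance from consensus."""
    D = np.linalg.norm(z - x, axis=1)   # D_s = ||z - x_s||_2
    total = D.sum()
    if total == 0.0:                    # full consensus: leave rho unchanged
        return rho
    return (1.0 + alpha * len(rho) * D / total) * rho
```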

6.1.4 Termination condition and \(n_{max}\)

Termination of each computational test is conditioned on attaining consensus \(z_{\mathcal {I}}^n= x_{s,{\mathcal {I}}}^n\), for all \(s \in S\), in all integer variables. For the instances with pure integer first-stage variables, this condition is the same as requiring first-stage consensus. For the instances with mixed integer first-stage variables, we generally do not have full consensus in the continuous variables at this point. To obtain feasible solutions, we take each unique first-stage solution \(x_s\) and find the corresponding optimal second-stage decisions y. We then report the best solution value found among these candidate solutions.
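The integer-consensus termination test can be sketched as (names ours; `int_mask` flags the integer coordinates \(\mathcal {I}\), and a small tolerance absorbs floating-point noise in the subproblem solutions):

```python
import numpy as np

def integer_consensus(x, z, int_mask, tol=1e-6):
    """True when z_I = x_{s,I} for every scenario s (integer coords only)."""
    return bool(np.all(np.abs(x[:, int_mask] - z[int_mask]) <= tol))
```

The second-stage re-solve used to recover feasible solutions is omitted here, since it simply fixes each distinct first-stage candidate and optimises the remaining variables.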

Our motivation for applying this convergence criterion to mixed integer first-stage instances is that when allowed to run beyond achieving integer consensus, the FPPH variants typically satisfied the convergence criterion \(\sqrt{\sum _{s \in S} p_s \left\| x^n_s - z^n \right\| ^2_2} < 10^{-3}\) within 100 iterations but with very poor solution quality, whereas PH failed to satisfy this criterion given even 200 iterations. Any potential method for finding a high-quality solution quickly given a fixed value for the first-stage integer variables could be applied to the solutions produced by both PH and FPPH, but implementing and tuning such a method is outside the scope of this paper.

We set \(n_{max} = 100\) since both FPPH variants generally converge well within this iteration limit, and in cases where PH does not it is already clearly slower than FPPH in terms of both runtime and iteration count.

6.2 Computational environment

The experiments in this section were conducted with a C++ implementation of Algorithm 3 using CPLEX 22.1 [28] as the solver. For reading SMPS files into scenario-specific subproblems and for their interface with CPLEX, we used modified versions of the COIN-OR [29] Smi and Osi libraries to instantiate appropriate C++ class instances of the subproblems directly.

The computing environment is the Gadi cluster maintained by Australia’s National Computing Infrastructure (NCI) and supported by the Australian government [30]. To maintain a comparable environment, experiments were performed on a single CPU using one thread per CPLEX solve for both algorithms.

The PH and FPPH algorithms are deterministic in terms of the solutions produced, but the time required for CPLEX to solve the subproblems at each iteration has some variation. Therefore, for each test, we ran each algorithm three times on each instance and report the average runtime.

6.3 Computational experiments: Pure integer first-stage instances

We first consider the application of FPPH and our implementation of Progressive Hedging to the CAP instance set [31], using the first 250 scenarios for each instance, and the SSLP instance set [32]. To evaluate algorithm performance, we compare the solutions found with the known integer optimal solution of each instance. To obtain the integer feasible optimal solutions for the CAP instances, we used CPLEX to directly solve the MIP reformulation of each instance. The integer feasible optimal solutions for the SSLP instances are provided by SIPLIB [9].

The computational results are summarised in Fig. 1, which compares both the wall-clock time required for convergence (relative to the slowest algorithm to achieve convergence) and the quality of the feasible solutions obtained at termination. A more detailed summary of our results, including absolute runtimes and solution values, is provided in the supplementary material (Table B1).

When applied to the SSLP instances, all three algorithms typically find the same solution, and it is often optimal. The Dual Step Length variant of FPPH outperforms PH in terms of runtime for all instances except for SSLP-15-45-5, where they require an equal (and small) amount of time. The Penalty Only variant of FPPH often outperforms both the Dual Step Length variant and PH, but fails to find the optimal solution of SSLP-15-45-10 and is a little slower when applied to SSLP-5-25-50 and SSLP-15-45-5. PH fails to converge to a feasible solution within 100 iterations when applied to SSLP-15-45-15.

When applied to the CAP instances, PH fails to converge to a feasible solution within 100 iterations for four of the eight instances and is again consistently outperformed in terms of runtime and matched in solution quality by the Dual Step Length variant of FPPH even when it does converge. There is not a clear favourite between the Penalty Only and Dual Step Length variants when applied to the CAP instances; each variant finds a higher-quality solution than the other variant for at least one instance, and converges faster than the other variant for several instances.

6.4 Computational experiments: Mixed integer first-stage instances

We also compared the performance of FPPH and our implementation of Progressive Hedging applied to the DCAP instance set [33, 34]. In this case, we compare with the known upper bounds given by SIPLIB [9]. These results are summarised in Fig. 2, with further detail in the supplementary material (Table B2).

For these instances, PH consistently obtains consensus in the integer variables within 100 iterations and generally outperforms the Dual Step Length variant of FPPH in terms of runtime and solution quality. The Penalty Only variant of FPPH obtains better solution quality than PH when applied to DCAP342 (with 200, 300 and 500 scenarios) but finds considerably worse solutions when applied to the other DCAP instances.

Fig. 1

Comparison between our implementation of Progressive Hedging and variants of FPPH, applied to instances with pure integer first stage (SSLP and CAP). Bar height indicates the time required for convergence compared to the slowest converging algorithm. Solid bars indicate the best quality solution found among the three algorithms. Tinted bars indicate convergence to a lower-quality solution. Suboptimal solutions are indicated by a percentage optimality gap. A solid bar with no percentage gap indicates the optimal solution was found. Empty bars indicate non-convergence within 100 iterations; the arrow signifies that these 100 iterations took much longer than the slowest converging algorithm

Fig. 2

Comparison between our implementation of Progressive Hedging and variants of FPPH, applied to instances with mixed integer first stage (DCAP). Bar height indicates the time required for integer variable consensus compared to the slowest converging algorithm. Solid bars indicate the best quality solution found among the three algorithms. Tinted bars indicate lower-quality solutions. The percentage gap between found solution and the known upper bound is given in each case

7 Conclusions

We have shown that the tools and techniques of variational analysis are well suited to the analysis of the progressive hedging algorithm as applied to SMIP. Indeed, the analysis interfaces well with the “just MIP it” approach to the development of heuristics in this field. It allows for a new study of augmented Lagrangians and Gauss–Seidel methods, specifically recognising where the presence of smoothness is essential for the success of algorithmic approaches. The theory is able to shed light on how critical parameters need to be updated to ensure convergence.

Our computational results demonstrate that the FPPH algorithm, which is motivated by the above theory, has the potential to outperform PH in terms of quickly and reliably converging to high-quality feasible solutions for SMIP instances, particularly those with pure integer first-stage variables. By contrast, PH tended to outperform FPPH when applied to the DCAP instances which have mixed-integer first-stage variables, though the variant of FPPH performing no dual variable updates found higher-quality solutions than PH for the DCAP342 subclass. Further testing on a wider variety of instance classes is needed for a deeper understanding of how the structure of SMIP instances influences the relative performance of PH and FPPH, and how to set the penalty update rules of FPPH for the best performance on a given class of instances.