1 Introduction

Stochastic mixed integer programming (SMIP) models are, in essence, large-scale mixed-integer programming (MIP) models in which the uncertain nature of the input parameters is modelled by means of a finite set of discrete scenarios [1]. This general framework can model a broad class of decision problems, as attested by the wealth of publications from diverse areas of science and engineering. Important applications employing SMIP models include unit commitment [2], hydro-thermal generation scheduling [3], military operations [4], vaccination planning [5], air traffic flow management [6], forestry management and forest fire response [7], supply chain and logistics planning [8], and other applications referred to on the SIPLIB website [9]. The practical and theoretical development of stochastic programming (SP) without integer variables preceded SMIP and has influenced its development. The Progressive Hedging (PH) algorithm [10] for solving SP problems is well studied and theoretically supported for convex problems with no integer-constrained variables. Even without such theoretical support in the setting with integer-constrained variables, PH used as a heuristic is often effective, providing both upper and lower bounds [11] and, frequently, feasible solutions. Motivated by the limited theoretical support for the application of PH to SMIP and the observed success of PH heuristics for SMIP, our objective is to develop a theoretical framework and demonstrate convergence in numerical experiments.

The large scale of the deterministic equivalent of SMIP models proves to be challenging for off-the-shelf solvers that do not utilise the decomposable structure inherent in the extensive deterministic forms of SMIP models. By contrast, more promising solution methods utilise the SMIP’s decomposable structure. The PH algorithm [10] addresses the decomposable structure as a variant of the alternating direction method of multipliers (ADMM) [12] where the non-anticipativity constraint is relaxed into an augmented Lagrangian (AL) reformulation. One of the earliest detailed treatments of its convergence was based on variational analysis techniques [10], where nonsmooth analysis also provided important tools for the study of convergence with respect to the satisfaction of optimality conditions. Augmented Lagrangian duality, which plays a fundamental role in this work, also appeared in subsequent works to provide duality theorems for very general nonconvex problems, including cases encompassing integer constraints [13, Chapter 11, section K], [14]. These publications have attracted the attention of the integer programming community and resulted in a body of literature focussing specifically on the application of augmented Lagrangian duality to mixed-integer programming (MIP) [15, 16]. This in turn motivated researchers to state, analyse, and test a version of the PH method containing nonsmooth augmentations [17]. Concurrently, researchers have also explored the combination of PH with the Frank-Wolfe algorithm [18] to obtain provably convergent dual bounding methods for SMIP based on the Lagrangian relaxation of the non-anticipativity constraint [19]. Other researchers produced primal (heuristic) methods where the quadratic sub-problems of PH were replaced by mixed integer quadratic programs (MIQP) [11]. 
These approaches were shown to produce excellent solutions as long as the penalty parameter was chosen judiciously, yet they have remained an enigma, lacking any theoretical convergence result. In this paper, we show that variational analysis techniques can draw back the curtain on this enigma and explain what actually underpins the success of PH with MIQP subproblems when applied to SMIP.

Under reasonable assumptions, we analyse the convergence of PH applied to SMIPs, allowing the penalty parameters to vary in a less restricted fashion than is typically required for PH and related approaches, while applying Lagrange multiplier updates governed by a special rule, needed because the convergence/boundedness criteria underpinning the analysis are not automatically satisfied when PH is applied to SMIPs. Furthermore, our approach allows for more generality in the type of augmented Lagrangian terms. In our analysis, we view the PH method as an application of Gauss–Seidel iterations with penalty and Lagrange terms allowed to vary between iterations. In this setting, we contribute insight into when PH generates a sequence of solutions that converges to a feasible point of the SMIP. Our approach may also be viewed as interfacing some seemingly distinct solution methodologies found in proximal point methods such as PH, Gauss–Seidel (GS) methods, and (mixed-integer) augmented Lagrangian duality [15,16,17, 20]. Furthermore, a connection with feasibility pump (FP) primal heuristics is evident in the same spirit as contributed in [21].

Some of the conditions assumed for the penalty and/or augmented Lagrangian term that are required to achieve an exact penalty effect in [15, 16] (e.g., [16, Theorem 5]) require the penalty functions to be non-differentiable, which can impede the analysis of Gauss–Seidel methods [22]. Thus, in this paper, we set out to develop this theory from another direction that allows for a differentiable penalty term, in line with that typically used for analysing progressive hedging-like methods. To compensate for the loss of the exact penalty effect shown in [15, 16], we provide an analysis describing the effect of (potentially) letting the penalty coefficient go to infinity in order to achieve feasibility. In particular, we analyse a SMIP solution method inspired by the FP, PH, and Gauss–Seidel convergence analyses, denoted FPPH for short, which in practice is similar to the use of PH as a heuristic [11], except that we allow for greater generality in the updating of Lagrange multipliers, the changing of penalty coefficients, and the allowable forms of the augmented Lagrangian penalty function itself. Successful convergence of the method allows for (but is not predicated on) the unbounded increase of penalty parameters. To be clear, our analysis does not promise both primal and dual optimal convergence as is provided for PH in the convex, continuous setting. Rather, we address convergence goals similar to those of feasibility pump methods, where high-quality feasible solutions are sought and the main challenge is avoiding either non-feasible convergence or cycling.

Our experimental results will demonstrate the effectiveness of FPPH. As with all FP approaches, one needs to develop heuristics for updating the penalty parameters to encourage the methods to locate the best possible feasible solution and hence the strongest primal bound. As a general conclusion, the FPPH presents promising performance relative to Progressive Hedging in terms of quickly obtaining good feasible solutions for SMIPs with pure integer first-stage variables.

This paper is structured as follows. In Sect. 2, we set up the assumptions on the regularisation and the conceptual framework on which the analysis rests. In Sect. 3, further results are developed on how we may decompose the regularisation into its “cross-sections” where integer variables are fixed, which provides a foundation for insight into the local minima of the (whole) regularisation. Section 4 introduces the concept of persistent local minima and their relationship to feasibility for SMIP (1). The convergence analysis of the associated Gauss–Seidel algorithm is carried out in Sect. 5. In Sect. 6, we present computational results illustrating the employment of variants of FPPH to find high-quality feasible solutions to SMIP instances. In Sect. 7, we provide concluding remarks and directions for future developments.

2 Fundamental concepts and conceptual algorithmic framework

Denote \(\varvec{x}= (x_s)_{s \in S}\) where \(x_s \in \mathbb {X}_d:= \mathbb {R}^{n-q} \times \mathbb {Z}^q \subseteq \mathbb {X}:= \mathbb {R}^n\). Similarly \(\varvec{y}= (y_s)_{s \in S}\) where \(y_s \in \mathbb {Y}_d:= \mathbb {R}^{m-r} \times \mathbb {Z}^r \subseteq \mathbb {Y}:= \mathbb {R}^m\). We state the SMIP in the following split-variable deterministic formulation (see, e.g., [1])

$$\begin{aligned} \zeta ^{SMIP} =&\min _{\varvec{x}\in \mathbb {X}^{\small {\vert S\vert } }_d ,\varvec{y}\in \mathbb {Y}^{\small {\vert S \vert }}_d,z\in \mathbb {X},\varvec{w}\in \mathbb {Y}^{\small {\vert S \vert }}} \sum _{s\in S} f_s(x_s,y_s) \end{aligned}$$
(1a)
$$\begin{aligned}&\text { s.t. } \quad (z-x_s, w_s-y_s) = (0,0), \qquad (x_s , y_s) \in K_s,\; s \in S, \end{aligned}$$
(1b)

where

$$\begin{aligned} f_s\left( x_s,y_s\right)&:= p_s \left( c^{\top }x_{s}+d_{s}^{\top }y_{s}\right) ,\qquad s \in S \end{aligned}$$
(1c)
$$\begin{aligned} K_s&:= \left\{ (x,y) \in \mathbb {X}_d \times \mathbb {Y}_d \mid x\in X,\; y\in Y_s(x) \right\} ,\qquad s \in S \end{aligned}$$
(1d)
$$\begin{aligned} X&:= \left\{ x\in \mathbb {X}_d \mid Ax\le b \right\} \end{aligned}$$
(1e)
$$\begin{aligned} Y_s(x)&:= \left\{ y\in \mathbb {Y}_d \mid T_s x+ W_s y\le h_s \right\} ,\qquad s \in S. \end{aligned}$$
(1f)

Note that the constraints \(x_s \in X\) that hold only for the first-stage decision variables \(x_s\) are identical for all \(s \in S\).

We denote the extended reals by \(\mathbb {R}_{+\infty }:= \mathbb {R}\cup \{+\infty \}\). For each scenario \(s \in S\) copy of the first-stage variables \(x_s\), and separately for each scenario \(s \in S\) copy of the second-stage variables \(y_s\), we assume that the integer variable component indices (\({\mathcal {I}}\)) always follow the real variable component indices (\({\mathcal {R}}\)). That is, \(x_s:=(x_{s,{\mathcal {R}}},x_{s,{\mathcal {I}}})\) and \(y_s:=(y_{s,{\mathcal {R}}},y_{s,{\mathcal {I}}})\) for each \(s \in S\). Define the projection \({{\,\textrm{proj}\,}}_{\mathbb {X},{\mathcal {I}}}: \mathbb {X}_d \rightarrow \mathbb {Z}^q\) by \({\text {proj}}_{\mathbb {X},\mathcal {I}} \left( (x_{{\mathcal {R}}},x_{{\mathcal {I}}}) \right) = x_{{\mathcal {I}}}\) (with a similar definition for the \(y_{{\mathcal {I}}}\) projection \({{\,\textrm{proj}\,}}_{\mathbb {Y},{\mathcal {I}}}\)). As the first-stage consensus variable \(z\) components should match those for each \(x_s\), due to the non-anticipativity constraints \(z-x_s=0\), \(s \in S\), the same first-stage distinction between real and integer components \(z:=(z_{{\mathcal {R}}},z_{{\mathcal {I}}})\) applies. Corresponding distinctions for the second-stage consensus \(w_s:=(w_{s,{\mathcal {R}}},w_{s,{\mathcal {I}}})\), \(s \in S\), apply as well. Note that \(z\) is not explicitly constrained to lie within the discrete feasible set \(X\). Nor are \(w_s\), \(s \in S\), explicitly constrained to lie within \(Y_s(x_s)\) or \(Y_s(z)\). Thus, strictly speaking, \(z\in \mathbb {X}\) and \(w_s \in \mathbb {Y}\) vary freely within their respective spaces. Denote \(\varvec{w}:=(w_1,\dots ,w_{\vert S \vert })\) and similarly for \((\varvec{x},\varvec{y})\), and when needed we denote \(\varvec{z}:= (z,\dots ,z)\in \mathbb {R}^{n \times \vert S \vert }\).
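As a concrete illustration of this splitting convention, the following Python sketch (ours, with illustrative names only) extracts the real and integer components of a point of \(\mathbb {X}_d\), assuming the integer components follow the real components as stipulated above.

```python
# Illustrative sketch of the component-splitting convention: the integer
# components of x_s occupy the last q positions, following the real ones.

def proj_R(x, q):
    """Real components x_R: the first n - q entries."""
    return tuple(x[:len(x) - q])

def proj_I(x, q):
    """Integer components x_I: the last q entries (cf. proj_{X,I})."""
    return tuple(x[len(x) - q:])

# A point x_s in X_d = R^{n-q} x Z^q with n = 4 and q = 2:
x_s = (0.5, -1.25, 3, 7)
print(proj_R(x_s, 2))  # -> (0.5, -1.25)
print(proj_I(x_s, 2))  # -> (3, 7)
```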

Since the second-stage non-anticipativity variables are independent for each outcome scenario and otherwise unconstrained, the non-anticipativity constraints \(w_s - y_s = 0\), \(s \in S\), have no practical effect on the feasibility of the second-stage decisions \(y_s\) in SMIP (1). Nevertheless, this formulation aids the subsequent analysis by allowing the incorporation of all variables (regardless of stage) into regularisation terms in a symmetric fashion. Second-stage feasibility is propagated from \(\varvec{y}\) to \(\varvec{w}\) via this constraint, while \(\varvec{w}\) remains otherwise unconstrained. The formulation (1) is also conducive to generalising our results for two-stage SMIPs to multi-stage problems in which all stages except the last have active non-anticipativity constraints. In the practical application of developed algorithms to two-stage problems, the use of \(\varvec{w}\) may be suppressed, as it is in the description of the computational experiments of Sect. 6.

Throughout our developments, we assume that the following conditions hold for our SMIP (1). We explicitly assume the existence of an optimal solution, which could be replaced by the standard assumption of rationality of the data defining the problem.

Assumption 1

We make the following standard SMIP assumptions:

  1. Stochasticity of \(p_s\): for each \(s \in S\), we have \(p_s > 0\) and \(\sum _{s \in S} p_s = 1\).

  2. Non-emptiness: \(K_s\), \(s \in S\), is a non-empty set of feasible decisions constructed with linear constraints and integrality constraints on the \(x_s\) and \(y_s\) variables. (This also implies that \(K_s\) is closed.)

  3. Boundedness and Optimality: The optimal value of the SMIP (1) is bounded from below. Also, the feasible sets \(K_s\), \(s \in S\), are bounded. Furthermore, SMIP (1) is feasible and possesses an optimal solution.

  4. Relatively complete recourse: The SMIP model has relatively complete recourse; \(\forall x \in X,\, \forall s \in S\), we have \(Y_s(x) \ne \emptyset \): that is, first-stage decisions \(x\) that satisfy the first-stage specific constraints \(x\in X\) have at least one second-stage decision solution \(\left( y_s\right) _{s \in S}\) for which \((x,y_s) \in K_s\) for all scenarios \(s \in S\).
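On a toy instance with small bounded sets, the relatively complete recourse condition can be verified by enumeration; the data below are our own illustrative choices, not taken from any instance in this paper.

```python
# Enumeration check of relatively complete recourse on a toy instance with
# scalar first- and second-stage variables (all data illustrative).

X = [0, 1, 2]                       # first-stage set X = {x in Z : Ax <= b}

def Y_s(x, t_s, w_s, h_s):
    """Second-stage set {y in {0,...,3} : t_s x + w_s y <= h_s}."""
    return [y for y in range(4) if t_s * x + w_s * y <= h_s]

scenarios = [(1, 1, 5), (2, 1, 7)]  # (T_s, W_s, h_s) for each scenario s

# Relatively complete recourse: Y_s(x) is non-empty for every x in X, s in S.
rcr = all(Y_s(x, *sc) for x in X for sc in scenarios)
print(rcr)  # -> True
```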

Of interest is the dual function \(\zeta :\Lambda \times \mathbb {R}_{>0}^{|S |} \times \mathbb {R}_{>0} \rightarrow \mathbb {R}_{+\infty }\) defined by

$$\begin{aligned} \zeta (\lambda ,\pi ,\rho ) := \min \{\varphi ^{\lambda ,\rho ,\pi }\left( z, \varvec{w}\right) \mid (z,\varvec{w}) \in {\mathbb {X}}\times \mathbb {Y}^{\small {\vert S \vert }}\}, \end{aligned}$$
(2)

where

$$\begin{aligned} \varphi ^{\lambda ,\rho ,\pi }\left( z, \varvec{w}\right)&:=\sum _{s\in S} \varphi _s^{\lambda ,\rho ,\pi }{(z, w_s)} \text { and } \Phi ^{\lambda ,\rho ,\pi }(z,\varvec{w}) := \prod _{s \in S} \Phi _s^{\lambda ,\rho ,\pi }(z,w_s) \end{aligned}$$
(3a)

with, for each \(s \in S\),

$$\begin{aligned} \varphi _s^{\lambda ,\rho ,\pi }\left( z, w_s \right)&:= \min _{\left( x_{s},y_{s}\right) \in K_s } f_s(x_s,y_s) - \lambda _s^\top \, (z-x_s) + \rho \pi _s \psi \left( z-x_{s},w_{s}-y_{s} \right) \end{aligned}$$
(3b)
$$\begin{aligned} \Phi _s^{\lambda ,\rho ,\pi }\left( z, w_s \right)&:= \mathop {{{\,\textrm{argmin}\,}}}\limits _{\left( x_{s},y_{s}\right) \in K_s } f_s(x_s,y_s) - \lambda _s^\top \, (z-x_s) + \rho \pi _s \psi \left( z-x_{s},w_{s}-y_{s} \right) . \end{aligned}$$
(3c)

We assume that \(K_s\) is defined as in (1d), and that the usual dual feasibility \(\lambda \in \Lambda :=\{\lambda \mid \sum _{s \in S} \lambda _s = 0\}\) holds. For each scenario \(s \in S\), the penalty function \(\psi \) output value is scaled by a penalty scaling parameter \(\rho > 0\), and by scenario-specific penalty weighting parameters \(\pi _s >0 \) (for which \(\sum _{s \in S} \pi _s =1\)) to be specified. Note that under the assumption that \(\lambda \in \Lambda \), the summation \(\sum _{s \in S} \lambda _s^\top z\) conveniently vanishes, and so these terms may be dropped in subsequent developments.
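The vanishing of the \(z\)-terms under dual feasibility is elementary but easy to check numerically; the following sketch (ours, with arbitrary data) generates a \(\lambda \in \Lambda \) and confirms that \(\sum _{s \in S} \lambda _s^\top z = 0\) up to rounding.

```python
# Check that dual feasibility sum_s lambda_s = 0 makes sum_s lambda_s^T z
# vanish (all numbers arbitrary).
import random

random.seed(0)
n, S = 3, 4
lam = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(S - 1)]
# Choose the last multiplier so that sum_s lambda_s = 0, i.e. lambda in Lambda.
lam.append([-sum(l[i] for l in lam) for i in range(n)])

z = [random.uniform(-5, 5) for _ in range(n)]
total = sum(sum(l[i] * z[i] for i in range(n)) for l in lam)
print(abs(total) < 1e-9)  # -> True
```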

Each instance of problem (2) is a continuous optimisation problem over the space \({\mathbb {X}}\times \mathbb {Y}^{\small {\vert S \vert }}\), and for nontrivial instances of SMIP (1), \(\varphi ^{\lambda ,\rho ,\pi }\) is nonconvex with multiple isolated local minima. Under assumptions in [15, 16], we have \(\zeta ^{SMIP} = \zeta (\lambda ,\pi ,\rho )\) for sufficiently large but finite \(\rho \). Properties of locally optimal solutions to the minimisation of \(\varphi ^{\lambda ,\rho ,\pi }\), and how these local minimisers relate to the solutions to the original SMIP (1), are of special interest in this paper’s subsequent analysis.

As mentioned earlier, the nonsmoothness of penalty functions \(\psi \) that support the exact penalty properties discussed in [15, 16] precludes the convergence theory available for Gauss–Seidel approaches. For this reason, we modify the properties assumed in [15, 16] for the penalty function \(\psi \) to the conditions stated in Assumption 2. In particular, we assume that the penalty is strongly convex and differentiable from the outset (departing markedly from [15, 16]), as this is required for a Gauss–Seidel approach to be applied with desirable convergence properties (see Lemma 21).

Assumption 2

For our smooth penalty function \(\psi : \mathbb {X}\times \mathbb {Y}\rightarrow \mathbb {R}\), we make the following integer compatible regularisation function (ICRF) assumptions:

  1. \(\psi \left( u,v\right) \ge 0\) for all \((u,v)\), and \(\psi \left( u, v\right) =0\) if and only if \((u,v) = (0,0)\).

  2. If \(\gamma \in [0,1)\), then \(\psi \left( \gamma u, v\right) < \psi \left( u, v\right) \) for all \(u\ne 0\) and \(\psi \left( u, \gamma v\right) < \psi \left( u, v\right) \) for all \(v\ne 0\).

  3. Strong convexity holds with modulus \(m > 0\), i.e.,

    $$\begin{aligned} \psi (u,v) \ge \psi (u^0,v^0) + \left\langle \nabla \psi (u^0,v^0), \left[ \begin{array}{c}u- u^0 \\ v-v^0 \end{array}\right] \right\rangle + \frac{m}{2} \left\| \left[ \begin{array}{c}u- u^0 \\ v-v^0 \end{array}\right] \right\| ^2. \end{aligned}$$
    (4)

We note that Assumption 2 implies \((0,0) = \nabla \psi (0,0)\) and thus (4) implies

$$\begin{aligned} \psi (u,v) \ge \frac{m}{2} \Vert (u,v)\Vert ^2, \;\text {for all discrepancies}\;(u,v). \end{aligned}$$
(5)

Remark 1

In the theoretical development, we partition the discrepancies into \(u\) and \(v\) components to correspond to the special treatment of early-stage variables against late-stage variables. For a two-stage problem, \(u\) corresponds to first-stage discrepancies, and \(v\) corresponds to second-stage discrepancies. To allow for versatility in how the theoretical development informs algorithmic approaches, especially for application to multi-stage problems, we carry the development with the distinction between \(u\) and \(v\) discrepancies through Sect. 5.

Remark 2

In our computational developments in Sect. 6, we use a weighted squared 2-norm penalty function \(\psi (u,v) = \frac{1}{2} \left( \sum _{i=1}^n \left[ \bar{\mu }_iu_i^2\right] + \Vert v\Vert ^2\right) \) with weights \(\bar{\mu }_i>0\), \(i=1,\dots ,n\). In general, the strong convexity with modulus m is equivalent to the convexity of the function \((u,v) \mapsto \psi (u,v) - \frac{m}{2} \Vert (u,v) \Vert ^2\). For the algorithmic manifestation as presented in Sect. 6, \(v\) may furthermore be set identically to value zero.
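The weighted penalty of Remark 2 and the strong-convexity inequality (4) can be checked directly; in the sketch below (ours), the modulus \(m = \min \{\min _i \bar{\mu }_i,\, 1\}\) is the natural choice for this particular \(\psi \), since its Hessian is the constant diagonal matrix with entries \(\bar{\mu }_i\) and \(1\).

```python
# Weighted squared 2-norm penalty of Remark 2 and a sampled check of the
# strong-convexity inequality (4); mu and the sample ranges are arbitrary.
import random

mu = [2.0, 0.5, 1.5]            # weights mu_i > 0 on the u components
m = min(min(mu), 1.0)           # strong-convexity modulus for this psi

def psi(u, v):
    return 0.5 * (sum(mi * ui ** 2 for mi, ui in zip(mu, u))
                  + sum(vi ** 2 for vi in v))

def grad_psi(u, v):
    return [mi * ui for mi, ui in zip(mu, u)] + list(v)

random.seed(1)
for _ in range(100):
    u0 = [random.uniform(-2, 2) for _ in mu]; v0 = [random.uniform(-2, 2)]
    u = [random.uniform(-2, 2) for _ in mu]; v = [random.uniform(-2, 2)]
    d = [a - b for a, b in zip(u + v, u0 + v0)]
    inner = sum(g * di for g, di in zip(grad_psi(u0, v0), d))
    # Inequality (4) with modulus m (small slack for rounding):
    assert psi(u, v) >= psi(u0, v0) + inner + 0.5 * m * sum(di * di for di in d) - 1e-9
print("inequality (4) holds on all samples")
```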

2.1 Preliminary application of Gauss–Seidel iterations

We define the following notation based on the assumption that the Lagrange multipliers \(\lambda ^n\in \Lambda \) and the penalty parameters \(\rho ^n>0\), \(\pi ^n> 0\), \(\sum _{s \in S} \pi _s^n = 1\), vary with each iteration \(n\ge 0\):

$$\begin{aligned} \varphi ^n(z,\varvec{w}) := \sum _{s \in S} \varphi _s^n(z,w_s) \quad \text {where}\quad \varphi _s^n(z, w_s) := \varphi ^{\lambda ^n , \rho ^n, \pi ^n }_s(z, w_s). \end{aligned}$$
(6)

One iterative solution approach for finding locally optimal solutions for SMIP (1) starting with initial \(z^0 \in {\mathbb {X}}\) is based on Gauss–Seidel (GS) iterations \(n\ge 0\) of the form

$$\begin{aligned} w_s^{n+1}&\leftarrow \mathop {{{\,\textrm{argmin}\,}}}\limits _{w\in \mathbb {Y}} \varphi _s^n\left( z^{n}, w\right) \quad \text {for all}\quad s \in S, \end{aligned}$$
(7a)
$$\begin{aligned} z^{n+1}&\leftarrow \mathop {{{\,\textrm{argmin}\,}}}\limits _{z\in {\mathbb {X}}} \varphi ^n\left( z, \varvec{w}^{n+1} \right) . \end{aligned}$$
(7b)

The \(z\) update (7b) is not easily computable, but the \(w\) update (7a) is, as demonstrated in the following proposition.

Proposition 3

Let \((z,w) \in \mathbb {X}\times \mathbb {Y}\).

  1. For each \(s \in S\), \(w\in {{\,\textrm{argmin}\,}}_{w'} \varphi _s^{\lambda ,\rho ,\pi }( z,w' )\) implies \(w\in {{\,\textrm{proj}\,}}_{\mathbb {Y}}\left( \Phi _s^{\lambda ,\rho ,\pi }(z,w)\right) \).

  2. Moreover, given \(z^n\), \(\varvec{w}^{n+1} \in {{\,\textrm{argmin}\,}}_{w'} \varphi ^n( z^n,\varvec{w}' )\) may be computed by solving for each \(s \in S\)

    $$\begin{aligned} \hspace{-0.7cm}(x_s^{n+1},y_s^{n+1}){} & {} \in \arg \min _{ (x_{s},y_{s}) \in K_s } f_s(x_s,y_s) +(\lambda _s^n)^\top x_s +\rho ^n\, \pi _s^n\psi \left( z^{n}-x_s,0\right) \end{aligned}$$
    (8)

    and then setting \(\varvec{w}^{n+1}=\varvec{y}^{n+1}\).

Proof

We argue by contradiction. Assume for some \(s \in S\) that \(w_s\in {{\,\textrm{argmin}\,}}_{w'} \varphi _s^{\lambda ,\rho ,\pi }( z,w' )\), but that \(w_s\notin {{\,\textrm{proj}\,}}_{\mathbb {Y}} \left( \Phi _s^{\lambda ,\rho ,\pi }(z,w_s)\right) \). Then, for any \((x_s,y_s) \in \Phi _s^{\lambda ,\rho ,\pi }(z,w_s)\), we have \(w_s \ne y_s\), and so

$$\begin{aligned} \varphi _s^{\lambda ,\rho ,\pi }\left( z, w_s \right)&= f_s(x_s ,y_s ) + (\lambda _s )^\top x_s+ \rho \pi _s \psi \left( z- x_s , w_{s} - y_{s} \right) \\&> f_s(x_{s} ,y_{s} ) + (\lambda _s )^\top x_s + \rho \pi _s \psi \left( z- x_s , 0 \right) \quad \text {(due to Assumption 2(2))}\\&= f_s(x_{s} ,y_{s} ) + (\lambda _s )^\top x_s +\rho \pi _s \psi \left( z- x_s , y_s - y_s \right) \\&\ge \min _{\left( x_{s}',y_{s}'\right) \in K_s} \left\{ f_s(x_{s}',y_{s}') + (\lambda _s )^\top x_s' + \rho \pi _s \psi \left( z- x_s' , y_s - y_s' \right) \right\} \\&=\varphi _s^{\lambda ,\rho ,\pi }{\left( z, y_s \right) } \end{aligned}$$

which contradicts \(w_s \in {{\,\textrm{argmin}\,}}_{w'} \varphi _s^{\lambda ,\rho ,\pi }\left( z, w' \right) \). To show the claim of Part 2, assume that \(\varvec{w}^{n+1}\) computed from (8) does not satisfy \(w_s^{n+1} \in {{\,\textrm{argmin}\,}}_{w'} \varphi _s^n( z^n,w' )\) for at least one \(s \in S\). Let \(\acute{w}_s^{n+1} \in {{\,\textrm{argmin}\,}}_{w'} \varphi _s^n( z^n,w' )\). By Part 1, there exists \(\acute{x}_s^{n+1}\) with \((\acute{x}_s^{n+1}, \acute{w}_s^{n+1}) \in \Phi _s^n(z^n,\acute{w}_s^{n+1})\) such that

$$\begin{aligned}&f_s(\acute{x}_s^{n+1},\acute{w}_s^{n+1}) +(\lambda _s^n)^\top \acute{x}_s^{n+1} +\rho ^n\, \pi _s^n\psi \left( z^{n}-\acute{x}_s^{n+1},0\right) \\&\quad <f_s({x}_s^{n+1},y_s^{n+1}) +(\lambda _s^n)^\top {x}_s^{n+1} +\rho ^n\, \pi _s^n\psi \left( z^{n}-{x}_s^{n+1},0\right) , \end{aligned}$$

which would contradict the optimality in (8). \(\square \)
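To make the mechanics of Proposition 3 concrete, the following sketch (ours, with toy data) solves the scenario subproblem (8) by brute-force enumeration over a tiny finite \(K_s\) and then sets \(w_s^{n+1} = y_s^{n+1}\); in practice this step is a MIQP solve.

```python
# Enumeration-based sketch of the scenario subproblem (8); all data are toy
# values and psi is the squared 2-norm.
K_s = [(x, y) for x in range(3) for y in range(4) if x + y <= 4]  # toy K_s
f = lambda x, y: 1.0 * x + 2.0 * y       # p_s (c^T x + d_s^T y), folded in
lam_s, rho, pi_s = 0.3, 5.0, 0.5
psi = lambda u, v: 0.5 * (u * u + v * v)

def w_update(z_n):
    # (x_s^{n+1}, y_s^{n+1}) from (8); the second discrepancy is fixed at 0.
    x_s, y_s = min(K_s, key=lambda p:
                   f(*p) + lam_s * p[0] + rho * pi_s * psi(z_n - p[0], 0.0))
    return x_s, y_s, y_s                 # ... then set w_s^{n+1} = y_s^{n+1}

x1, y1, w1 = w_update(z_n=1.7)
print(x1, y1, w1)  # -> 1 0 0
```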

Computing the update \(z^{n+1} \in \arg \min _{z} \varphi ^n(z,\varvec{w}^{n+1})\) given fixed \(\varvec{w}^{n+1}\) corresponds to an infimal convolution of \((x_s,y_s) \mapsto f_s(x_s,y_s) + \delta _{K_s} (x_s,y_s) + (\lambda _s^{n})^\top x_s\) and \((u,v) \mapsto \rho ^n\,\pi _s^{n} \psi (u,v)\), for each \(s \in S\), where we denote the indicator function of a set \(K_s\) by \(\delta _{K_s} (x,y)\) that takes the value zero if \((x,y) \in K_s\) and \(+\infty \) otherwise. The infimal convolution is well-studied [13, Chapter 1, section H], and later we make use of certain convex “cross-sections” of this infimal convolution. However, the calculation culminating in \(z^{n+1} \in \arg \min _{z} \varphi ^n(z,\varvec{w}^{n+1})\) is still not easily computable, as it requires the solution of a MIP of comparable difficulty to the original SMIP (1). Nevertheless, this problem \(z^{n+1} \in \arg \min _{z} \varphi ^n(z,\varvec{w}^{n+1})\) is useful from a theoretical standpoint, as it links the consensus problem to the Gauss–Seidel step of the continuous regularisation.

A more practical approach to the \(z\) update takes the form of descent steps using the usual consensus update, i.e.

$$\begin{aligned} z^{n+1}\in \arg \min _{z\in {\mathbb {X}}}\sum _{s\in S} \pi _{s}^n\psi \left( z-x_{s}^{n+1},0\right) , \end{aligned}$$
(9)

where \(\varvec{w}^{n+1}-\varvec{y}^{n+1}=0\) follows from Proposition 3. From Assumption 2(3) with \(u_s^0 = z^{n+1}-x_s^{n+1}\), \(u_s= z-x_s^{n+1}\) and \(v_s^0=v_s=0\) for all \(s \in S\), and the optimality condition associated with the \(z^{n+1}\) update \(\sum _{s\in S}\pi _{s}^n\nabla _z\psi (z^{n+1} - x_s^{n+1},0) =0\) we have that

$$\begin{aligned} \sum _{s\in S}\pi _{s}^n\psi \left( z-x_{s}^{n+1},0\right) \ge \sum _{s\in S}\pi _{s}^n\psi \left( z^{n+1}-x_{s}^{n+1},0\right) + \frac{m}{2} \Vert z - z^{n+1} \Vert ^2 \end{aligned}$$
(10)

and so the \(z^{n+1}\) update (9) must be unique.
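For the unweighted choice \(\psi (u,0) = \frac{1}{2}\Vert u\Vert ^2\), the consensus update (9) reduces to the \(\pi \)-weighted average of the \(x_s^{n+1}\); the following sketch (ours, scalar toy data) confirms this closed form against a brute-force grid search.

```python
# Consensus update (9) for psi(u, 0) = 0.5 u^2: the unique minimiser is the
# pi-weighted average of the scenario copies (toy scalar data).
pi = [0.2, 0.3, 0.5]
x = [1.0, 3.0, 4.0]                           # x_s^{n+1}, one per scenario

obj = lambda z: sum(p * 0.5 * (z - xs) ** 2 for p, xs in zip(pi, x))
z_closed = sum(p * xs for p, xs in zip(pi, x))      # weighted average = 3.1

z_grid = min((i / 1000 for i in range(5001)), key=obj)   # search z in [0, 5]
print(abs(z_closed - z_grid) < 1e-3)  # -> True
```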

Using this observation, we can devise a Gauss–Seidel algorithm that is guaranteed to produce non-ascent steps while the stabilisation \((z^{n}, \varvec{w}^{n+1})=(z^{n+1}, \varvec{w}^{n+1} )\) is not achieved, which is given in Algorithm 1.

Algorithm 1

Modified block GS method for SMIP

Algorithm 1 describes a two-block Gauss–Seidel iterative approach on the two blocks \((x_s,y_s,w_s)_{s \in S}\) and \(z\), where the mixed-integer constraints only appear in the block \((x_s,y_s,w_s)_{s \in S}\) subproblem implicitly referred to in Lines 4–5 of Algorithm 1. In the following sections, we analyse the convergence properties of certain embedded subsequences of (mid-)iterations \((x^{n_k+1},y^{n_k+1},w^{n_k+1},z^{n_k})\) generated by Algorithm 1 for penalty coefficient values \(\rho ^n> 0\), penalty weights \(\pi _s^n\), \(s \in S\), and Lagrangian multipliers \(\lambda _s^n\), \(s \in S\), that vary with iteration \(n\ge 1\). (It is convenient to maintain that \(\sum _{s \in S} \lambda _s^n= 0\) for all \(n\ge 0\).) We must also assume the solution \((x^{n_k+1},y^{n_k+1},w^{n_k+1})\) is globally optimal in order to carry out our convergence analysis in Sect. 5.1. Here we assume the existence of Fréchet subdifferentials at the minimising points, which is assured at any global minimum. Furthermore, when \(\psi \) is a quadratic form, a global minimum may be computed in practice using a MIQP solver.
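A minimal, enumeration-based sketch of the two-block iteration (our own construction, with \(\varvec{w}\) suppressed and \(\lambda = 0\) held fixed) illustrates the behaviour analysed below: the iterates may stall without consensus at moderate penalties, and growing \(\rho ^n\) eventually forces the scenario copies to agree.

```python
# Toy two-block GS loop in the spirit of Algorithm 1: scenario step (8) by
# enumeration, consensus step (9) as a weighted average, growing penalty.
def run_gs(rho0=0.5, growth=2.0, iters=50):
    K = {0: [0, 1, 2], 1: [1, 2, 3]}        # toy first-stage sets K_s
    f = {0: lambda x: 1.0 * x, 1: lambda x: -0.8 * x}  # scenario objectives
    pi = {0: 0.4, 1: 0.6}
    z, rho = 0.0, rho0
    for _ in range(iters):
        # block 1: scenario subproblems (8) with psi(u, 0) = 0.5 u^2
        xs = {s: min(K[s], key=lambda x: f[s](x)
                     + rho * pi[s] * 0.5 * (z - x) ** 2) for s in K}
        # block 2: consensus update (9), here the pi-weighted average
        z = sum(pi[s] * xs[s] for s in K)
        if len(set(xs.values())) == 1:      # all copies agree: feasible point
            break
        rho *= growth                       # otherwise let the penalty grow
    return z, xs

z, xs = run_gs()
print(z, xs)  # consensus reached at x = 2
```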

Remark 3

The classical progressive hedging algorithm is realised by taking \(\pi _s^n = p_s\) and so \(\rho ^n = \sum _{s \in S} p_s \rho ^n\) (as \(\sum _{s \in S} p_s =1 \)). Then for \(\psi (\cdot ,0) = \frac{1}{2} \Vert \cdot \Vert ^2\) we have \(z^{n+1} = \sum _{s \in S} p_s x_{s}^{n+1}\) and assuming dual feasibility \(\sum _{s \in S} \lambda _s^n = 0\) one can also assert for all \(s \in S\) that

$$\begin{aligned} \left( x_{s}^{n+1},y_{s}^{n+1}\right) \in \Phi _s^n(z^n,w^{n+1}) {=} \mathop {{{\,\textrm{argmin}\,}}}\limits _{(x_{s},y_{s}) \in K_s } f_s(x_s,y_s){+}(\lambda _s^{n})^\top x_s {+} \rho ^np_s \psi \left( z^{n}-x_{s}, 0\right) \end{aligned}$$

along with the multiplier update that retains dual feasibility of the multipliers i.e. \(\lambda _s^{n+1} = \lambda _s^n - p_s \rho ^n (z^{n+1} - x_s^{n+1})\). Moreover, the dual feasibility allows one to assert that the same \(z^{n+1}\) solves the minimisation with respect to \(z\) in the full augmented Lagrangian. Penalties \(\rho ^n\) between iterations with progressive hedging are usually left unchanged or are updated in such a manner as to realise stabilisation. (See, e.g., [12, Section 3.4.1].)
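The multiplier update of Remark 3 preserves dual feasibility because \(\sum _{s \in S} p_s (z^{n+1} - x_s^{n+1}) = z^{n+1} - \sum _{s \in S} p_s x_s^{n+1} = 0\) when \(z^{n+1}\) is the expectation of the scenario copies; a quick numeric check (ours, toy data):

```python
# Check that lambda_s <- lambda_s - p_s rho (z - x_s) keeps sum_s lambda_s = 0
# when z = sum_s p_s x_s (Remark 3); all values are toy data.
p = [0.25, 0.35, 0.40]
x = [2.0, -1.0, 4.0]                   # x_s^{n+1}, scalars for simplicity
lam = [0.5, -0.2, -0.3]                # dual feasible: sums to zero
rho = 7.0

z = sum(ps * xs for ps, xs in zip(p, x))             # consensus = expectation
lam_next = [l - ps * rho * (z - xs) for l, ps, xs in zip(lam, p, x)]
print(abs(sum(lam_next)) < 1e-9)  # -> True
```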

Next, we build on that development where Algorithm 1 is viewed as an approximate two-block GS iterative approach within the continuous optimisation framework of successively minimising \(\varphi ^n\) in \(z\) (approximately) and \(w\) (globally and exactly) with Lagrange multipliers \(\lambda ^n\in \Lambda \), penalty coefficient values \(\rho ^n> 0\) and penalty weights \(\pi _s^n\), \(s \in S\), varying between iterations \(n\ge 0\) under certain assumptions.

We conclude this section by noting that the above algorithm is essentially that of [11], with alterations to the Lagrangian multiplier and penalty parameter updates. In particular, we consider what happens when Lagrange multipliers \(\lambda ^n\in \Lambda \) and penalty weights \(\pi ^n\) stop changing after a finite number of iterations, while penalty parameters \(\{\rho ^n\}\) may increase without bound. The latter feature requires us to consider the limiting behaviour of the regularisation \(\varphi ^n\) as \(\rho ^n\rightarrow \infty \). Such an analysis is facilitated by analysing the level sets of the sequence of functions, denoted by \({\text {lev}}_c \varphi ^{\lambda ,\rho ,\pi }:=\) \(\{ (z,w) \mid \varphi ^{\lambda ,\rho ,\pi }(z,w ) \le c\}= \{ (z,w) \mid \frac{1}{\rho }\varphi ^{\lambda ,\rho ,\pi }(z,w ) \le \frac{c}{\rho }\}= {\text {lev}}_{ \frac{c}{\rho }}\left( \tfrac{1}{\rho }\varphi ^{\lambda ,\rho ,\pi }\right) , \) prompting the use of epi-convergence as a tool in our analysis, as it is associated with the convergence of level sets.

3 Properties of the SMIP regularisation \(\varphi ^{\lambda ,\rho ,\pi }\)

The continuous regularisation \(\varphi ^{\lambda ,\rho ,\pi }\) of SMIP (1) has properties that allow for feasible points of SMIP (1) to be associated with certain local minima of \(\varphi ^{\lambda ,\rho ,\pi }\). To gain insight into these properties of \(\varphi ^{\lambda ,\rho ,\pi }\), we first note some additional properties of \(\psi \) that follow from the properties listed in Assumption 2.

Proposition 4

Assume \(\psi \) satisfies Assumption 2. Then, for all \((z,\varvec{w}) \in {\mathbb {X}}\times \mathbb {Y}^{\small {\vert S \vert }}\), \(\rho > 0\), \(\pi _s > 0\), \(s \in S\), and \(\lambda \in \Lambda \), the following properties hold for each \(s \in S\):

  1. The set of solutions for problem (3c), that is, \(\Phi _s^{\lambda ,\rho ,\pi }(z,w_s)\), is non-empty.

  2. The function \(\rho \mapsto \varphi _s^{\lambda ,\rho ,\pi }\left( z, w_s \right) \) is non-decreasing.

  3. If in addition \((z,w_s) \in K_s\), then \(\varphi _s^{\lambda ,\rho ,\pi }\left( z, w_s \right) \le f_s\left( z,w_s \right) . \) If \((\varvec{z},\varvec{w}) \in K:= \Pi _{s \in S} K_s\) (with \(\varvec{z}:=(z,z,\dots ,z)\)) then

    $$\begin{aligned} \varphi ^{\lambda ,\rho ,\pi }\left( z,\varvec{w}\right) = \sum _{s\in S} \varphi _s^{\lambda ,\rho ,\pi }{(z,w_s)} \le \sum _{s\in S} f_s\left( z,w_s \right) < +\infty . \end{aligned}$$
    (11)
  4. The function \((z,w_s) \mapsto \varphi _s^{\lambda ,\rho ,\pi }(z,w_s)\) is locally Lipschitz continuous over

    $$\begin{aligned} K_s^\Delta :=\{ (z,w_s) \in {\mathbb {X}}\times {\mathbb {Y}}_s \; : \; (z,w_s) \in {{\,\textrm{conv}\,}}(K_s) + B_\Delta (0,0) \}, \end{aligned}$$
    (12)

    with modulus \(\pi _s \rho L_s^\Delta \), where \(L_s^\Delta \) depends on the diameter of \({{\,\textrm{conv}\,}}( K_s ) + B_\Delta (0,0)\). Taking \(L^\Delta := \max \{ L_s^\Delta \}\), we also have \(\varphi ^{\lambda ,\rho ,\pi }\) is Lipschitz continuous with modulus \(\rho L^\Delta \).

Proof

See Appendix A. \(\square \)

Definition 1

Denote by \({{\,\textrm{proj}\,}}_{{\mathcal {I}}}(\cdot ):= {\text {proj}}_{\mathbb {X},\mathcal {I}} \left( \cdot \right) \times {\text {proj}}_{\mathbb {Y},\mathcal {I}} \left( \cdot \right) \) the integer-component projection and, for each \((\bar{\varvec{x}}_{\mathcal {I}},\bar{\varvec{y}}_{\mathcal {I}}) \in {{{\,\textrm{proj}\,}}}_{{\mathcal {I}}}(K):= \Pi _{s \in S} {{{\,\textrm{proj}\,}}}_{{\mathcal {I}}}(K_s)\), denote

$$\begin{aligned} K_s^{(\bar{\varvec{x}}_{\mathcal {I}},\bar{\varvec{y}}_{\mathcal {I}})}:= \{(x_s,y_s) \in K_s \mid (\bar{x}_{s,{\mathcal {I}}},\bar{y}_{s,{\mathcal {I}}}) = {{\,\textrm{proj}\,}}{}_{{\mathcal {I}}} (x_s, y_s) \}. \end{aligned}$$

Note that this corresponds to a polyhedral subset of \(K_s\) once we have removed the integrality constraint by fixing the integer variables at a specific integer value. We now consider the behaviour of \(\varphi ^{\lambda ,\rho ,\pi }\) within neighbourhoods having progressively additional structure imposed. In preparation, we introduce notation that facilitates the view of \(\varphi ^{\lambda ,\rho ,\pi }\) in terms of its finitely many cross-sections over \((\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}) \in {{\,\textrm{proj}\,}}_{{\mathcal {I}}}(K)\).
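As a toy illustration (ours) of Definition 1, fixing the integer component of a small mixed set leaves finitely many continuous (here, interval) cross-sections:

```python
# Integer cross-sections of a toy mixed set
# K_s = {(x_R, x_I) : x_I in {0, 1, 2}, 0 <= x_R <= 3 - x_I}.
def cross_section(x_I):
    """Continuous piece of K_s with the integer component fixed at x_I."""
    return (0.0, 3.0 - x_I)            # the interval [lo, hi] for x_R

sections = {x_I: cross_section(x_I) for x_I in (0, 1, 2)}
print(sections)  # -> {0: (0.0, 3.0), 1: (0.0, 2.0), 2: (0.0, 1.0)}
```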

Definition 1 induces the following notation for proximal cross-sections for each \((\varvec{x}_{{\mathcal {I}}}^0,\varvec{y}_{{\mathcal {I}}}^0) \in \text {proj}_{{\mathcal {I}}}(K)\).

Definition 2

For \((\lambda , \rho , \pi ) \in \Lambda \times \mathbb {R}_{>0} \times \mathbb {R}_{>0}^{|S |} \), the proximal cross-sectional values are defined by

$$\begin{aligned} \varphi _s^{\lambda ,\rho ,\pi }\left( z, w_s \mid \varvec{x}_{{\mathcal {I}}}^0,\varvec{y}_{{\mathcal {I}}}^0\right) :=&\inf _{(x_s,y_s)} \{ {f_s(x_s,y_s) + \lambda _s^\top x_s} + \delta _{K_s^{ ({\textbf{x}}^{0}_{{\mathcal {I}}},{\textbf{y}}^{0}_{{\mathcal {I}}})} } (x_s,y_s) \nonumber \\&+ \rho \pi _s\psi (z-x_s,w_s-y_s) \}. \end{aligned}$$
(13a)
$$\begin{aligned} \varphi ^{\lambda ,\rho ,\pi }\left( z, \varvec{w} \mid \varvec{x}_{\mathcal {I}}^0,\varvec{y}_{\mathcal {I}}^0\right)&:= \sum _{s\in S} \varphi _s^{\lambda ,\rho ,\pi }\left( z, w_s \mid \varvec{x}_{{\mathcal {I}}}^0,\varvec{y}_{{\mathcal {I}}}^0\right) , \end{aligned}$$
(13b)

and the set of arguments realising the proximal cross-sectional values is defined by

$$\begin{aligned} \Phi _s^{\lambda ,\rho ,\pi }\left( z, w_s \mid \varvec{x}_{{\mathcal {I}}}^0,\varvec{y}_{{\mathcal {I}}}^0\right)&:= \mathop {{{\,\textrm{argmin}\,}}}\limits _{(x_s,y_s)} \left\{ f_s(x_s,y_s) + \lambda _s^\top x_s +\delta _{K_s^{ ({\textbf{x}}^{0}_{{\mathcal {I}}},{\textbf{y}}^{0}_{{\mathcal {I}}})} } (x_s,y_s) \right. \nonumber \\&\left. \quad + \rho \pi _s\psi (z-x_s,w_s-y_s) \right\} . \end{aligned}$$
(14a)
$$\begin{aligned} \Phi ^{\lambda ,\rho ,\pi }\left( z, \varvec{w} \mid \varvec{x}_{\mathcal {I}}^0,\varvec{y}_{\mathcal {I}}^0\right)&:=\prod _{s\in S} \Phi _s^{\lambda ,\rho ,\pi }\left( z, w_s \mid \varvec{x}_{{\mathcal {I}}}^0,\varvec{y}_{{\mathcal {I}}}^0\right) . \end{aligned}$$
(14b)

For each \(s \in S\) and \((x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}}) \in {{\,\textrm{proj}\,}}_{\mathcal {I}}(K_s)\), the properties of Assumption 2 for \(\psi \) allow for the following properties of the cross-sections to be established.

Lemma 5

For \((x_{{\mathcal {I}}}^0,y_{{\mathcal {I}}}^0) \in {{\,\textrm{proj}\,}}_{{\mathcal {I}}}(K)\) the mapping \((z,w_s) \mapsto \varphi _s^{\lambda ,\rho ,\pi }\left( z,w_s \mid \varvec{x}_{{\mathcal {I}}}^0,\varvec{y}_{{\mathcal {I}}}^0\right) \) is convex over \(\mathbb {R}^{n}\times \mathbb {R}^{m}\) for each \(s \in S\).

Proof

This function can be represented as the infimal convolution of two closed, convex functions

$$\begin{aligned} \left( x_s,y_s\right)\mapsto & {} f_s (x_s,y_s) +\lambda _s^\top x_s + \delta _{K_s^{ ({\textbf{x}}^{0}_{{\mathcal {I}}},{\textbf{y}}^{0}_{{\mathcal {I}}})} } (x_s,y_s) \nonumber \\ \left( u_s,v_s \right)\mapsto & {} \rho \pi _s \psi \left( u_{s},v_{s}\right) \end{aligned}$$

with \((z,w_s) = (x_s,y_s) + (u_s,v_s)\). The compactness of \(K_s\) ensures that of \(K_s^{ ({\textbf{x}}^{0}_{{\mathcal {I}}},{\textbf{y}}^{0}_{{\mathcal {I}}})} \), which in turn ensures that the infimal convolution is bounded away from \(-\infty \). As the strict epigraph of an infimal convolution equals the sum of the strict epigraphs of the constituent functions, convexity follows [13, Exercise 1.28]. \(\square \)

Note that \(\varphi _s^{\lambda ,\rho ,\pi }(z,w_s) = \min _{(x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}}) \in {{\,\textrm{proj}\,}}_{\mathcal {I}}(K_s)} \varphi _s^{\lambda ,\rho ,\pi }\left( z,w_s \mid x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}}\right) \), for each \(s \in S\), is a minimum of a finite number of convex functions, but \(\varphi ^{\lambda ,\rho ,\pi }\) itself is not guaranteed to be convex or differentiable on its entire domain \(\mathbb {X}\times \mathbb {Y}^{\small {\vert S \vert }}\). Nevertheless, \(\varphi _s^{\lambda ,\rho ,\pi }\) is locally convex and differentiable on open neighbourhoods N where, for all \((z,w_s) \in N\), \(\varphi _s^{\lambda ,\rho ,\pi }(z,w_s)= \varphi _s^{\lambda ,\rho ,\pi }\left( z,w_s \mid x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}}\right) \) holds for exactly one \((x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}}) \in {{\,\textrm{proj}\,}}_{\mathcal {I}}(K_s)\).
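The structure just described can be made concrete with a toy one-dimensional sketch (hypothetical data, not from the paper): each cross-section, obtained by fixing the integer block, is convex, but their pointwise minimum fails the midpoint convexity inequality at a crossover point.

```python
# Two hypothetical integer values x_I in {0, 1}, each giving a convex
# cross-section: a fixed offset plus a quadratic proximal term.
rho = 4.0
cross_sections = {0: lambda z: 0.0 + 0.5 * rho * (z - 0.0) ** 2,
                  1: lambda z: 0.3 + 0.5 * rho * (z - 1.0) ** 2}

def phi(z):
    # phi is the pointwise minimum over the finitely many cross-sections.
    return min(f(z) for f in cross_sections.values())

# Convexity fails at the crossover: the midpoint value exceeds the chord.
assert phi(0.5) > 0.5 * (phi(0.0) + phi(1.0))
```

Here \(\varphi(0) = 0\), \(\varphi(1) = 0.3\), yet \(\varphi(0.5) = 0.5\), so the chord inequality is violated, mirroring the nonconvexity of \(\varphi^{\lambda,\rho,\pi}\) across cross-section boundaries.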

Lemma 6

Assume \(\psi \) satisfies Assumption 2, with parameter m as in Assumption 2(3). For each fixed \(D > 0\), there exists a \(\tilde{\delta }>0\) such that if a discrepancy \((u^0,v^0)\) satisfies \(\Vert (u^0,v^0)\Vert < \tilde{\delta }\), then \(\psi (u^0,v^0) < \psi (u,v)\) for all discrepancies \((u,v)\) satisfying \(\Vert (u-u^0,v-v^0)\Vert > D\).

Proof

See Appendix A. \(\square \)

For \(\psi \) defined by \(\psi (u,v) = \frac{1}{2}\left\| (u,v) \right\| ^2\) and any fixed \(D > 0\), we may identify \(\tilde{\delta } = \frac{1}{2} D\) (since \(m=1\) and \(\nabla \psi (u,v) = (u,v)\)). For example, if \(D=1/2\), then for all \((u^0,v^0)\) with \(\Vert (u^0,v^0)\Vert < \frac{1}{4}\), we have \(\psi (u^0,v^0) < \psi (u,v)\) for all \((u,v)\) with \(\Vert (u-u^0,v-v^0)\Vert > \frac{1}{2}\). This observation will have practical value in separating values of different cross-sections of \(\varphi \).
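This separation property for the quadratic \(\psi\) can be checked numerically; a minimal sketch over randomly sampled points, assuming the identifications \(D = 1/2\), \(\tilde\delta = 1/4\) above:

```python
# Check of Lemma 6 for psi(u, v) = 0.5 * ||(u, v)||^2 with D = 1/2 and
# delta_tilde = D/2 = 1/4: any point of norm < 1/4 has a strictly smaller
# psi-value than any point more than 1/2 away from it.
import random
random.seed(0)

def psi(p):
    return 0.5 * sum(c * c for c in p)

def norm(p):
    return sum(c * c for c in p) ** 0.5

for _ in range(1000):
    p0 = [random.uniform(-1, 1) for _ in range(2)]
    if norm(p0) >= 0.25:          # keep only base points with ||p0|| < 1/4
        continue
    d = [random.uniform(-2, 2) for _ in range(2)]
    if norm(d) <= 0.5:            # keep only displacements with ||d|| > 1/2
        continue
    p = [a + b for a, b in zip(p0, d)]
    assert psi(p0) < psi(p)       # separation holds
```

The triangle inequality explains why no counterexample can occur: \(\Vert p\Vert \ge \Vert p - p^0\Vert - \Vert p^0\Vert > 1/2 - 1/4 = 1/4 > \Vert p^0\Vert\).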

Proposition 7

Assume \(\psi \) satisfies Assumption 2 and \((z^{0},w_{s}^{0}) \in K_{s}\) for all \(s \in S\). If there is at least one scenario \(s \in S\) such that \((x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}})\in {{\,\textrm{proj}\,}}_{\mathcal {I}}(K_{s})\) with \((z_{{\mathcal {I}}}^0,w_{s,{\mathcal {I}}}^0) \ne (x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}})\), then there exist a finite threshold penalty coefficient \(\tilde{\rho }>0\) and a threshold \(\tilde{\delta }>0\) such that for all \(\rho >\tilde{\rho }\) and \(0<\delta <\tilde{\delta }\), the strict inequality \(\varphi _s^{\lambda ,\rho ,\pi }(z,w_{s})<\varphi _s^{\lambda ,\rho ,\pi }\left( z,w_{s} \mid x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}}\right) \) holds for all \((z,w_{s})\in B_{\delta }(z^{0},w_{s}^{0})\).

Proof

Assuming for some \(s\in S\) that we have \((z_{{\mathcal {I}}}^0,w_{s,{\mathcal {I}}}^0) \ne (x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}})\), then identifying \((u_s,v_s)=(z-x_s,w_s-y_s)\) and \((u_s^0,v_s^0)=(z-z^0,w_s-w_s^0)\) for each \(s \in S\), we have \((u_s-u_s^0,v_s-v_s^0) = (z^0-x_s,w_s^0-y_s)\) and \(\left\| (u_s-u_s^0,v_s-v_s^0) \right\| > \frac{1}{2}\) for all \((x_s,y_s) \in K_s^{(x_{\mathcal {I}},y_{\mathcal {I}})}\). Thus, using Lemma 6 with \(D = \frac{1}{2}\), we have a \(\tilde{\delta } > 0\) for which \((z,w_s)\) satisfying \(\left\| (z-z^0,w_s-w_s^0) \right\| < \tilde{\delta }\) implies that

$$\begin{aligned} \psi (z-z^0,w_s-w_s^0) < \psi (z-x_s, w_s-y_s) \end{aligned}$$
(15)

for all \((x_s,y_s) \in K_s^{(x_{\mathcal {I}},y_{\mathcal {I}})}\) given that \((z_{{\mathcal {I}}}^0,w_{s,{\mathcal {I}}}^0) \ne (x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}})\).

Due to the compactness of \(K_s^{(x_{\mathcal {I}},y_{\mathcal {I}})}\), defining for fixed \((z^0,w_s^0)\), \(s \in S\),

$$\begin{aligned} \Delta _s(z,w_s):=\min _{(x_s,y_s)}\left\{ \psi (z-x_s, w_s-y_s)-\psi (z-z^0,w_s-w_s^0) \mid (x_s,y_s) \in K_s^{(x_{\mathcal {I}},y_{\mathcal {I}})} \right\} , \end{aligned}$$

we have \(\Delta _s(z,w_s) > 0\). It follows for each \(s \in S':=\left\{ s \mid (z_{{\mathcal {I}}}^0,w_{s,{\mathcal {I}}}^0) \ne (x_{s,{\mathcal {I}}},y_{s,{\mathcal {I}}})\right\} \) that \(\psi (z-z^0,w_s-w_s^0) + \frac{\Delta _s(z,w_s)}{2} < \psi (z-x_s, w_s-y_s)\) for all \((x_s,y_s) \in K_s^{(x_{\mathcal {I}},y_{\mathcal {I}})}\).

Again due to the compactness of \(K_s^{(x_{\mathcal {I}},y_{\mathcal {I}})}\), we have still for each \(s \in S'\) that

$$\begin{aligned} \frac{f_s(z^0,w_s^0) + \lambda _s^\top z^0}{\rho } + \pi _s\psi (z-z^0,w_s-w_s^0) + \frac{\pi _s\Delta _s(z,w_s)}{2} < \frac{f_s(x_s,y_s) + \lambda _s^\top x_s}{\rho } +\pi _s\psi (z-x_s, w_s-y_s) \end{aligned}$$

for all \((x_s,y_s) \in K_s^{(x_{\mathcal {I}},y_{\mathcal {I}})}\), for \(\tilde{\rho } > 0\) sufficiently large and all \(\rho > \tilde{\rho }\). Thus, for \(\left\| (z-z^0,w_s-w_s^0) \right\| <\tilde{\delta }\) and \(\rho > \tilde{\rho }\), we have

$$\begin{aligned} \varphi _s^{\lambda ,\rho ,\pi }(z,w_{s}) < \varphi _s^{\lambda ,\rho ,\pi }\left( z,w_{s} \mid \varvec{x}_{{\mathcal {I}}},\varvec{y}_{{\mathcal {I}}}\right) \end{aligned}$$
(16)

for all \(\rho >\tilde{\rho }\) and \(\left( z,w_s\right) \in B_{\tilde{\delta }}(z^{0},w_{s}^{0})\) whenever \(s \in S'\). (Otherwise, for \(s \in S \backslash S'\), \(\varphi _s^{\lambda ,\rho ,\pi }(z,w_{s}) = \varphi _s^{\lambda ,\rho ,\pi }\left( z,w_{s} \mid \varvec{x}_{{\mathcal {I}}},\varvec{y}_{{\mathcal {I}}}\right) \) holds.) Summing over \(s \in S\), the same holds then for \( \varphi ^{\lambda ,\rho ,\pi }(z,\varvec{w}) < \varphi ^{\lambda ,\rho ,\pi }\left( z,\varvec{w} \mid \varvec{x}_{{\mathcal {I}}},\varvec{y}_{{\mathcal {I}}}\right) . \) \(\square \)

We note that for each \((z,w_s) \in B_{\tilde{\delta }} (z^0, w^0)\), with \(s \in S'\), we have \(\Delta _s(z,w_s)> 0\), and so the gap between the left- and right-hand sides of the inequality in (16) can only grow with increasing \(\rho \). Recall that the set of feasible (non-anticipative) solutions, whose elements we seek, is given by \(F:= \{ (z,\varvec{w}) \mid (z,w_s) \in K_s \text { for all } s \in S \}\), which is distinct from the set K. The next result follows immediately.

Corollary 8

Assume \(\psi \) satisfies Assumption 2. If \((z^{0},w^{0})\in F\), then there exists a \(\tilde{\rho }>0\) and a \(\tilde{\delta }>0\) such that for \(\rho \ge \tilde{\rho }\) and \(0<\delta <\tilde{\delta }\) we have

$$\begin{aligned} \varphi ^{\lambda ,\rho ,\pi }(z,\varvec{w})=\varphi ^{\lambda ,\rho ,\pi }\left( z,\varvec{w} \mid \varvec{z}_{{\mathcal {I}}}^{0},\varvec{w}_{{\mathcal {I}}}^{0}\right) \quad \text {for all }(z,\varvec{w})\in B_{\delta }(z^{0},\varvec{w}^{0}). \end{aligned}$$

Hence, for \((z,w_{s})\in B_{\delta }(z^{0},w_{s}^{0})\), \(s\in S\), with \(0<\delta <\tilde{\delta }\), the function \(\varphi _s^{\lambda ,\rho ,\pi }\) coincides with a convex function for all \(\rho \ge \tilde{\rho }\).

4 The theory of persistent local minima

For iteration indices \(n\ge 0\), let \((\lambda ^n,\pi ^n,\rho ^n)\in \Lambda \times \mathbb {R}_{>0}^{|S |} \times \mathbb {R}_{>0} \) and define for \((\lambda ^n,\pi ^n,\rho ^n) \rightarrow _{n\rightarrow \infty } (\lambda , \pi , \infty )\):

$$\begin{aligned} f^n(\varvec{x},\varvec{y},\varvec{w},z)&:= \sum _{s\in S} \left[ f_s(x_s,y_s) + (\lambda _s^n)^\top x_s+ \rho ^n\pi _s^n\psi (z- x_s , w_s - y_s) \right] , \end{aligned}$$
(17a)
$$\begin{aligned} \varphi _s^n&:=\varphi _s^{\lambda ^n,\pi ^n,\rho ^n}; \, \varphi ^n:= \varphi ^{\lambda ^n,\pi ^n,\rho ^n}; \, \Phi _s^n:= \Phi _s^{\lambda ^n,\pi ^n,\rho ^n}; \, \Phi ^n:= \Phi ^{\lambda ^n,\pi ^n,\rho ^n}, \end{aligned}$$
(17b)
$$\begin{aligned} \Phi ^{\infty } \left( z, \varvec{w}\right)&:= \prod _{s \in S} \arg \min _{\left( x_{s},y_{s}\right) } \left\{ \pi _s \psi \left( z-x_{s},w_{s}-y_{s} \right) \mid (x_s,y_s) \in K_s \right\} . \end{aligned}$$
(17c)

In this section, we consider sequences \(\{(\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{\varvec{w}}^n,\widetilde{z}^n)\}_{n=0}^\infty \) where we have \((\widetilde{\varvec{x}}^n, \widetilde{\varvec{y}}^n)\in \Phi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\). Furthermore, we assume \(\lim _{n\rightarrow \infty } \rho ^n= \infty \) and we single out a specific class of sequences of local minima \(\{(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\}_{n=0}^\infty \) for \(\varphi ^n\), which we call persistent, and which we show to be closely related to the feasible points of the underlying SMIP (1). We assume the following.

Assumption 9

Solution Sequence Assumptions (SSA) on \(\{(\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{\varvec{w}}^n,\widetilde{z}^n)\}_{n=0}^\infty \) and \(\{(\lambda ^n,\pi ^n,\rho ^n)\}_{n=0}^\infty \), indexed by integers \(n \ge 0\):

  1. 1.

    Penalty coefficients are non-decreasing \(\rho ^{n+1} \ge \rho ^n> 0\) for \(n\ge 0\) and increase without bound \(\lim _{n\rightarrow \infty } \rho ^n= \infty \).

  2. 2.

    Dual feasibility \(\lambda ^n\in \Lambda \) is satisfied, and boundedness \(\limsup _{n\rightarrow \infty }\Vert \lambda ^n\Vert <\infty \) holds.

  3. 3.

    Each \(\widetilde{z}^n\) is a local minimum of the function \(z \mapsto \inf _w\varphi ^n(z, w)\).

  4. 4.

    The extracted sequence \(\{\widetilde{z}^n\}_{n=0}^\infty \) converges to \(\overline{z}\).

  5. 5.

    Each \(\widetilde{w}_s^n\), \(s \in S\), is globally optimal: \(\widetilde{w}_s^n\in \arg \min _{w_s} \varphi _s^n(\widetilde{z}^n,w_s)\); thus \((\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{\varvec{w}}^n) \in \arg \min _{\varvec{w}, (\varvec{x},\varvec{y})\in K} \sum _{s \in S} f^n_s(x_s,y_s,w_s,\widetilde{z}^n)\) is globally optimal with \(z=\widetilde{z}^n\) fixed.

We also consider the following assumption on penalty weights separately.

Assumption 10

Penalty Weight Assumptions (PWA): We assume that \(\sum _{s \in S} \pi _s^n= 1\) and \(\pi _s^n> 0\), \(s \in S\). We assume in addition that the applied update rule for generating penalty weights over iterations \(n\ge 0\) ensures that we have \(\pi _s^n \ge \xi \), for some fixed \(\xi > 0\), for all but a finite number of iterations \(n\ge 0\), and for each \(s \in S\) such that \(\widetilde{x}^n_{s,{\mathcal {I}}} \ne \widetilde{z}_{\mathcal {I}}^n\) holds infinitely often in n. Furthermore, for \(n\ge 0\) for which the set \(S_n:=\{s \in S \mid \widetilde{x}^n_{s,{\mathcal {I}}} \ne \widetilde{z}_{\mathcal {I}}^n \}\) is empty, we assume the penalty weight update rule also ensures that \(\pi _s^{n+1} = \pi _s^n\) for all \(s \in S\).

If \(S_n\) is empty for all but a finite number of iterations \(n\), then consensus \(\widetilde{\varvec{x}}_{s,{\mathcal {I}}}^n=\widetilde{z}_{\mathcal {I}}^n\) has been reached and the above assumption is trivially satisfied. When \(S_n= \emptyset \) occurs, then \(\widetilde{z}^n\in X\) and by relatively complete recourse there exists \(y_s \in Y_s (\widetilde{z}^n)\) for all \(s \in S\) so that \((\widetilde{z}^n,\varvec{y})\) is feasible for SMIP (1).
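For illustration, one possible weight-update rule consistent with the PWA can be sketched as follows; the function name, the boosting-by-flooring rule, and the default \(\xi\) are all hypothetical, not the update used in the paper. It keeps the weights positive and summing to one, floors disagreeing scenarios at \(\xi\), and freezes the weights whenever \(S_n = \emptyset\).

```python
def update_weights(pi, disagree, xi=0.05):
    """Sketch of a PWA-compatible weight update (hypothetical rule).

    pi: dict scenario -> weight (positive, summing to 1)
    disagree: the set S_n of scenarios with x_{s,I} != z_I at this iterate
    xi: fixed floor applied to disagreeing scenarios
    """
    if not disagree:
        return dict(pi)                        # S_n empty: weights frozen
    assert len(pi) * xi <= 1.0                 # xi must be small enough
    if len(disagree) == len(pi):
        return {s: 1.0 / len(pi) for s in pi}  # uniform; each >= xi
    # Floor the disagreeing scenarios at xi ...
    new = {s: max(pi[s], xi) for s in disagree}
    leftover = 1.0 - sum(new.values())
    assert leftover > 0, "agreeing scenarios must retain positive mass"
    # ... and rescale the agreeing scenarios to absorb the leftover mass.
    rest_total = sum(pi[s] for s in pi if s not in disagree)
    for s in pi:
        if s not in disagree:
            new[s] = leftover * pi[s] / rest_total
    return new
```

For example, with weights \(\{0.5, 0.3, 0.2\}\), one disagreeing scenario, and \(\xi = 0.25\), the disagreeing weight is lifted to \(0.25\) and the rest are rescaled proportionally so the total remains one.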

In the context of Assumptions 9 and 10, we examine when it holds that \(\overline{z}\in X\). Under Assumptions 9 and 10, local minimisers \(\widetilde{z}^n\) of \(z\mapsto \inf _{\varvec{w}} \varphi ^n(z,\varvec{w})\) can be peculiar in the sense that \(\inf _{\varvec{w}} \varphi ^n(\widetilde{z}^n,\varvec{w})\) can increase without bound as \(\rho ^n \rightarrow \infty \), while the maximal neighbourhoods of the local minimiser \(\overline{z}\) verifying local optimality of \(\widetilde{z}^n\) for \(z\mapsto \inf _w\varphi ^n(z,\varvec{w})\) vanish in measure as \(n\rightarrow \infty \). The local minimisers \(\widetilde{z}^n\) that do not suffer from these issues are those that we wish to isolate, in that \(\inf _w\varphi ^n(\widetilde{z}^n,\varvec{w})\) remains bounded at the local minimum \(\widetilde{z}^n\) despite \(\rho ^n \rightarrow \infty \).

Definition 3

Let \(\{(\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{\varvec{w}}^n,\widetilde{z}^n)\}_{n=0}^\infty \) and \(\{(\lambda ^n,\pi ^n,\rho ^n)\}_{n=0}^\infty \) satisfy Assumption 9.

  1. 1.

    The sequence \(\{(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\}_{n=0}^\infty \) is persistent if \(\limsup _{n\rightarrow \infty } \varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n) < \infty \).

  2. 2.

    If \(\lim _{n\rightarrow \infty } (\widetilde{z}^n,\widetilde{\varvec{w}}^n) = (\overline{z},\overline{\varvec{w}})\), we say that \((\overline{z},\overline{\varvec{w}})\) is a persistent limit.

Remark 4

Clearly, when we have a convergent sequence of local minima \(\widetilde{z}^n \rightarrow \overline{z}\) for \(z\mapsto \inf _{\varvec{w}} \varphi ^n(z, \varvec{w})\), \(n\ge 0\), then for any \(\widetilde{\varvec{w}}^n\) with \(\inf _{\varvec{w}} \varphi ^n(\widetilde{z}^n, \varvec{w})= \varphi ^n(\widetilde{z}^n, \widetilde{\varvec{w}}^n)\), \(n\ge 0\), every convergent subsequence \(\{(\widetilde{z}^{n_k}, \widetilde{\varvec{w}}^{n_k})\}_{k=0}^\infty \) converges to a persistent limit \((\overline{z},\overline{\varvec{w}})\).

The subdifferential analysis of \(\varphi ^{\lambda ,\rho ,\pi }\) requires addressing its nonconvexity and non-differentiability. A notion of differentiation suitable for this purpose is Fréchet subdifferentiability as defined in [13].

Definition 4

The function \(\varphi : {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R} \cup \{\infty \}\) is Fréchet subdifferentiable at \((z^0,w^0)\) if there exists a Fréchet subderivative \((\zeta ,\omega )\) such that

$$\begin{aligned} \liminf _{(z-z^0,w-w^0) \rightarrow 0} \frac{\varphi (z,w)-\varphi (z^0,w^0)-\langle (\zeta ,\omega ), (z-z^0,w-w^0) \rangle }{\left\| (z-z^0,w-w^0) \right\| } \ge 0. \end{aligned}$$

We denote the collection of all such subderivatives by \(\widehat{\partial }\varphi (z^0,w^0)\), the Fréchet subdifferential of \(\varphi \) at \((z^0,w^0)\). A point \((z^0,w^0)\) is a Fréchet stationary point of \(\varphi \) if \((0,0) \in \widehat{\partial }\varphi (z^0,w^0)\). The limiting (or Mordukhovich) subdifferential of \(\varphi \) is denoted \(\partial \varphi (\overline{z},\overline{w})\), where \((\bar{\zeta },\bar{\omega }) \in \partial \varphi (\overline{z},\overline{w})\) if there exists a sequence \(\{(\zeta ^n,\omega ^n) \in \widehat{\partial }\varphi (z^n,w^n)\}_{n=0}^\infty \) such that \((z^n,w^n)\rightarrow (\overline{z},\overline{w})\) and \((\zeta ^n,\omega ^n) \rightarrow (\bar{\zeta },\bar{\omega })\).

The first part of the following Lemma is a modest restatement of the cited result which we shall use to deduce differentiability whenever the Fréchet subdifferential is non-empty. In the second part we obtain local minimality from stationarity for structured functions.

Lemma 11

Let \( \varphi : {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R} _{+\infty }\) be a function defined by \( \varphi (\overline{z},\overline{w}):= \min _{i \in I} \varphi _i(\overline{z},\overline{w})\), where \( \{\varphi _i: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\}_{i \in I}\) is a finite family of proper, convex, lower semicontinuous functions.

  1. 1.

    Then

    $$\begin{aligned} \widehat{\partial }\varphi (\overline{z},\overline{w}) = \bigcap _{i \in I(\overline{z},\overline{w})} \widehat{\partial }\varphi _i(\overline{z},\overline{w}) \end{aligned}$$

    where \(I(\overline{z},\overline{w}):= \{ i \in I \mid i \in \arg \min _{i \in I} \varphi _i(\overline{z},\overline{w})\}\). If each function \(\varphi _i\), \(i \in I\), is differentiable then \(\widehat{\partial }\varphi (\overline{z},\overline{w}) \ne \emptyset \) implies \(\widehat{\partial }\varphi (\overline{z},\overline{w}) = \{\nabla \varphi (\overline{z},\overline{w})\}\).

  2. 2.

    If, in particular, Fréchet stationarity \(0 \in \widehat{\partial }\varphi _i(\overline{z},\overline{w})\) is satisfied for all \(i \in I(\overline{z},\overline{w})\), then \((\overline{z},\overline{w})\) is a local minimum of \(\varphi \) with \((0,0) \in \widehat{\partial }\varphi (\overline{z},\overline{w})\). Moreover, if at least one of the \(\varphi _i\), \(i \in I(\overline{z},\overline{w})\), is differentiable, then we also have \(\widehat{\partial }\varphi (\overline{z},\overline{w}) = \{\nabla \varphi (\overline{z},\overline{w})\} = \{(0,0)\}\).

Proof

Part 1: Follows due to [23, Theorem 1 via Theorem 10].

Part 2: Due to the convexity of \(\varphi _i\), \(i \in I\), we have \((0,0) \in \widehat{\partial }\varphi _i(\overline{z},\overline{w})\) being a subgradient in both the Fréchet and classical sense for \(i \in I(\overline{z},\overline{w})\), and furthermore, \((\overline{z},\overline{w})\) is a globally optimal solution for each \(\varphi _i\), \(i \in I(\overline{z},\overline{w})\). The membership \((0,0) \in \widehat{\partial }\varphi (\overline{z},\overline{w})\) then follows immediately from Part 1. We claim that there exists \(\delta >0\) such that \(\varphi (\overline{z},\overline{w}) \le \varphi (z,w)\) for all \((z,w) \in B_\delta (\overline{z},\overline{w})\). Otherwise, for some \(i' \notin I(\overline{z},\overline{w})\), we would have for \((z,w)\) arbitrarily close to \((\overline{z},\overline{w})\) that \(\varphi (\overline{z},\overline{w}) > \varphi _{i'}(z,w)\). But since \(i' \notin I(\overline{z},\overline{w})\), the inequality \(\varphi (\overline{z},\overline{w}) < \varphi _{i'}(\overline{z},\overline{w})\) holds, and so the lower semicontinuity of \(\varphi _{i'}\) is contradicted. Thus, we have the local minimality \(\varphi (z,w) \ge \varphi (\overline{z},\overline{w})\) for all \((z,w) \in B_{\delta }(\overline{z},\overline{w})\). Moreover, if \((0,0) \in \widehat{\partial }\varphi (\overline{z},\overline{w})\) and for at least one \(i \in I(\overline{z},\overline{w})\) we have \(\widehat{\partial }\varphi _i(\overline{z},\overline{w}) =\{(0,0)\}\), then by Part 1, we must also have \(\widehat{\partial }\varphi (\overline{z},\overline{w}) = \{(0,0)\}\), and so \( \nabla \varphi (\overline{z},\overline{w})=(0,0)\) exists. \(\square \)

The following motivates our set of assumptions on the penalty parameter update.

Proposition 12

Assume that SMIP (1) satisfies Assumption 1. Let \(\psi \) satisfy Assumption 2. Suppose we have a persistent sequence of local minima \((\widetilde{z}^{n},\widetilde{\varvec{w}}^{n} ) \rightarrow (\overline{z},\overline{\varvec{w}})\) for \(\rho ^n \rightarrow \infty \) (and hence Fréchet stationarity \(0 \in \widehat{\partial }\varphi ^n(\widetilde{z}^{n},\widetilde{\varvec{w}}^{n})\) for each \(n\)). If the PWA Assumption 10 holds, then for \(n\) sufficiently large and for all \((\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n) \in \Phi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\), we have \(\widetilde{z}_{{\mathcal {I}}}^{n} = \widetilde{x}_{s,{\mathcal {I}}}^n\) for all \(s \in S\), i.e., consensus holds in the integral components at a fixed value \(\widetilde{z}_{{\mathcal {I}}}^{n}=\overline{z}_{{\mathcal {I}}}\); furthermore, the Fréchet stationarity \(0 \in \widehat{\partial }f^n(\widetilde{\varvec{x}}^{n},\widetilde{\varvec{y}}^{n},\widetilde{\varvec{w}}^{n},\widetilde{z}^{n})\) holds.

Proof

The Fréchet stationarity \(0 \in \widehat{\partial }\varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\), \(n \ge 0\), is a strong notion in that it allows us to deduce the Fréchet stationarity \(0 \in \widehat{\partial }f^n(\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{\varvec{w}}^n,\widetilde{z}^n)\) via standard subderivative inclusions for marginal mappings (see [13, Theorem 10.13], Lemma 11 and elsewhere). Hence the Fréchet stationarity \(0 \in \widehat{\partial }\varphi ^n(\widetilde{z}^{n},\widetilde{\varvec{w}}^{n} )\) implies the cross-sectional Fréchet stationarity \(0 \in \widehat{\partial }\varphi ^{n}\left( \widetilde{z}^{n},\widetilde{\varvec{w}}^{n} \mid \varvec{x}_{{\mathcal {I}}}, \varvec{y}_{{\mathcal {I}}}\right) \) for all optimal cross-sections \((\varvec{x}_{{\mathcal {I}}}, \varvec{y}_{{\mathcal {I}}}) \in \text {proj}_{{\mathcal {I}}}(\Phi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n)) \). Identifying, in terms of Lemma 11 (Part 2), \(\varphi \) with \(\varphi ^n\), the \(\varphi _i\), \(i \in I\), with the finite number of cross-sections \(\varphi ^{n}\left( \cdot ,\cdot \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \) for \((\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}) \in {{\,\textrm{proj}\,}}_{\mathcal {I}}(K)\), and \((\overline{z},\overline{\varvec{w}})\) with \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\), we obtain a local minimum of \(\varphi ^n\) at \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\), and by the definition of \(\varphi ^n\) we have the Fréchet stationarity \(0 \in \widehat{\partial }f^n(\widetilde{\varvec{x}}^{n},\widetilde{\varvec{y}}^{n},\widetilde{\varvec{w}}^{n}, \widetilde{z}^{n})\). From this, we show that all such solutions have a common set of integral values for \(n\) sufficiently large.
As \(\{(\widetilde{z}^{n},\widetilde{\varvec{w}}^{n} )\}_{n=0}^\infty \) is persistent there exists \(\kappa >0\) such that \( \varphi ^{n} (\widetilde{z}^{n},\widetilde{\varvec{w}}^{n} ) \le \kappa \) for all \(n\) sufficiently large. Hence for each optimal cross-section \((\widetilde{\varvec{x}}_{{\mathcal {I}}}^n, \widetilde{\varvec{y}}_{{\mathcal {I}}}^n) \in \text {proj}_{{\mathcal {I}}}(\Phi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n))\) we have, using (5),

$$\begin{aligned} \frac{1}{\rho ^n}&[ \kappa + (\Vert c\Vert + \Vert d\Vert + \limsup _{n' \rightarrow \infty } \Vert \lambda ^{n'}\Vert ) (\sup _{(x,y) \in K}\max \{ \Vert x\Vert , \Vert y\Vert \}) ] \nonumber \\&\quad \ge \sum _{s \in S} \pi _s^n \psi (\widetilde{z}^{n} - \widetilde{x}_s^{n}, 0 ) \ge \frac{m}{2} \sum _{s \in S} \pi _s^n \Vert \widetilde{z}_{{\mathcal {I}}}^{n} - \widetilde{x}_{s, {\mathcal {I}}}^{n}\Vert ^2 \end{aligned}$$
(18)

The left-hand side of (18) tends to zero as \(\rho ^n \rightarrow \infty \), while \(\pi _s^n \ge \xi \) for all \(s \in S_{n}\). Hence, taking \(n\) large enough that the left-hand side of (18) is smaller than \(\frac{m\xi }{8}\), we conclude that \(\Vert \widetilde{x}_{s, {\mathcal {I}}}^{n} - \widetilde{z}_{{\mathcal {I}}}^{n} \Vert < \frac{1}{2}\) for all \(s \in S_{n}\), and so \(\widetilde{x}_{s', {\mathcal {I}}}^{n} = \widetilde{x}_{s, {\mathcal {I}}}^{n}= \widetilde{z}_{{\mathcal {I}}}^{n}\) for all \(s,s' \in S_{n}\). As \(\widetilde{z}_{{\mathcal {I}}}^{n} = \widetilde{x}_{s, {\mathcal {I}}}^{n}\) for all \(s \notin S_{n}\) already, we have equality for all \(s \in S\), and \(\widetilde{x}_{s, {\mathcal {I}}}^{n} = \overline{x}_{s, {\mathcal {I}}} = \overline{z}_{\mathcal {I}} \) is fixed independently of \(\rho ^n\) for \(n\) sufficiently large. \(\square \)
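With the quadratic penalty (\(m = 1\)), the forcing argument behind (18) can be checked arithmetically. Writing \(\bar\kappa\) for the bracketed numerator (a hypothetical value below), (18) gives \(\Vert \widetilde{z}^n_{\mathcal I} - \widetilde{x}^n_{s,\mathcal I}\Vert^2 \le 2\bar\kappa/(\rho^n\xi)\) for each \(s \in S_n\), which drops below \(1/4\), and so forces integer consensus, once \(\rho^n > 8\bar\kappa/\xi\):

```python
# Hypothetical numbers illustrating the bound (18) with quadratic psi (m = 1):
# kappa_bar stands for the bracketed numerator, xi for the PWA weight floor.
kappa_bar, xi = 10.0, 0.1
rho_threshold = 8 * kappa_bar / xi   # rho beyond which consensus is forced

for rho in (2 * rho_threshold, 100 * rho_threshold):
    dist_sq_bound = 2 * kappa_bar / (rho * xi)
    # squared distance below 1/4, i.e. distance below 1/2: distinct integer
    # blocks (which would be at distance >= 1) are ruled out
    assert dist_sq_bound < 0.25
```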

Feasibility may also be shown to hold, as stated in Lemma 13. Furthermore, in Proposition 14 we state the relationships between persistency and feasibility.

Lemma 13

Let the problem SMIP (1) satisfy the SMIP Assumption 1, and let the penalty function \(\psi \) satisfy Assumption 2. If a sequence \(\{(\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{\varvec{w}}^n,\widetilde{z}^n)\}_{n=0}^\infty \) given \(\{(\lambda ^n,\pi ^n,\rho ^n)\}_{n=0}^\infty \) with integer index \(n \ge 0\) satisfies Assumption 9, then \(\widetilde{y}^n_s = \widetilde{w}^n_{s}\in Y_s (\widetilde{x}^n_s)\), \(s \in S\), and \(\widetilde{z}^n \in \arg \min _{z} \sum _{s \in S} \pi _s^n \psi (z-\widetilde{x}_s^n, 0)\). Furthermore, for each \(n \ge 0\) for which \(\widetilde{z}^n \in X\) holds, we have \(\varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n) = \inf _{\varvec{w}} \varphi ^n(\widetilde{z}^n,\varvec{w}) \) bounded from above independently of the specific value of \(\rho ^n\).

Proof

It follows from Lemma 4 that for all \(n \ge 0\), we have the existence of \((\widetilde{\varvec{x}}^n, \widetilde{\varvec{y}}^n)\in K\) that attains the infimum in the definition of \(\varphi ^n ( \widetilde{z}^n,\widetilde{\varvec{w}}^n)\). Because \(\widetilde{w}_s^n\), \(s \in S\), is a global optimum for \(w_s \mapsto \varphi _s^n(\widetilde{z}^n,w_s)\), the claim \(\widetilde{y}_s^n=\widetilde{w}_s^n\), \(s \in S\), follows readily from Proposition 3.

To establish that \(\widetilde{z}^n \in \arg \min _{z} \sum _{s \in S} \pi _s^n \psi (z-\widetilde{x}_s^n, 0)\), assume to the contrary that \(\widetilde{z}^n \notin \arg \min _{z} \sum _{s \in S} \pi _s^n \psi (z-\widetilde{x}_s^n, 0)\). For any \(\overline{z}^n\in \arg \min _{z} \sum _{s \in S} \pi _s^n \psi (z-\widetilde{x}_s^n, 0)\), using the convexity of \(\psi \) and \(\eta \in (0,1)\), we have

$$\begin{aligned}&\sum _{s \in S} \inf _{w} \varphi _s^n(\eta \overline{z}^n + (1-\eta )\widetilde{z}^n,w_s) \\&\quad \le \sum _{s\in S} p_{s}\left[ c^{\top } \widetilde{x}_s^n +d_{s}^{\top } \widetilde{w}_s^n \right] +(\lambda _s^n)^{\top } \widetilde{x}_{s}^n +\rho ^n \pi _s^n \psi \left( [\eta \overline{z}^n + (1-\eta ) \widetilde{z}^n] - \widetilde{x}_{s}^n, 0 \right) \\&\quad \le \eta \sum _{s\in S} p_{s}\left[ c^{\top } \widetilde{x}_s^n +d_{s}^{\top } \widetilde{w}_s^n \right] +(\lambda _s^n)^{\top } \widetilde{x}_{s}^n +\rho ^n \pi _s^n \psi \left( \overline{z}^n - \widetilde{x}_{s}^n, 0 \right) \\&\qquad + (1-\eta ) \sum _{s\in S} p_{s}\left[ c^{\top } \widetilde{x}_s^n +d_{s}^{\top } \widetilde{w}_s^n \right] + (\lambda _s^n)^{\top } \widetilde{\varvec{x}}_{s}^n +\rho ^n \pi _s^n \psi \left( \widetilde{z}^n - \widetilde{x}_{s}^n, 0 \right) \\&\quad < \varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n), \end{aligned}$$

which would contradict the local optimality of \(\widetilde{z}^n\) for \(z\mapsto \inf _{\varvec{w}} \varphi ^n(z,\varvec{w})\). Thus, \(\widetilde{z}^n \in \arg \min _{z} \sum _{s \in S} \pi _s^n \psi (z-\widetilde{x}_s^n, 0)\) for all \(n \ge 0\).
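For the quadratic penalty, the condition \(\widetilde{z}^n \in \arg\min_{z} \sum_{s \in S} \pi_s^n \psi(z-\widetilde{x}_s^n, 0)\) pins \(\widetilde{z}^n\) down explicitly as the weighted average of the \(\widetilde{x}_s^n\) (weights summing to one); a small numerical sketch with made-up data:

```python
# For psi(u, 0) = 0.5 * ||u||^2 and weights summing to one, the minimiser of
# z -> sum_s pi_s * psi(z - x_s, 0) is the weighted average sum_s pi_s * x_s,
# since the gradient sum_s pi_s * (z - x_s) vanishes exactly there.
pi = [0.5, 0.3, 0.2]
xs = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]   # hypothetical scenario iterates
z_star = [sum(p * x[j] for p, x in zip(pi, xs)) for j in range(2)]

def obj(z):
    return sum(p * 0.5 * sum((zj - xj) ** 2 for zj, xj in zip(z, x))
               for p, x in zip(pi, xs))

# Perturbations in several directions do not decrease the objective.
for dz in ([0.1, 0.0], [-0.2, 0.3], [0.05, -0.05]):
    assert obj(z_star) <= obj([a + b for a, b in zip(z_star, dz)])
```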

Furthermore, it also follows that, when \(\widetilde{z}^n \in X\) (a compact set):

$$\begin{aligned} \varphi ^n(\widetilde{z}^n, \widetilde{\varvec{w}}^n )&= \sum _{s\in S}p_{s}\left[ c^{\top } \widetilde{x}_{s}^n +d_{s}^{\top } \widetilde{y}_s^n \right] +(\lambda _s^n)^\top \widetilde{x}_s^n+\rho ^n \pi _s^n \psi \left( \widetilde{z}^n - \widetilde{x}_{s}^n , \widetilde{w}_s^n-\widetilde{y}_s^n \right) \\&\le \sup _n\inf _{\widetilde{\varvec{w}} \in Y(\widetilde{z}^n)} \sum _{s\in S}p_{s} [ c^{\top } \widetilde{z}^n +d_{s}^{\top } \widetilde{w}_s ] +(\lambda _s^n)^\top \widetilde{z}^n\le \Gamma < \infty . \end{aligned}$$

where, after noting that \(\sum _{s \in S} (\lambda _s^n)^\top \widetilde{z}^n= 0\) vanishes due to \(\lambda ^n\in \Lambda \), we have that \(\Gamma < \infty \) can be chosen to hold regardless of the specific realisations of \(\rho ^n > 0\) and \(\widetilde{z}^n \in X\), due to the boundedness properties of SMIP Assumption 1; the finiteness of \(\Gamma \) also requires the assumed relatively complete recourse. \(\square \)

Proposition 14

Assume \(\psi \) satisfies Assumption 2. If \((\overline{z},\overline{\varvec{w}})\) is a persistent limit for a persistent sequence \(\{(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\}_{n=0}^\infty \) then

  1. 1.

    \((\overline{z}, \overline{\varvec{w}}) \in F\); namely \(\overline{z}\in X\) and \(\overline{w}_s \in Y_s (\overline{z})\);

  2. 2.

    there is a fixed neighbourhood \(B_{\delta }\left( \overline{z},\overline{\varvec{w}}\right) \) with \(\delta > 0\) on which \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) is locally optimal for \(\varphi ^n\) for all n large enough (i.e., for all \(\rho ^n\) larger than some threshold \(\tilde{\rho }\)).

If we furthermore assume that the PWA Assumption 10 holds, then we have \(\widetilde{z}_{\mathcal {I}}^n=\widetilde{x}_{s,{\mathcal {I}}}^n\) for all \(s \in S\) for all \((\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n) \in \Phi ^{n} (\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) for \(n\) large enough.

Proof

To prove (1), suppose \((\overline{z}, \overline{\varvec{w}}) \notin F\). Then there exists \(\delta >0\) such that \(\inf _{(x_s,y_s) \in K_s} \Vert (\overline{z}-x_s, \overline{w}_s-y_s) \Vert ^2 \ge 2\delta \) for at least one scenario \(s \in S\). As \((\widetilde{z}^n, \widetilde{\varvec{w}}^n) \rightarrow (\overline{z}, \overline{\varvec{w}})\) we have, for \(n\) (and thus \(\rho ^n\)) large enough, that \(\inf _{(x_s,y_s) \in K_s} \Vert (\widetilde{z}^n-x_s, \widetilde{w}_s^n-y_s) \Vert ^2 \ge \delta \) for some \(s \in S_n\) (in which case \(\widetilde{z}^n \ne \widetilde{x}^n_s\), since by Proposition 3 we have \(\widetilde{\varvec{w}}^n = \widetilde{\varvec{y}}^n\)). We now use Assumption 2 (3) to bound the penalty values from below. By the differentiability assumed in Assumption 2, we apply the inequality (5) for each \(s \in S\) to get \( \psi \left( \widetilde{z}^n-x_s, \widetilde{w}_s^n - y_s \right) \ge \frac{m}{2} \Vert (\widetilde{z}^n-x_s, \widetilde{w}_s^n-y_s) \Vert ^2. \) Since \(\lim _{n \rightarrow \infty } \rho ^n = \infty \) and \(\liminf _{n} \pi ^n_s \ge \xi \), it follows that

$$\begin{aligned} \min _{(x_s, y_s) \in K_s} \Big \{ \rho ^n\sum _{s\in S}\pi _{s}^n \psi \left( \widetilde{z}^n-x_s, \widetilde{w}_s^n-y_s \right) \Big \}&\ge \frac{m\rho ^n\xi }{2} \inf _{(x_s, y_s) \in K_s} \left\{ \Vert (\widetilde{z}^n-x_s, \widetilde{w}_s^n-y_s) \Vert ^2 \right\} \\&\ge \frac{m \rho ^n \xi }{2} \delta \end{aligned}$$

which is unbounded as \(n \rightarrow \infty \). This contradicts the assumption that \(\left\{ \varphi ^n\left( \widetilde{z}^n,\widetilde{\varvec{w}}^n \right) \right\} _{n=0}^\infty \) is bounded above, as required by the persistency of \(\{(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\}_{n=0}^\infty \). Thus \( (\overline{z}, \overline{\varvec{w}})\in F\).

Having shown \((\overline{z}, \overline{\varvec{w}})\in F\), it follows from the definitions that \(\overline{w}_s \in Y_s(\overline{z})\), and so claim (2) follows readily from Corollary 8, using the same critical \(\tilde{\rho }\) and \(\tilde{\delta }\) that apply regardless of the choice of \(\overline{z}\in X\); thus \((z,\varvec{w}) \mapsto \varphi ^n(z,\varvec{w})\) is convex over \(B_\delta (\overline{z},\overline{\varvec{w}})\) for all \(\rho ^n> \tilde{\rho }\) with \(n\) sufficiently large. By Remark 4, the neighbourhood associated with the local minimum at \((\overline{z},\overline{\varvec{w}})\) is also associated with a persistent local minimum, and thus \(B_\delta (\overline{z},\overline{\varvec{w}})\) serves as the fixed neighbourhood verifying the local optimality of \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) for \((z,\varvec{w}) \mapsto \varphi ^n(z,\varvec{w})\) for each n large enough. The last claim follows from Proposition 12. \(\square \)

The following contains a version of the strong duality result for the augmented Lagrangian. Notice that this result is more general than those in [15], in that it allows for the consideration of an inexact penalty that may be differentiable everywhere. When both stages have pure integer variables, all feasible solutions are persistent and we obtain a stronger form of duality.

Theorem 15

Suppose problem SMIP (1) satisfies the SMIP Assumption 1 and \(\psi \) is an ICRF.

  1.

    If problem SMIP (1) has pure integer variables in both stages and a feasible point \((\overline{z}, \overline{\varvec{w}}) \in F\) satisfies \(\limsup _n \varphi ^n(\overline{z}, \overline{\varvec{w}})<+\infty \) for \(\lim _{n\rightarrow \infty } \rho ^n= \infty \), then \((\overline{z}, \overline{\varvec{w}})\) is a local minimum of \(\varphi ^n\) for \(n\) large enough.

  2.

    Let \(\{(\widetilde{z}^n, \widetilde{\varvec{w}}^n)\}_{n=0}^\infty \) be any sequence of global minimisers of \(\varphi ^n\) with \(\lim _{n\rightarrow \infty } \rho ^n = \infty \), with \(\lambda ^n\), \(n\ge 0\), satisfying Assumption 9(2), and \(\pi _s > 0\), \(s \in S\). Then its limit points \((\overline{z},\overline{\varvec{w}})\) are globally optimal solutions to SMIP (1); in particular, there exists at least one globally optimal solution \((\overline{z},\overline{\varvec{w}})\) to SMIP (1) that is a persistent limit. Moreover, for any \(\{ \rho ^n\}_{n=0}^\infty \) with \(\rho ^n \rightarrow \infty \) there exists a persistent local minimum sequence \(\{(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\}_{n=0}^\infty \) for which \(\lim _{n\rightarrow \infty } \varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n)=\zeta ^{SMIP}\).

  3.

    We have for any \(\lambda \in \Lambda \), \(\pi _s > 0\), \(s \in S\),

    $$\begin{aligned} \sup _{\rho >0} \min _{\left( {z},{\varvec{w}}\right) \in \mathbb {X}\times \mathbb {Y}^{\small {\vert S \vert }}} \varphi ^{\lambda ,\rho ,\pi }\left( {z},{\varvec{w}} \right) = \zeta ^{SMIP}. \end{aligned}$$
    (19)

    Moreover, for a pure integer SMIP a finite value of \(\bar{\rho }>0\) exists for which \(\min _{\left( {z},{\varvec{w}}\right) \in \mathbb {X}\times \mathbb {Y}^{\small {\vert S \vert }}} \varphi ^{\lambda ,\rho ,\pi }\left( {z},{\varvec{w}} \right) = \zeta ^{SMIP} \) for \(\rho \ge \bar{\rho }\).

Proof

1): Suppose \((\overline{z}, \overline{\varvec{w}}) \in F\). Using Lemma 7 and Corollary 8 we have a locally convex function

$$\begin{aligned} (z,\varvec{w}) \mapsto \varphi ^n(z, \varvec{w}) = \varphi ^n(z, \varvec{w}\mid \overline{z}_{\mathcal {I}}, \overline{\varvec{w}}_{\mathcal {I}}) \end{aligned}$$

for all \((z, \varvec{w}) \in B_{\delta } (\overline{z},\overline{\varvec{w}})\) for some fixed \(\delta> \tilde{\delta } > 0\) and \(\rho ^n> \tilde{\rho } >0\) with \(n\) large enough. Moreover for all \((z^\prime , \varvec{w}^\prime ) \in F\) with \((\overline{z}_{\mathcal {I}}, \overline{\varvec{w}}_{\mathcal {I}}) \ne (z_{\mathcal {I}}^\prime , \varvec{w}_{\mathcal {I}}^\prime ) \) we have \(\varphi ^n(z, \varvec{w}\mid z_{\mathcal {I}}^\prime , \varvec{w}_{\mathcal {I}}^\prime ) > \varphi ^n(z, \varvec{w}\mid \overline{z}_{\mathcal {I}}, \overline{\varvec{w}}_{\mathcal {I}})\) for \((z,\varvec{w}) \in B_{\delta } (\overline{z},\overline{\varvec{w}})\) for some fixed \(\delta> \tilde{\delta } > 0\) and \(\rho ^n> \tilde{\rho } >0\) with \(n\) large enough. Supposing \((\overline{z}, \overline{\varvec{w}}) \in F\) is pure integer, then we have \((\overline{z}_{{\mathcal {I}}}, \overline{\varvec{w}}_{{\mathcal {I}}}) = (\overline{z}, \overline{\varvec{w}})\) and hence \((\overline{z}, \overline{\varvec{w}})\) is a local minimum of \(\varphi ^n\) with \(\varphi ^n(\overline{z}, \overline{\varvec{w}}) \le \sum _{s \in S} c^{\top } \overline{z}+ d_s^{\top } \overline{w}_s < +\infty \), due to the boundedness assumptions for the SMIP (and dual feasibility of any sequence \(\{\lambda ^n\}_{n=0}^\infty \)).

2): Let \(\{(\widetilde{z}^n, \widetilde{\varvec{w}}^n)\}_{n=0}^\infty \) be a sequence where each \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\), \(n\ge 0\), is a global minimiser of \(\varphi ^n\); this implies \(\varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n) \le \zeta ^{SMIP}\), so that \(\{(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\}_{n=0}^\infty \) is a persistent sequence. Its limit points \((\overline{z},\overline{\varvec{w}})\) thus satisfy \((\overline{z},\overline{\varvec{w}}) \in F\) by Proposition 14. Furthermore, \((\widetilde{x}_s^n,\widetilde{y}^n_s) \in \Phi ^n_s(\widetilde{z}^n,\widetilde{w}_s^n)\), and as \(\Phi ^{\infty } (\overline{z}, \overline{\varvec{w}}) = \{ (\overline{z}, \overline{\varvec{w}})\}\) we have (after passing to a subsequence) \(\lim _{n\rightarrow \infty } (\widetilde{x}^n_s,\widetilde{y}^n_s) = (\overline{z},\overline{w}_s) \in K_s\). (For if not, the boundedness of \(\{\lambda ^n\}_{n=0}^\infty \), \(\rho ^n\rightarrow \infty \), \(\pi _s > 0\), \(s \in S\), and the minorisation (5) would imply that \(\limsup _{n\rightarrow \infty } \varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n) =\infty \).) Furthermore, since \(\varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n) \le \zeta ^{SMIP}\), we must also have \(\sum _{s \in S} f_s(\widetilde{x}_s^n,\widetilde{y}_s^n) + (\lambda _s^n)^{\top } \widetilde{x}_s^n\le \zeta ^{SMIP}\), and so \(\sum _{s \in S} f_s(\overline{z},\overline{w}_s) = \zeta ^{SMIP}\) (by dual feasibility \(\lambda ^n \in \Lambda \), boundedness, and \(\widetilde{x}_s^n\rightarrow \overline{z}\) for all \(s \in S\), we have \(\limsup _{n\rightarrow \infty } \sum _{s \in S} (\lambda _s^n)^{\top } \widetilde{x}_s^n=0\)); thus \((\overline{z},\overline{\varvec{w}})\) must be optimal for the original SMIP (1).

3). Denote

$$\begin{aligned} \xi ^{SMIP}_{\rho }:= \min \{ \varphi ^{\lambda ,\rho ',\pi }\left( {z}^{\rho '},{\varvec{w}}^{\rho '} \right) \mid \{(z^{\rho '},\varvec{w}^{\rho '})\} \text { are persistent for } \rho ' \ge \rho \}. \end{aligned}$$

If \(\left( {\overline{z}},{\overline{\varvec{w}}} \right) \) is a persistent limit, Proposition 14 implies \(\left( {\overline{z}},{\overline{\varvec{w}}} \right) \in F\), and by Proposition 11, \( \varphi ^{\lambda ,\rho ,\pi }\left( {\overline{z}},{\overline{\varvec{w}}} \right) \le \sum _{s\in S}p_{s}\left[ c^{\top }\overline{z}+d_s^{\top }\overline{w}_{s}\right] . \) It follows that:

$$\begin{aligned} \lim _{\rho \rightarrow \infty } \xi ^{SMIP}_{\rho } \le \min _{(\overline{z},\overline{\varvec{w}})}\left\{ \sum _{s\in S}p_{s}\left[ c^{\top }\overline{z}+d_{s}^{\top }\overline{w}_{s}\right] \mid (\overline{z},\overline{\varvec{w}}) \text { are persistent limits} \right\} \le \zeta ^{SMIP} , \end{aligned}$$

where the last inequality follows from the existence of global solutions that are limits of persistent local minima. Let \(\rho ^n \rightarrow \infty \) and \(\{(\widetilde{z}^n, \widetilde{\varvec{w}}^n) \}_{n=0}^\infty \) be a sequence of persistent local minima, globally minimising \(\varphi ^n\) with \(\varphi ^n(\widetilde{z}^n, \widetilde{\varvec{w}}^n) \rightarrow \zeta ^{SMIP} \). By global optimality, \(\varphi ^n(\widetilde{z}^n, \widetilde{\varvec{w}}^n) \le \xi ^{SMIP}_{\rho ^n} \), from which it follows that \(\sup _{\rho >0} \xi ^{SMIP}_{\rho } = \zeta ^{SMIP}\). As all global minima are eventually persistent, we are finished.

When we have a pure integer SMIP then by Part 1, we have the existence of a \(\bar{\rho } >0\) such that \(\varphi ^{\lambda ,\rho ,\pi }({z}^{\rho },{\varvec{w}}^{\rho }) = \varphi ^{\lambda ,\rho ,\pi }({\overline{z}},{\overline{\varvec{w}}})\) for all \(\rho \ge \bar{\rho }\), where \(({\overline{z}},{\overline{\varvec{w}}})\) is a global minimum of the SMIP. Hence, a global minimum is achieved for a finite \(\rho \). \(\square \)

We now investigate the role of the fixed neighbourhoods verifying the local minima for \(\varphi ^n\), \(n\ge 0\). Indeed, for limit points that are not persistent, we show that such a neighbourhood does not exist.

Assumption 16

The sequence \(\{(\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{\varvec{w}}^n,\widetilde{z}^n), (\lambda ^n, \pi ^n, \rho ^n)\}_{n=0}^\infty \) satisfies the joint PH assumptions (joint PHA) when:

  1.

    The problem SMIP (1) satisfies the SMIP Assumption 1,

  2.

    The penalty function \(\psi \) meets the Integer Compatibility Regularisation Function Assumptions (ICRF) given in Assumption 2, and

  3.

    the sequences \(\{(\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{\varvec{w}}^n,\widetilde{z}^n)\}_{n=0}^\infty \) and \(\{(\lambda ^n, \pi ^n, \rho ^n)\}_{n=0}^\infty \), with integer index \(n \ge 0\), satisfy the Solution Sequence Assumptions (SSA) given in Assumption 9 and the Penalty Weighting Assumption (PWA) given in Assumption 10.

Proposition 17

Assume \(\{(\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{z}^n,\widetilde{\varvec{w}}^n), (\lambda ^n,\pi ^n, \rho ^n)\}_{n=0}^\infty \) satisfies the joint PHA Assumption 16. If the radii \(\delta _n\), \(n \ge 0\), on which \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) are locally optimal for \(\varphi ^n\) satisfy \(\liminf _{n \rightarrow \infty } \delta _n = \bar{\delta }\) for some \(\bar{\delta } > 0\), then

  1.

    \(\lim _{n\rightarrow \infty } \Vert \widetilde{z}^n-\widetilde{x}_s^n\Vert = 0\) for all \(s \in S\) for which \(\limsup _{n\rightarrow \infty } \pi _s^n > 0\). Thus \(\overline{z}\in X\), \(\sum _{s \in S} \pi _s^n \widetilde{x}_s^n \rightarrow \overline{z}\), and for n sufficiently large we have \(\widetilde{z}_{\mathcal {I}}^n = \widetilde{x}_{s,{\mathcal {I}}}^n\) for all \(s \in S\) and \(\sum _{s \in S} \pi _s \widetilde{x}_s^n \in X\). (When \(\psi = \frac{1}{2} \Vert \cdot \Vert ^2\) we have \(\widetilde{z}^n = \sum _{s \in S} \pi _s \widetilde{x}_s^n \in X\) for all \(n\ge 0\).)

  2.

    For all n we have \(\widetilde{w}_s^{n} = \widetilde{y}^{n}_s \in Y_s (\widetilde{x}^{n}_s)\), and the limit points \(\overline{w}_s\) of \(\{\widetilde{w}_s^n\}_{n=0}^\infty \) satisfy \(\overline{w}_s \in Y_s(\overline{z})\) for all \(s \in S\) with \(\limsup _{n \rightarrow \infty } \pi _s^n >0\).

Proof

See Appendix A. \(\square \)

The previous analysis allows us to pose the following result, which confirms that the “basin of attraction” of non-persistent local minima has no interior in the limit. The next result follows immediately as the contrapositive of Proposition 17.

Corollary 18

Let \(\{(\varvec{x}^n,\varvec{y}^n,\varvec{w}^n,z^n),(\lambda ^n,\pi ^n, \rho ^n)\}_{n=0}^\infty \) satisfy the joint PHA Assumption 16. If any one of the following holds:

  1.

    \(\overline{z}\notin X\),

  2.

    There exist arbitrarily large n such that \(z_{\mathcal {I}}^n \ne x_{s,{\mathcal {I}}}^n\) for at least one \(s\in S\),

  3.

    \(\lim _{n \rightarrow \infty } x_s^n \ne \overline{z}\) or \(\lim _{n \rightarrow \infty } x_s^n\) does not exist for at least one \(s \in S\) for which \(\limsup _{n\rightarrow \infty } \pi _s^n >0\), or

  4.

    when \(\psi = \frac{1}{2} \Vert \cdot \Vert ^2\), \(z^n \not \rightarrow \overline{z}\),

then \(\lim _{n \rightarrow \infty } \delta ^n = 0\) for the radii \(\delta ^n > 0\), \(n\ge 0\), on which the local optimality of each \(z^n\) for \(z\mapsto \inf _w\varphi ^n(z,w)\) is verified.

Example 1

Consider an augmented Lagrangian reformulation of a simple split variable extensive form of a two-stage SMIP

$$\begin{aligned} \min _{x,y,z,w} \left\{ \begin{array}{c} \sum _{s=1}^2 \left[ c^\top x_s + d_s^\top y_s + \rho \pi _s\psi (z-x_s,w_s-y_s) \right] \mid \, (x_1,y_1) \in K_1,\, (x_2,y_2) \in K_2, \\ x_1=z,\, x_2=z,\, y_1=w_1,\, y_2=w_2, \, x_1,x_2 \in \{0,1\},\, y_1,y_2 \in [0,1] \end{array} \right\} \end{aligned}$$
(20)

with penalty coefficient \(\rho > 0\), where \(K_1 = \{(x,y) \in \{0,1\}\times [0,1] \mid x\le y \} \supset \{(0,0),(1,1)\}\), \(K_2=\{(x,y) \in \{0,1\}\times [0,1] \mid 1-x\le y\} \supset \{(0,1),(1,0)\}\), \(c=0\), \(d_1=d_2=1\), and \(\psi (u,v)= \Vert (u,v)\Vert ^2\). We assume \(p_1=p_2=\pi _1=\pi _2=\frac{1}{2}\) and \(\lambda =0\) throughout this example. For any \(\{\rho ^n\}_{n=0}^\infty \) with \(\rho ^n> 0\), \(n\ge 0\), one may verify that the local minimisers \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) and local optimal values, as parameterised by \(\rho ^n\), \(n\ge 0\), are as follows.

The locally optimal solutions \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) are the same for all \(\rho > 0\), so that \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)=(\overline{z},\overline{\varvec{w}})\) for \(n\ge 0\) for each locally optimal \((\overline{z},\overline{\varvec{w}})\). Here we see that the two globally optimal solutions for \(\varphi ^{\lambda ,\rho ,\pi }\) are the persistent solutions with either \(\overline{z}=\overline{x}_1=\overline{x}_2=0\) or \(\overline{z}=\overline{x}_1=\overline{x}_2=1\), which both satisfy non-anticipativity. The non-persistent solution has \(\overline{z}=0.5\) with \(0=\overline{x}_1 \ne \overline{x}_2=1\); it only stays optimal over an ever shrinking neighbourhood \(B_\delta (\overline{z},\overline{\varvec{w}})\) with radius \(\delta = 1 / \rho ^n\) vanishing as \(\rho ^n\rightarrow \infty \).

| Solution \((\widetilde{\varvec{x}}^n,\widetilde{\varvec{y}}^n,\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) | Value \(\varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) | Locally optimal \((\overline{z},\overline{\varvec{w}})\) | Radius \(\delta \) of \(B_\delta (\overline{z},\overline{\varvec{w}})\) | Persistent? |
| --- | --- | --- | --- | --- |
| \(\left( \left[ \begin{array}{c} 0 \\ 0 \end{array}\right] ,\left[ \begin{array}{c} 0 \\ 1 \end{array}\right] ,0,\left[ \begin{array}{c} 0 \\ 1 \end{array}\right] \right) \) | 1 | \(\left( 0,\left[ \begin{array}{c} 0 \\ 1 \end{array}\right] \right) \) | \(\frac{1}{2}-\frac{1}{\rho ^n}\) | Yes |
| \(\left( \left[ \begin{array}{c} 0 \\ 1 \end{array}\right] ,\left[ \begin{array}{c} 0 \\ 0 \end{array}\right] ,\frac{1}{2},\left[ \begin{array}{c} 0 \\ 0 \end{array}\right] \right) \) | \(\frac{\rho ^n}{4}\) | \(\left( \frac{1}{2},\left[ \begin{array}{c} 0 \\ 0 \end{array}\right] \right) \) | \(\frac{1}{\rho ^n}\) | No |
| \(\left( \left[ \begin{array}{c} 1 \\ 1 \end{array}\right] ,\left[ \begin{array}{c} 1 \\ 0 \end{array}\right] ,1,\left[ \begin{array}{c} 1 \\ 0 \end{array}\right] \right) \) | 1 | \(\left( 1,\left[ \begin{array}{c} 1 \\ 0 \end{array}\right] \right) \) | \(\frac{1}{2}-\frac{1}{\rho ^n}\) | Yes |
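The tabulated objective values can be checked by direct evaluation. Below is a minimal sketch (the function name `al_objective` is ours) evaluating the split-variable augmented Lagrangian objective of (20) at the three listed solutions, with \(c=0\), \(d_1=d_2=1\), \(\lambda =0\), \(\pi _1=\pi _2=\frac{1}{2}\), and \(\psi (u,v)=\Vert (u,v)\Vert ^2\):

```python
def al_objective(x, y, z, w, rho, pi=(0.5, 0.5), d=(1.0, 1.0)):
    """Objective of (20) with c = 0, lambda = 0, psi(u, v) = u**2 + v**2."""
    val = 0.0
    for s in range(2):
        val += d[s] * y[s] + rho * pi[s] * ((z - x[s]) ** 2 + (w[s] - y[s]) ** 2)
    return val

rho = 8.0
# Persistent local minima: z = x_1 = x_2 in {0, 1}; objective value 1.
v0 = al_objective(x=(0, 0), y=(0, 1), z=0.0, w=(0, 1), rho=rho)
v1 = al_objective(x=(1, 1), y=(1, 0), z=1.0, w=(1, 0), rho=rho)
# Non-persistent local minimum: z = 1/2 sits between x_1 = 0 and x_2 = 1,
# so both squared penalties contribute (1/2)**2, giving value rho/4.
vhalf = al_objective(x=(0, 1), y=(0, 0), z=0.5, w=(0, 0), rho=rho)
print(v0, v1, vhalf)  # 1.0 1.0 2.0
```

Note the non-persistent value \(\rho /4\) grows without bound as \(\rho ^n \rightarrow \infty \), consistent with the persistency discussion above.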

5 Analysis of the block Gauss–Seidel sequence

Block Gauss–Seidel iterations are most easily analysed for differentiable optimisation problems. However, we need to perform Gauss–Seidel iterations on nonsmooth functions with varying parameterisations and, hence, we develop the necessary theory to facilitate this analysis. We start with elementary definitions and properties of Gauss–Seidel iterations that apply under general assumptions on the functions and their domains.

Definition 5

Let \(G: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\) be a continuous function over a closed subset of \({\mathbb {X}}\times {\mathbb {Y}}\). A solution \(({z}^*,w^*) \in {\mathbb {X}}\times {\mathbb {Y}}\) is a partial minimum of G if

$$\begin{aligned} G\left( {z}^{*},w\right)&\ge G\left( {z}^{*},w^{*}\right) \quad \text {for all } w\in {\mathbb {Y}}\quad \text {and} \end{aligned}$$
(21a)
$$\begin{aligned} G\left( {z},w^{*}\right)&\ge G\left( {z}^{*},w^{*}\right) \quad \text {for all } {z}\in {\mathbb {X}}. \end{aligned}$$
(21b)

For general non-smooth G, partial minimality does not imply (joint) minimality. Under suitable assumptions of convexity and (additive) separability of the non-smoothness in G, we may recover joint minimality, as described in Lemma 21.
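A one-dimensional illustration of why the separability matters: consider the (hypothetical, chosen for this sketch) function \(G(z,w)=|z-w|+0.2(z+w)\), whose nonsmooth term couples the two blocks and so violates the split \(G=Q+h\) required below. The origin is a partial minimum in the sense of Definition 5 but not a joint minimum, as a quick numerical check confirms:

```python
# Hypothetical coupled nonsmooth function: the |z - w| term cannot be
# written as h(w) alone, so the separability assumed in Lemma 21 fails.
def G(z, w):
    return abs(z - w) + 0.2 * (z + w)

grid = [i / 100.0 - 1.0 for i in range(201)]  # grid on [-1, 1]

# (0, 0) is a partial minimum: neither block alone can improve it ...
assert all(G(t, 0.0) >= G(0.0, 0.0) for t in grid)
assert all(G(0.0, t) >= G(0.0, 0.0) for t in grid)

# ... yet it is not a joint minimum: moving both blocks together descends.
assert G(-0.5, -0.5) < G(0.0, 0.0)
```

A block Gauss–Seidel scheme started at the origin therefore stalls at a non-stationary point for this G, which is exactly the pathology the separability assumption excludes.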

Assumption 19

Separability and Convexity Assumptions (SCA) on \(G: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\):

  1.

    G is bounded from below and its level sets are bounded.

  2.

    G has the form \(G({z},w) = Q({z},w) + h(w)\) where

    (a)

      \(Q: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}\) is convex and continuously differentiable over \({\mathbb {X}}\times {\mathbb {Y}}\);

    (b)

      \(h: {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\) is proper, lower semicontinuous, and convex.

The following properties follow immediately from Assumption 19.

Lemma 20

Let \(G: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\) satisfy SCA given in Assumption 19.

  1.

    G, Q, and h are regular functions (due to the assumed convexity). Thus, \(\widehat{\partial }G = \partial G\) exist; and likewise with \(\widehat{\partial }Q=\partial Q = \nabla Q\), \(\widehat{\partial }h = \partial h\).

  2.

    Calculus rules (e.g., [13, Exercise 8.8(c)]) imply that for any \(({z},w)\)

    $$\begin{aligned} \partial G({z},w)&= \{ \nabla _{z}Q({z},w)\} \times \{ \nabla _wQ({z},w) + \partial _wh(w) \} \ne \emptyset , \end{aligned}$$
    (22a)
    $$\begin{aligned} \widehat{\partial }G({z},w)&= \{ \nabla _{z}Q({z},w)\} \times \{ \nabla _wQ({z},w) + \widehat{\partial }_wh(w) \} \ne \emptyset . \end{aligned}$$
    (22b)

Lemma 21

Assume that G satisfies Assumption 19. If \(\left( {z}^{*},w^{*}\right) \in {\mathbb {X}}\times {\mathbb {Y}}\) is a partial minimum of G as in Definition 5 (so that \(\nabla _{{z}} G({z}^{*}, w^{*}) =0\) and \(0 \in \widehat{\partial }_{w} G\left( {z}^{*},w^{*}\right) \)), then we have the Fréchet stationarity \((0,0) \in \widehat{\partial }G\left( {z}^{*},w^{*}\right) \).

Proof

Follows as an application of [22, Theorem 4.1]. \(\square \)

5.1 On the stationarity of Gauss–Seidel limit points

Two ways of framing the Gauss–Seidel step of Sect. 2.1 are now apparent: 1) via continuous block \(z\) and block \(w\) partial minimisation updates of the continuous “regularisation function” \(\varphi ^{\lambda ,\rho ,\pi }\), and 2) via continuous consensus block \(z\) minimisation updates and mixed-integer block \((x,y,w)\) minimisation updates applied directly to augmented Lagrangian reformulations of SMIP (1). The former approach still requires an analysis of the \((x,y)\) update, but in a hidden form. The latter, on the other hand, relies on the fact that the iterates eventually fall into a region where the integer variables become fixed in value, so that the subproblem optimisations are local and associated with continuous (and convex) parts of the problem.

Motivated by the use of Lagrangian- and penalty-based solution approaches, we furthermore assume that \(G=G^{k}\) varies across iterations \(k\ge 0\) subject to the following assumptions.

Assumption 22

Structural Assumptions (SA): Given \(\{G^{k}: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\}_{k=0}^\infty \), let Assumption 19 hold for each \(G^k\), \(k\ge 0\). We assume that \(\{G^k\}_{k=0}^\infty \) epi-converges to \(\overline{G}: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\), \(\{Q^k\}_{k=0}^\infty \) epi-converges to \(\overline{Q}: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}\) and that \(\{h^k\}_{k=0}^\infty \) epi-converges to \(\overline{h}: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\).

In Sect. 5.2, we identify the sequences \(\{G^k, ({z}^k,w^k)\}_{k=0}^\infty \) with a subsequence of GS (mid-)iterations associated with the application of Algorithm 1. For now, we deliberately detach the analysis of \(\{G^k, ({z}^k,w^k)\}_{k=0}^\infty \) from its intended algorithmic identification. The convexity of \(Q^k\) and \(h^k\) in Assumptions 19 and 22 allows for \(\{\partial G^k\}_{k=0}^\infty \) to converge in graph [13, Theorem 12.35]. This assumption will not prove restrictive in the integration of the present analysis with the convergence properties of Algorithm 1 even though the underlying problem has mixed-integer constraints.

Lemma 23

Let \(\{G^{k}: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\}_{k=0}^\infty \), epi-converging to \(\overline{G}: {\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\), satisfy Assumption 22. For \(\{({z}^{k},w^{k})\}_{k=0}^\infty \rightarrow (\overline{z},\overline{w})\) we have \(0 \notin \widehat{\partial }\overline{G}(\overline{z},\overline{w})\) if and only if \(\liminf _{k\rightarrow \infty } \inf _{(\zeta ,\omega ) \in \partial G^{k}({z}^{k},w^{k})} \Vert (\zeta ,\omega )\Vert =\gamma > 0\).

Proof

Given that \(0 \not \in \partial \overline{G}(\bar{{z}},\bar{w})\), it follows from [13, Theorem 12.35(b)] that the sequence \(\{\widehat{\partial }G^{k}({z}^{k},w^{k})\}_{k=0}^\infty \) must be strictly bounded away from zero in that \(\liminf _{k\rightarrow \infty } \inf _{(\zeta ,\omega ) \in \partial G^{k}({z}^{k},w^{k})} \Vert (\zeta ,\omega )\Vert > 0\). \(\square \)

Assumption 24

Stationarity of \(w\) (\(w\)-stat) on the (sub)sequence indexed by \(k\): For each \(k\ge 1\), \(0 \in \widehat{\partial }_{w} G^{k}({z}^{k},w^{k})\).

Lemma 25

Assume \(\{G^{k}\}_{k=0}^\infty \) and \(\{({z}^{k},w^{k})\}_{k=0}^\infty \rightarrow (\overline{z},\overline{w})\) satisfy the SA Assumption 22. If \(0 \notin \widehat{\partial }\overline{G}(\overline{z},\overline{w})\) and \(w\)-stat (Assumption 24) holds, then \(\Vert \nabla _{{z}} \overline{Q}(\overline{z},\overline{w})\Vert \ne 0\) and \(\liminf _{k\rightarrow \infty } \Vert \nabla _{z}Q^{k}({z}^{k},w^{k})\Vert = \gamma > 0\).

Proof

Under Assumption 24 (\(w\)-stat), we must have for the \(w\) subgradient components \( 0 \in \{ \nabla _wQ^{k}({z}^{k},w^{k}) + \widehat{\partial }_wh^{k}(w^{k}) \} \ne \emptyset \) for \(k\ge 0\), and so the hypothesis \(0 \notin \widehat{\partial }\overline{G}(\overline{z},\overline{w})\), the calculus rules of Lemma 20, and Lemma 23 imply the intended result. \(\square \)

For the following results, we introduce an Armijo descent step rule for the \({z}\) step to aid in the convergence analysis.

Algorithm 2

Computing an Armijo rule step length \(\alpha > 0\).

Preconditions: \(\beta ,\sigma \in (0,1)\); G satisfies Assumption 19; d satisfies \(\nabla _{z}G({z},w)\,d< 0\).

Assumption 26

\({z}\)-Descent Assumption (\({z}\)-DA): Given \(\beta ,\sigma \in (0,1)\) and given subsequences \(\{d^{k}\}_{k=0}^\infty \), \(\{G^{k}\}_{k=0}^\infty \), and \(\{({z}^{k},w^{k})\}_{k=0}^\infty \) such that \(\nabla _{z}Q^{k}({z}^{k},w^{k}) d^{k} < 0\) for \(k\ge 1\), \({z}\)-DA is satisfied if

$$\begin{aligned} \lim _{k\rightarrow \infty } G^{k}({z}^{k} + \alpha ^{k}d^{k},w^{k}) - G^{k}({z}^{k},w^{k}) = 0 \end{aligned}$$

where \(\alpha ^k\) is computed with Algorithm 2 given \({z}={z}^{k}\) and \(w=w^{k}\), \(k\ge 1\).

The \({z}\)-DA Assumption 26 itself makes no assumption on how the sequence \(\{({z}^{k},w^{k}), d^{k}\}_{k=0}^\infty \) is constructed. Subsequently stated identifications of the sequence \(\{({z}^{k},w^{k})\}_{k=0}^\infty \) with subsequences generated by Algorithm 1 will guarantee the satisfaction of \({z}\)-DA Assumption 26 under mild assumptions on the implementation of Algorithm 1. The Armijo Step of Algorithm 2 is not actually used in our implementation of Algorithm 1. Rather, it is merely a theoretical tool in what follows.
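For concreteness, a standard backtracking implementation consistent with the acceptance test used in the proof of Lemma 27 might look as follows. This is a sketch only (Algorithm 2 is a theoretical device here, and the function names are ours); it returns a step length \(\alpha \) satisfying the Armijo condition \(G(z+\alpha d,w)-G(z,w)\le \alpha \sigma \nabla _z Q(z,w)d\):

```python
def armijo_step(G, grad_zQ, z, w, d, beta=0.5, sigma=0.25, alpha0=1.0, max_iter=50):
    """Backtracking (Armijo) line search for the z-block.

    Shrinks alpha by beta until the sufficient-decrease test holds,
    assuming d is a descent direction: <grad_zQ(z, w), d> < 0.
    """
    slope = sum(g * di for g, di in zip(grad_zQ(z, w), d))
    assert slope < 0, "d must be a descent direction"
    alpha, base = alpha0, G(z, w)
    for _ in range(max_iter):
        z_trial = [zi + alpha * di for zi, di in zip(z, d)]
        if G(z_trial, w) - base <= alpha * sigma * slope:
            return alpha
        alpha *= beta  # the failed value alpha is the "penultimate" alpha/beta
    return alpha

# Usage: Q(z) = 0.5 * ||z||^2 with h = 0, steepest-descent direction d = -grad.
alpha = armijo_step(lambda z, w: 0.5 * sum(zi * zi for zi in z),
                    lambda z, w: z, z=[1.0], w=None, d=[-1.0])
```

The quantity \(\bar{\alpha }^k=\alpha ^k/\beta \) appearing in the proof of Lemma 27 corresponds to the last value of `alpha` that failed the `if` test before acceptance.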

The proof of the following lemma is based on ideas from [24, Proposition 1.2.1], [25, Proposition 3.2], and [26, Technical Lemmas, Appendix A].

Lemma 27

Assume that 1) \(\{G^{k}:{\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\}_{k=0}^\infty \) satisfies SA (Assumption 22) and epi-converges to \(\overline{G}:{\mathbb {X}}\times {\mathbb {Y}}\rightarrow \mathbb {R}_{+\infty }\); 2) \(\{(w^k,{z}^k)\}_{k=0}^\infty \) converges to \((\overline{w},\overline{z})\) and satisfies \(w\)-stat (Assumption 24); and 3) \(\nabla _{z}Q^k({z}^k,w^k) \ne 0\) for each \(k\ge 1\).

If, for some \(\beta ,\sigma \in (0,1)\), the \({z}\)-DA (Assumption 26) holds for \(d^k= -\nabla _{z}Q^k({z}^k,w^k)\), then \(0 \in \partial \overline{G}(\bar{{z}},\bar{w})\). Moreover, \(\overline{G}\) is regular at \((\bar{{z}},\bar{w})\) and so we have also \(0 \in \widehat{\partial }\overline{G}(\bar{{z}},\bar{w})\).

Proof

From (22), the satisfaction of \(w\)-stat Assumption 24, and Lemma 21, we only need to show that \(\lim _{k \rightarrow \infty } \Vert \nabla _{z}Q^k({z}^k,w^k) \Vert = 0\). Due to SA Assumption 22 and \({z}\)-DA Assumption 26, we have

$$\begin{aligned} 0 = \lim _{k\rightarrow \infty } G^{k}({z}^{k} + \alpha ^{k}d^{k},w^{k}) - G^{k}({z}^{k},w^{k})&= \lim _{k\rightarrow \infty } Q^{k}({z}^{k} + \alpha ^{k}d^{k},w^{k}) - Q^{k}({z}^{k},w^{k})\\&\le \lim _{k\rightarrow \infty } \alpha ^{k} \sigma \nabla _{{z}} Q^{k}({z}^{k},w^{k}) d^k\le 0. \end{aligned}$$

Thus, \( \lim _{k\rightarrow \infty } \alpha ^{k} \sigma \nabla _{{z}} Q^{k}({z}^{k},w^{k}) d^k= 0. \)

We consider two cases: 1) \(\limsup _{k\rightarrow \infty } \alpha ^{k} > 0\), and 2) \(\limsup _{k\rightarrow \infty } \alpha ^{k} = 0\). Due to the assumed continuity of \(\nabla _{{z}} Q^{k}\), and given that \(d^k= -\nabla _{{z}} Q^{k}({z}^k,w^k)\), the first case implies that

$$\begin{aligned} \lim _{k\rightarrow \infty } \nabla _{{z}} Q^{k}({z}^{k},w^{k}) d^k= \lim _{k\rightarrow \infty } -\Vert \nabla _{{z}} Q^{k}({z}^{k},w^{k}) \Vert ^2 = 0. \end{aligned}$$

and so \( \lim _{k\rightarrow \infty } \nabla _{{z}} Q^{k}({z}^{k},w^{k}) = 0. \) Otherwise, if \(\limsup _{k \rightarrow \infty } \alpha ^{k} = \lim _{k\rightarrow \infty } \alpha ^{k} = 0\), then for all \(k\) larger than some \(\bar{k} \ge 0\) we have \(\alpha ^k\le \beta < 1\), and so \(\bar{\alpha }^k=\alpha ^k/\beta \le 1\) (that is, the value of \(\alpha ^k\) at the penultimate iteration of Algorithm 2), for which it holds that

$$\begin{aligned} G^{k}({z}^{k} + \bar{\alpha }^kd^k,w^{k}) - G^{k}({z}^{k},w^{k}) > \bar{\alpha }^k\sigma \nabla _{{z}} Q^{k}({z}^k,w^k) d^k, \end{aligned}$$

which implies

$$\begin{aligned} \frac{Q^{k}({z}^{k} + \bar{\alpha }^{k}d^k,w^{k}) - Q^{k}({z}^{k},w^{k})}{\bar{\alpha }^{k}} > \sigma \nabla _{{z}} Q^{k}({z}^k,w^k) d^k. \end{aligned}$$

Applying the Mean Value Theorem at each \(k\), we have

$$\begin{aligned} \nabla _{z}Q^{k}({z}^{k} + \tilde{\alpha }^kd^k,w^{k})d^k> \sigma \nabla _{{z}} Q^{k}({z}^k,w^k) d^k. \end{aligned}$$

for some \(\tilde{\alpha }^k\in [0,\bar{\alpha }^k]\). Using the continuity of \(\nabla _{{z}} Q^{k}\) (and the Cauchy–Schwarz inequality), we have for arbitrarily small \(\epsilon > 0\) that there exists \(\delta > 0\) such that, for large enough k, \(\tilde{\alpha }^k\le \bar{\alpha }^k< \delta \), so that

$$\begin{aligned} \epsilon \Vert d^k\Vert + \nabla _{z}Q^{k}({z}^{k},w^{k}) d^k>\nabla _{z}Q^{k}({z}^{k} + \tilde{\alpha }^kd^k,w^{k})d^k> \sigma \nabla _{{z}} Q^{k}({z}^{k},w^{k}) d^{k} \end{aligned}$$

holds for sufficiently large \(k\).

Recalling that \(d^{k} = -\nabla _{z} Q^{k}({z}^{k},w^{k}) \ne 0\), we then have

$$\begin{aligned} \epsilon \Vert d^{k}\Vert > (\sigma -1) \nabla _{z} Q^{k}({z}^{k},w^{k}) d^{k} = (1-\sigma ) \Vert \nabla _{z}Q^{k}({z}^{k},w^{k}) \Vert _2^2 \end{aligned}$$

and so \( \epsilon > (1-\sigma ) \Vert \nabla _{z}Q^{k}({z}^{k},w^{k}) \Vert _2 \) holds for sufficiently large \(k\). In the limit, we have \( 0 \ge (1-\sigma ) \Vert \nabla _{{z}} \overline{Q}(\bar{{z}},\bar{w}) \Vert _2, \) which is a contradiction since \((1-\sigma ) > 0\) and \(\Vert \nabla _{{z}} \overline{Q}(\bar{{z}},\bar{w}) \Vert _2 > 0\) as established from Lemma 25 and the SA Assumption 22. Thus, \(0 \in \partial \overline{G}(\bar{{z}},\bar{w})\), and since \(\overline{G}\) is regular, then \(0 \in \widehat{\partial }\overline{G}(\bar{{z}},\bar{w})\) holds also. \(\square \)

In order to apply Lemma 27 to a convergence analysis of Algorithm 1, we need to establish the satisfaction of the SA Assumption 22, \(w\)-stat Assumption 24, and especially the \({z}\)-DA Assumption 26 requiring \( \lim _{k\rightarrow \infty } G^{k}({z}^{k} + \alpha ^{k}d^{k},w^{k}) - G^{k}({z}^{k},w^{k}) = 0 \) given an appropriate identification with Algorithm 1 subsequence iterations \(\{n_k\}_{k=0}^\infty \).

5.2 Interleaving analysis and algorithm

We analyse subsequences \(\{(x^{n_k+1}, y^{n_k+1}, w^{n_k+1}, z^{n_k})\}_{k=0}^\infty \) of the iterations generated by Algorithm 1 applied to problem (1) that converge to \((\overline{x},\overline{y},\overline{w},\overline{z})\). (Such limit points of the entire sequence in \(n\) exist due to the inf-compactness of (1), as will be demonstrated in this subsection.) This analysis depends on establishing that the Sect. 5.1 assumptions hold under the appropriate identifications with Algorithm 1 (i.e., the SA Assumption 22, the \(w\)-stat Assumption 24, and the \({z}\)-DA Assumption 26).

Given the assumed subsequence convergence, we may take the subsequence \(\{(x^{n_k+1}, y^{n_k+1}, w^{n_k+1}, z^{n_k})\}_{k=0}^\infty \) so that the integer component values \(x_{\mathcal {I}}=\overline{x}_{\mathcal {I}}\) and \(y_{\mathcal {I}}=\overline{y}_{\mathcal {I}}\) are fixed. With respect to Algorithm 1, we apply the following identifications, with GS iterations indexed by \(n\ge 1\) and subsequence iterations indexed by \(n_k\), \(k\ge 1\):

Assumption 28

Algorithm 1 Identifications:

  1.

    Variables: \({z}^k\leftarrow z^{n_k}\) and \(w^k\leftarrow (x_s^{n_k+1},y_s^{n_k+1},w_s^{n_k+1})_{s \in S}\)

  2.

    Expressions: \(Q^k({z}^k,w^k) \leftarrow \sum _{s \in S} \pi _s^{n_k} \psi (z^{n_k}-x_s^{n_k+1},w_s^{n_k+1}-y_s^{n_k+1})\) and

    $$\begin{aligned} h^k(w^k) \leftarrow \sum _{s \in S} \left[ \frac{1}{\rho ^{n_k}} \left( f_s\left( x_s^{n_k+1}, y_s^{n_k+1} \right) - (\lambda _s^{n_k})^\top x_s^{n_k+1} \right) + \delta _{K_s^{ (\overline{{\textbf{x}}}_{{\mathcal {I}}},\overline{{\textbf{y}}}_{{\mathcal {I}}})} } (x_s^{n_k+1},y_s^{n_k+1}) \right] \end{aligned}$$

Recalling \(f^n(\varvec{x},\varvec{y},\varvec{w},z) = \sum _{s\in S} f_s(x_s,y_s) + (\lambda _s^n)^\top x_s + \rho ^n\pi _s^n\psi (z- x_s, w_s - y_s )\) defined in (17), define for each \(n\ge 0\)

$$\begin{aligned} g^{n}(\varvec{x},\varvec{y},\varvec{w},z) := \frac{1}{\rho ^{n}} f^n(\varvec{x},\varvec{y},\varvec{w},z). \end{aligned}$$
(23)

We also have

$$\begin{aligned} G^{k}({z}^k,w^k)&\leftarrow g^{n_k}(\varvec{x}^{n_k+1},\varvec{y}^{n_k+1},\varvec{w}^{n_k+1},z^{n_k}) +\sum _{s \in S} \delta _{K_s^{ (\overline{{\textbf{x}}}_{{\mathcal {I}}},\overline{{\textbf{y}}}_{{\mathcal {I}}})} } (x_s^{n_k+1},y_s^{n_k+1})\\&= g^{n_k}(\varvec{x}^{n_k+1},\varvec{y}^{n_k+1},\varvec{w}^{n_k+1},z^{n_k}) \end{aligned}$$

where the last equality is by construction (fixed integral values) of the subsequence.

Furthermore, to guarantee that the SA, \(w\)-Stat, and \({z}\)-DA assumptions hold, we assume the following of the GS (sub)sequences:

Assumption 29

Algorithm Assumptions: In the application of Algorithm 1 to problem (1), the following hold:

  1.

    SMIP assumptions: Assumption 1 holds for problem (1).

  2.

    Penalty function assumptions: \(\psi \) satisfies Assumption 2. Furthermore, we subsequently note special implications that hold when \(\psi \) takes the weighted squared 2-norm form with weights \(\bar{\mu }_i > 0\) such that

    $$\begin{aligned} \psi (z-x_s,0) = \frac{1}{2} \sum _{i=1}^n \bar{\mu }_i(z_i-x_{s,i})^2. \end{aligned}$$
    (24)
  3.

    Global optimality: Each \((\varvec{x}^{n+1},\varvec{y}^{n+1},\varvec{w}^{n+1})\) is globally optimal given fixed \(z^{n}\) in that

    $$\begin{aligned} (\varvec{x}^{n+1},\varvec{y}^{n+1},\varvec{w}^{n+1}) \in \arg \min _{x,y,w} f^n(\varvec{x},\varvec{y},\varvec{w},z^n) \end{aligned}$$

    (hence limit points \((\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}})\) are globally optimal given the fixed limit point \(\overline{z}\) by known results [27, Propositions 1.3.5 and 1.3.6]). Also, \(z^{n+1} \in {{\,\textrm{argmin}\,}}_{z} \sum _{s \in S} \pi _s^n\psi (z-x_s^{n+1},0)\) is globally optimal for each \(n\ge 1\). Furthermore, under the additional assumption that \(\psi \) is of the weighted squared 2-norm form (24), we have (independent of the weights \(\bar{\mu }_i > 0\), \(i=1,\dots ,n\))

    $$\begin{aligned} z^{n+1}&\leftarrow \sum _{s \in S} \pi _s^nx_s^{n+1} \in \arg \min _{z} \sum _{s \in S} \pi _s^n\psi (z-x_s^{n+1},0). \end{aligned}$$
  4.

    Generation of Lagrange multipliers: \(\lambda _s^0=0\) for \(s \in S\). If \(\psi \) is not of the weighted squared 2-norm form (24), then we assume \(\lambda _s^n\equiv 0\), \(n\ge 1\), identically. Otherwise, if \(\psi \) is of the weighted squared 2-norm form (24), then for each subsequent iteration \(n\), either \(\lambda _s^n\) is left unchanged via \(\lambda _s^{n+1} \leftarrow \lambda _s^{n}\) for all \(s \in S\), or

    $$\begin{aligned} \lambda _{s,i}^{n+1}&\leftarrow \lambda _{s,i}^{n} - \rho ^n\pi _s^n\bar{\mu }_i\left( z_i^{n+1} -x_{s,i}^{n+1} \right) \quad \text {for all}\; s \in S. \end{aligned}$$

    Under these assumptions on \(\lambda \), it follows that \(\sum _{s \in S} \lambda _s^n= 0\) for all \(n\ge 0\) (i.e., dual feasibility is maintained, which justifies the absence of the \(\sum _{s \in S}\lambda _s^nz\) terms in the Lagrangian). Non-trivial \(\lambda \) updates between iterations are suppressed as necessary to ensure that \(\sum _{n=1}^\infty \Vert \lambda _s^n-\lambda _s^{n+1}\Vert < \infty \) holds in the limit. (In practice this usually entails only a finite number of non-trivial updates.)

  5.

    Update of penalty parameters: We assume the following.

    (a)

      Penalty coefficients are nondecreasing \(\rho ^{n+1} \ge \rho ^{n} > 0\), \(n\ge 0\).

    (b)

      \( 0 < \pi _s^{n}\rho ^{n} \le \pi _s^{n+1}\rho ^{n+1}\), \(s \in S\), \(n\ge 0\). (\(\pi _s^{n} \le \pi _s^{n+1}\) does not hold in general.)

    (c)

      For each \(n \ge 0\) and \(s \in S\), we have \(\pi _s^n> 0\) and \(\sum _{s \in S} \pi _s^n= 1\), and \(\{\pi _s^{n}\}_{n=0}^\infty \) converges to \(\pi _s > 0\) for all \(s \in S\) such that \(\sum _{n=1}^{\infty } |\pi _s^{n+1}-\pi _s^{n} |< \infty \). Initially, \(\pi _s^0 \leftarrow p_s\).

The algorithm need not adjust the \(\pi _s^n\) and \(\rho ^n\) parameters separately. Instead, it may apply penalty updates in a scenario-specific manner to \(\rho _s^n:=\pi _s^n\rho ^n\) with \(\rho ^n = \sum _{s \in S} \rho _s^n\). For each \(s \in S_n:= \{s \in S \mid z_{{\mathcal {I}}} \ne x_{s,{\mathcal {I}}} \}\), we set \(\rho ^{n+1}_s = \gamma _n \rho ^n_s\) with \(\gamma _n >1\).
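As a numerical illustration of items 3 and 4 above, the following sketch (with hypothetical random data; not from the paper) checks that, for the weighted squared 2-norm penalty (24), the probability-weighted average minimises \(\sum _{s \in S} \pi _s^n\psi (z-x_s^{n+1},0)\) independently of the weights \(\bar{\mu }_i\), and that the multiplier update preserves \(\sum _{s \in S} \lambda _s = 0\):

```python
import numpy as np

rng = np.random.default_rng(0)
S, dim = 4, 3                       # hypothetical: 4 scenarios, 3 first-stage variables
pi = rng.random(S); pi /= pi.sum()  # scenario weights pi_s^n summing to 1
mu = rng.uniform(0.5, 2.0, dim)     # penalty weights mu_bar_i > 0
x = rng.random((S, dim))            # scenario first-stage iterates x_s^{n+1}

def obj(z):
    # sum_s pi_s * psi(z - x_s, 0) with psi the weighted squared 2-norm (24)
    return sum(pi[s] * 0.5 * np.dot(mu, (z - x[s])**2) for s in range(S))

z_star = pi @ x                     # candidate minimiser: probability-weighted average

# the weighted average beats random perturbations, for any weights mu
for _ in range(100):
    assert obj(z_star) <= obj(z_star + rng.normal(scale=0.1, size=dim)) + 1e-12

# the lambda update preserves sum_s lambda_s = 0 when z is the pi-weighted average
rho = 2.5
lam = np.zeros((S, dim))            # lambda_s^0 = 0
lam_next = lam - rho * pi[:, None] * mu[None, :] * (z_star - x)
assert np.allclose(lam_next.sum(axis=0), 0.0)
```

The dual-feasibility check works because \(\sum _s \pi _s (z^*-x_s) = z^* - \sum _s \pi _s x_s = 0\) by construction of \(z^*\).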

Lemma 30

Under the Algorithmic Identifications 28 with \(G^k= h^k+ Q^k\) and Assumption 29, we have

  1.

    \(\{ G^k\}_{k=0}^\infty \) epi-converges to \(\overline{G}= \overline{h}+ \overline{Q}\) and satisfies the SA Assumption 22.

  2.

    \(0 \in \partial _wG^k({z}^k,w^k)\) and \(0 \in \partial _w\overline{G}(\overline{z},\overline{w})\).

Proof

By regularity of \(\overline{G}\) due to its convexity, the (limiting) Mordukhovich and Fréchet subdifferentials coincide.

We first argue that \(\{ G^k\}_{k=0}^\infty \) epi-converges to \(\overline{G}\) whenever \(\{ (\pi _s^{n_k})_{s \in S} \}_{k=0}^\infty \) converges to \(\{\pi _s\}_{s \in S}\). As

$$\begin{aligned} h^k( w):= \frac{ 1}{\rho ^{n_k}} \left( \sum _{s \in S} p_s \{c^{\top }x_s + d^{\top }_s y_s \} - (\lambda _s^{n_k} )^{\top } x_s \right) + \delta _{K^{ (\overline{{\textbf{x}}}_{{\mathcal {I}}},\overline{{\textbf{y}}}_{{\mathcal {I}}})} } (x,y), \end{aligned}$$

we have \(h^k\) convex and converging both monotonically point-wise and uniformly to \(\delta _{K^{(\overline{x}_{\mathcal I},\overline{y}_{\mathcal I}) }}\). This is because \( \frac{ 1}{\rho ^{n_k}} \left( \sum _{s \in S} p_s \{c^{\top }x_s + d^{\top }_s y_s \} - (\lambda _s^{n_k} )^{\top } x_s \right) \) converges uniformly to zero on the compact and convex polyhedral set \(K^{(\overline{x}_{\mathcal I},\overline{y}_{\mathcal I})}\). Thus \(\{h^k\}_{k=0}^\infty \) epi-converges to \(\delta _{ K^{(\overline{x}_{\mathcal I},\overline{y}_{\mathcal I}) }}\). Whenever \(\{ (\pi _s^{n_k})_{s \in S} \}_{k=0}^\infty \) converges to \(\pi _s\) for each \(s \in S\), we have a family of convex functions \(\{Q^{k}:= \sum _{s \in S} \pi _s^{n_k} \psi \}_{k=0}^\infty \) converging uniformly on compact sets and hence also epi-converging to \(\overline{Q}= \sum _{s \in S} \pi _s \psi (\cdot , \cdot )\). Applying [27, Theorem 7.1.5] or [13, Theorem 7.46], the sum of a uniformly convergent sequence and an epi-convergent sequence epi-converges, so \(\{ G^k:= h^k+ Q^k\}_{k=0}^\infty \) epi-converges to \(\overline{G}= \overline{h}+ \overline{Q}\). Now we may apply [13, Theorem 12.35] to deduce that the convex subdifferentials \(\{\partial _wG^k\}_{k=0}^\infty \) converge in graph to \( \partial _w\overline{G}\). Since \(w^k\) is a global minimiser of \(w\mapsto G^k({z}^k,w)\), we have by definition that \(0 \in \partial _wG^k({z}^k,w^k)\) for each \(k\). Thus by graphical convergence \(0 \in \partial _w\overline{G}(\overline{z},\overline{w})\). \(\square \)
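The uniform-convergence step in the proof can be illustrated numerically: on a compact set, scaling any fixed linear function by \(1/\rho ^{n_k}\) drives its sup-norm to zero as the penalty grows. A small sketch with hypothetical cost coefficients and a grid stand-in for the compact polyhedron:

```python
import numpy as np

rng = np.random.default_rng(1)
c = rng.normal(size=2)                         # hypothetical linear cost coefficients
# K approximated by a grid over [0,1]^2 (a compact polyhedron)
grid = np.stack(np.meshgrid(np.linspace(0, 1, 50),
                            np.linspace(0, 1, 50)), axis=-1).reshape(-1, 2)

# sup-norm of the scaled linear term for growing penalty rho
sups = [np.abs(grid @ c / rho).max() for rho in (1.0, 10.0, 100.0, 1000.0)]

assert all(a > b for a, b in zip(sups, sups[1:]))  # strictly shrinking like 1/rho
assert sups[-1] < 1e-2                             # essentially vanished
```

Uniform convergence of this term to zero on \(K\) is exactly what lets the smooth part of \(h^k\) disappear into the indicator in the epi-limit.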

Definition 6

Let intervening GS iterations between \(n_k\) and \(n_{k+1}\) be denoted \(n_k+1,n_k+2,\dots ,n_{k+1}-2,n_{k+1}-1\).

Lemma 31

If \((\varvec{x}^n,\varvec{y}^n,\varvec{w}^n,z^n)\), \(n\ge 1\), are computed with the GS iterations of Algorithm 1, then for each fixed \(n\ge 1\) and positive integer j, we have

$$\begin{aligned}&g^{n+j}(\varvec{x}^{n+j+1}, \varvec{y}^{n+j+1}, \varvec{w}^{n+j+1}, z^{n+j}) - g^{n}(\varvec{x}^{n+1}, \varvec{y}^{n+1}, \varvec{w}^{n+1}, z^{n}) \\&\qquad +\sum _{i=0}^{j-1} \left[ g^{n+i}(\varvec{x}^{n+i+1},\varvec{y}^{n+i+1},\varvec{w}^{n+i+1}, z^{n+i+1})\right. \\&\qquad \left. -g^{n+i+1}(\varvec{x}^{n+i+1},\varvec{y}^{n+i+1},\varvec{w}^{n+i+1}, z^{n+i+1}) \right] \\&\quad \le g^{n}(\varvec{x}^{n+1}, \varvec{y}^{n+1}, \varvec{w}^{n+1}, z^{n}+\alpha ^nd^n) - g^{n}(\varvec{x}^{n+1}, \varvec{y}^{n+1}, \varvec{w}^{n+1}, z^{n}) \\ \end{aligned}$$

where \(z^{n+i+1} \in \arg \min _z g^{n+i}(\varvec{x}^{n+i+1},\varvec{y}^{n+i+1}, \varvec{w}^{n+i+1}, z)\) for \(i=0,\dots ,j-1\), as is consistent with an iteration of Algorithm 1.

Proof

See Appendix A. \(\square \)

Corollary 32

Given \(n_k\), \(k\ge 0\), a subsequence index, and \(j_k\) the positive integer such that \(n_k+j_k= n_{k+1}\), if \((\varvec{x}^{n+1},\varvec{y}^{n+1},\varvec{w}^{n+1},z^n)\), \(n\ge 1\), are computed with GS iterations, then

$$\begin{aligned} \sum _{i=0}^{j_{k}-1}&\left[ g^{n_k+i}(\varvec{x}^{n_k+i+1},\varvec{y}^{n_k+i+1},\varvec{w}^{n_k+i+1},z^{n_k+i+1}) \right. \\&\left. -g^{n_k+i+1}(\varvec{x}^{n_k+i+1},\varvec{y}^{n_k+i+1},\varvec{w}^{n_k+i+1},z^{n_k+i+1}) \right] \\&+G^{k+1}({z}^{k+1}, w^{k+1}) -G^{k}({z}^{k}, w^{k}) \le \;G^{k}({z}^k+\alpha ^kd^k,w^k)-G^{k}({z}^k, w^k)&\end{aligned}$$

Proof

By the construction of the subsequence, \(\delta _{K_s^{ (\overline{{\textbf{x}}}_{s,{\mathcal {I}}},\overline{{\textbf{y}}}_{s,{\mathcal {I}}})} } (x_s^{n_k+1},y_s^{n_k+1}) = 0\) and so any potential discrepancy between \(g^{n_k}(\varvec{x}^{n_k+1},\varvec{y}^{n_k+1},\varvec{w}^{n_k+1},z^{n_k})\) and \(G^{k}({z}^k,w^k)\) is avoided. \(\square \)

Lemma 33

Under Assumption 28 and Assumption 29, we have

$$\begin{aligned} \lim _{k\rightarrow \infty }&\sum _{i=0}^{j_{k}-1} \left[ g^{n_k+i}(\varvec{x}^{n_k+i+1},\varvec{y}^{n_k+i+1},\varvec{w}^{n_k+i+1},z^{n_k+i+1}) \right. \\&\left. \quad -g^{n_k+i+1}(\varvec{x}^{n_k+i+1},\varvec{y}^{n_k+i+1},\varvec{w}^{n_k+i+1},z^{n_k+i+1}) \right] = 0. \end{aligned}$$

Proof

See Appendix A. \(\square \)

Corollary 34

Assume the Algorithm Identifications 28. If the Algorithm Assumptions 29 hold, then \({z}\)-DA Assumption 26 holds under any allowable realisation of its assumptions on \(\beta ,\sigma \), \(d^k\), etc. (Thus, the intended \({z}\)-DA condition will hold for any convergent subsequence \(\{({z}^k,w^k)\}_{k=0}^\infty =\{(\varvec{x}^{n_k+1},\varvec{y}^{n_k+1},\varvec{w}^{n_k+1},z^{n_k})\}_{k=0}^{\infty }\) with \(d^k = -\nabla _{{z}} Q^k({z}^{k},w^{k})\), and \(\alpha ^k\) computed with the Armijo rule for any \(\beta ,\sigma \in (0,1)\).)

Proof

Given that \(G^{k}({z}^k+\alpha ^kd^k,w^k)-G^{k}({z}^k, w^k) \le 0\) already holds per the Armijo step, the satisfaction of \({z}\)-DA Assumption 26 follows from Lemma 33 and Corollary 32 once it is noted that \(\lim _{k\rightarrow \infty } G^{k+1}({z}^{k+1}, w^{k+1}) -G^{k}({z}^{k}, w^{k}) = 0 \) follows from the SA epi-convergence of Assumption 22 [13, Theorem 12.35]. \(\square \)
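For concreteness, here is a generic backtracking sketch of the Armijo rule referenced above, with a hypothetical smooth quadratic \(Q\) standing in for \(Q^k\) (the parameters \(\beta ,\sigma \in (0,1)\) play the roles named in the \({z}\)-DA assumption):

```python
import numpy as np

def armijo_step(Q, grad_Q, z, beta=0.5, sigma=1e-1, alpha0=1.0, max_halvings=50):
    """Backtracking (Armijo) line search along d = -grad Q(z): find alpha with
    Q(z + alpha d) - Q(z) <= sigma * alpha * grad_Q(z).d"""
    d = -grad_Q(z)
    alpha = alpha0
    for _ in range(max_halvings):
        if Q(z + alpha * d) - Q(z) <= sigma * alpha * np.dot(grad_Q(z), d):
            return alpha, d
        alpha *= beta
    return alpha, d

# toy smooth Q (hypothetical stand-in for sum_s pi_s psi(z - x_s, 0))
x = np.array([[0.0, 0.0], [1.0, 1.0]])
pi = np.array([0.3, 0.7])
Q = lambda z: sum(p * 0.5 * np.sum((z - xs)**2) for p, xs in zip(pi, x))
gQ = lambda z: sum(p * (z - xs) for p, xs in zip(pi, x))

z = np.array([2.0, -1.0])
alpha, d = armijo_step(Q, gQ, z)
assert Q(z + alpha * d) < Q(z)        # descent achieved
```

Any step accepted by this test automatically satisfies \(G^{k}({z}^k+\alpha ^kd^k,w^k)-G^{k}({z}^k, w^k) \le 0\), which is the property used in the proof above.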

Definition 7

Under the epi-convergence of SA Assumption 22, we define the limiting regularisation

$$\begin{aligned} \phi ^{\infty }(z,\varvec{w}) := \lim _{\rho \rightarrow \infty } \frac{1}{\rho }\varphi ^{\lambda ,\rho ,\pi }(z,\varvec{w}) = \min _{x,y} \left\{ \sum _{s \in S} \pi _s \psi (z-x_s,w_s-y_s) \mid (x_s,y_s) \in K_s\right\} \end{aligned}$$

We now state one of our main results. Before doing so, we introduce the following notation.

Definition 8

To accommodate both possibilities \(\lim _{n\rightarrow \infty } \rho ^n= \bar{\rho } < \infty \) or \(\lim _{n\rightarrow \infty } \rho ^n= \infty \) disjunctively, we define

$$\begin{aligned} g^*(\varvec{x},\varvec{y},\varvec{w},z)&:= \lim _{n\rightarrow \infty } g^{n}(\varvec{x},\varvec{y},\varvec{w},z)\nonumber \\&= \lim _{n\rightarrow \infty } \sum _{s \in S} \left[ \frac{1}{\rho ^{n}} \left( f_s(x_s, y_s ) - (\lambda _s^{n})^\top x_s \right) + \pi _s^{n} \psi (z-x_s,w_s-y_s) \right] \end{aligned}$$
(25)

(recalling the definition (23)). From (25), we define the limiting regularisation

$$\begin{aligned} \phi ^{*}(z,\varvec{w})&:= \lim _{n\rightarrow \infty } \frac{1}{\rho ^n} \varphi ^n(z,\varvec{w}) = \min _{\varvec{x},\varvec{y}} g^*(\varvec{x},\varvec{y},\varvec{w},z)\\ \phi ^{*}\left( z,\varvec{w} \mid \overline{\varvec{x}}_{{\mathcal {I}}},\overline{\varvec{y}}_{{\mathcal {I}}}\right)&:= \min _{\varvec{x},\varvec{y}} g^*(\varvec{x},\varvec{y},\varvec{w},z)+ \delta _{K^{ (\overline{{\textbf{x}}}_{{\mathcal {I}}},\overline{{\textbf{y}}}_{{\mathcal {I}}})} } (\varvec{x},\varvec{y}) \end{aligned}$$

The corresponding set of solutions \((\varvec{x},\varvec{y})\) realising these values given \((z,\varvec{w})\) is denoted

$$\begin{aligned} \Phi ^{*}(z,\varvec{w})&:= \arg \min _{\varvec{x},\varvec{y}} g^*(\varvec{x},\varvec{y},\varvec{w},z)\\ \Phi ^{*}\left( z,\varvec{w} \mid \overline{\varvec{x}}_{{\mathcal {I}}},\overline{\varvec{y}}_{{\mathcal {I}}}\right)&:= \arg \min _{\varvec{x},\varvec{y}} g^*(\varvec{x},\varvec{y},\varvec{w},z)+ \delta _{K^{ (\overline{{\textbf{x}}}_{{\mathcal {I}}},\overline{{\textbf{y}}}_{{\mathcal {I}}})} } (\varvec{x},\varvec{y}). \end{aligned}$$

Proposition 35

Let \((\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) satisfy \((\overline{\varvec{x}},\overline{\varvec{y}}) \in K\). The following implications hold.

  1.

    If the Fréchet stationarity \(0 \in \widehat{\partial }g^*(\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) holds, then \((0,0) \in \widehat{\partial }\phi ^{*}\left( \overline{z},\overline{\varvec{w}} \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) \) so that \((\overline{z},\overline{\varvec{w}})\) is a minimum of \(\phi ^{*}\left( \overline{z},\overline{\varvec{w}} \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) \).

  2.

    \((\overline{z},\overline{\varvec{w}})\) is a local minimum of \(\phi ^{*}\) with \((0,0) \in \widehat{\partial }\phi ^{*}(\overline{z},\overline{\varvec{w}}) =\nabla \phi ^{*}(\overline{z},\overline{\varvec{w}})\) if and only if \((\overline{z},\overline{\varvec{w}})\) is a local minimum of \(\phi ^{*}\left( \cdot ,\cdot \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \) with \((0,0) \in \widehat{\partial }\phi ^{*}\left( \overline{z},\overline{\varvec{w}} \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) = \partial \phi ^{*}\left( \overline{z},\overline{\varvec{w}} \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \) for all \((\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}) \in {{\,\textrm{proj}\,}}_{{\mathcal {I}}}(\Phi ^{*}(\overline{z},\overline{\varvec{w}}))\).

  3.

    If \(\overline{z}= \overline{x}_s\) for all \(s \in S\), then \((0,0) \in \widehat{\partial }\phi ^{*}(\overline{z},\overline{\varvec{w}})\) and so \((0,0) = \nabla \phi ^{*}(\overline{z},\overline{\varvec{w}})\). Furthermore, \((\overline{z},\overline{\varvec{w}})\) is also a (persistent) local minimum for \(\phi ^{*}\).

  4.

    In the specific case where \(\lim _{n\rightarrow \infty } \rho ^n= \infty \) (so that \(\phi ^{*}=\phi ^{\infty }\)), the reverse of the previous implication also holds, where \((\overline{z},\overline{\varvec{w}})\) being a local minimum for \(\phi ^{*}\) with \((0,0) \in \widehat{\partial }\phi ^{*}(\overline{z},\overline{\varvec{w}})\) implies that \(\overline{z}= \overline{x}_s\), \(s \in S\).

Proof

Part 1: Given \(0 \in \widehat{\partial }g^*(\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) and the structure of \(g^*\) as the sum of a linear function and an indicator function of a polyhedral set with integer cross-sections, we have that \(g^*(\varvec{x},\varvec{y},\varvec{w},z) \ge g^*(\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) for all \((\varvec{x},\varvec{y}) \in K^{(\overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}})}\) and \((z,\varvec{w}) \in {\mathbb {X}}\times {\mathbb {Y}}\), and more particularly, \(g^*(\varvec{x},\varvec{y},\overline{\varvec{w}},\overline{z}) \ge g^*(\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) for all \((\varvec{x},\varvec{y}) \in K^{(\overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}})}\). Thus,

$$\begin{aligned} g^*(\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z}) = \phi ^{*}\left( \overline{z},\overline{\varvec{w}} \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) . \end{aligned}$$

Furthermore, since \(g^*(\varvec{x},\varvec{y},\varvec{w},z) \ge g^*(\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) for all \((\varvec{x},\varvec{y}) \in K^{(\overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}})}\) and \((z,\varvec{w}) \in {\mathbb {X}}\times {\mathbb {Y}}\), we have

$$\begin{aligned} \phi ^{*}\left( z,\varvec{w} \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) \ge \phi ^{*}\left( \overline{z},\overline{\varvec{w}} \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) \end{aligned}$$

and so by the convexity of \(\phi ^{*}\left( \cdot ,\cdot \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) \), we have also that \((0,0) \in \widehat{\partial }\phi ^{*}\left( \overline{z},\overline{\varvec{w}} \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) \).

Part 2: We have \((0,0) \in \widehat{\partial }\phi ^{*} (\overline{z},\overline{\varvec{w}}\mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}})\) (in both the Fréchet and classical sense) because \((\overline{z},\overline{\varvec{w}}) \in \arg \min _{z,\varvec{w}} \phi ^{*}(z,\varvec{w}\mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}) \). Since \(\phi ^{*}(z,\varvec{w}) = \min _{(\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}) \in \text {proj}_{{\mathcal {I}}}(K)} \phi ^{*}({z,\varvec{w}}\mid {\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}})\), where each \((z,\varvec{w}) \mapsto \phi ^{*}({z,\varvec{w}}\mid {\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}})\) is convex, we may invoke Lemma 11 Part 2 to obtain both directions of the implication after identifying \(\varphi \) with \(\phi ^{*}\) and \(\varphi _i\), \(i \in I\), with \(\phi ^{*}\left( \cdot ,\cdot \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \) for \((\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}) \in {{\,\textrm{proj}\,}}_{\mathcal {I}}(\Phi ^{*}(z,\varvec{w}))\).

Part 3: The fact that \(\overline{z}= \overline{x}_s\) for all \(s \in S\) implies by Corollary 8 that \(\phi ^{*}(\overline{z},\overline{\varvec{w}}) =\phi ^{*}\left( \overline{z},\overline{\varvec{w}} \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \) for exactly one choice \((\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}) =(\overline{\varvec{z}}_{\mathcal {I}},\overline{\varvec{w}}_{\mathcal {I}})\), and so the claim follows. The persistency follows from the fact that inequality (11) implies a bound independent of \(\rho \).

Part 4: Knowing that \((\overline{z},\overline{\varvec{w}})\) is a local minimum for \(\phi ^{\infty }\), we form the cleared instance of the SMIP (1) by clearing first- and second-stage coefficients \(c=d=0\), and for all \(n\ge 0\), clearing \(\lambda ^n=0\) and setting \(\pi ^n= \pi \). Thus, for all \(\rho > 0\), we have that \(\frac{1}{\rho }\varphi ^{\lambda ,\rho ,\pi }\equiv \phi ^{\infty }\) and so \((\widetilde{z}^n,\widetilde{\varvec{w}}^n) \equiv (\overline{z},\overline{\varvec{w}})\), \(n\ge 0\), forms a sequence with limit \((\overline{z},\overline{\varvec{w}})\equiv (\widetilde{z}^n,\widetilde{\varvec{w}}^n)\), \(n\ge 0\). Each \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\equiv (\overline{z},\overline{\varvec{w}})\) is a local minimum for \(\varphi ^n\) over a fixed neighbourhood \(B_\delta (\overline{z},\overline{\varvec{w}})\) for some fixed \(\delta = \bar{\delta } > 0\), and since the SSA Assumption 9 and the PWA Assumption 10 therefore hold, by Proposition 17 applied to this sequence associated with this cleared instance of SMIP (1), we have \(\overline{z}\in X\) and \(\overline{\varvec{w}}\in Y(\overline{z})\), which applies with respect to the original (non-cleared) instance of SMIP (1) also. \(\square \)

Theorem 36

Assume that problem (1) satisfies the SMIP Assumption 1, to which Algorithm 1 is applied to generate a sequence \(\{(\varvec{x}^n,\varvec{y}^n,\varvec{w}^n,z^n)\}_{n=0}^\infty \). If the Algorithm Assumption 29 is satisfied, then there exists a limit point \((\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) of the mid-iteration sequence \(\{(\varvec{x}^{n+1},\varvec{y}^{n+1},\varvec{w}^{n+1}, z^{n}) \}_{n=0}^\infty \), and each such limit point \((\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) is a Fréchet stationary point for the problem

$$\begin{aligned} \min _{z,\varvec{x},\varvec{y},\varvec{w}} g^*(\varvec{x},\varvec{y},\varvec{w},z) \end{aligned}$$
(26)

and in either limiting case, the cross-sectional optimality \((\overline{z},\overline{\varvec{w}}) \in \arg \min _{z,\varvec{w}} \phi ^{*}\left( z,\varvec{w} \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) \) holds. Thus, the following implications hold:

  1.

    \((\overline{z},\overline{\varvec{w}})\) is a local minimum of \(\phi ^{*}\) if and only if \((\overline{z},\overline{\varvec{w}})\) is a local minimum of \(\phi ^{*}\left( \cdot ,\cdot \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \) for all \((\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}) \in {{\,\textrm{proj}\,}}_{\mathcal {I}}(\Phi ^{*}(\overline{z},\overline{\varvec{w}}))\).

  2.

    If \(\overline{z}= \overline{x}_s\) for all \(s \in S\) (so that \((\overline{\varvec{x}},\overline{\varvec{y}})\) is feasible and locally optimal for SMIP (1)), then \((\overline{z},\overline{\varvec{w}})\) is a (persistent) local minimum of \(\phi ^{*}\).

  3.

    In the specific case where \(\lim _{n\rightarrow \infty } \rho ^n= \infty \) (so that \(\phi ^{*}=\phi ^{\infty }\)), the reverse of the previous implication also holds, where \((\overline{z},\overline{\varvec{w}})\) being a local minimum of \(\phi ^{*}\) implies that \(\overline{z}= \overline{x}_s\), \(s \in S\), so that \((\overline{\varvec{x}},\overline{\varvec{y}})\) is feasible and locally optimal for SMIP (1).

Proof

Under the SMIP Assumption 1 that \(K_s\), \(s \in S\), are compact and the penalty \(\psi \) satisfies Assumption 2, it follows that the level sets of \(g^n\) are compact, and so the sequence \(\{(\varvec{x}^{n+1},\varvec{y}^{n+1},\varvec{w}^{n+1},z^n)\}_{n=0}^\infty \) has limit points \((\overline{\varvec{x}},\overline{\varvec{y}},\overline{\varvec{w}},\overline{z})\) to which an associated subsequence \(\{(\varvec{x}^{n_k+1},\varvec{y}^{n_k+1},\varvec{w}^{n_k+1},z^{n_k})\}_{k=0}^\infty \) converges. For \(k\) large enough, the integer components \((\varvec{x}_{{\mathcal {I}}}^{n_k+1},\varvec{y}_{{\mathcal {I}}}^{n_k+1},\varvec{w}_{{\mathcal {I}}}^{n_k+1}) = (\overline{\varvec{x}}_{{\mathcal {I}}},\overline{\varvec{y}}_{{\mathcal {I}}},\overline{\varvec{w}}_{{\mathcal {I}}})\) are fixed. Therefore, only \(z^{n_k}\) and the real-valued components \((\varvec{x}_{{\mathcal {R}}}^{n_k+1},\varvec{y}_{{\mathcal {R}}}^{n_k+1},\varvec{w}_{\mathcal {R}}^{n_k+1} )\) are still changing in the (sub)sequence tail. After passing to a convergent subsequence with integer components fixed, the required SA Assumption 22 and \(w\)-stat Assumption 24 apply to the Assumption 28 identification \(\{(G^k, {z}^k,w^k)\}_{k=0}^\infty \) by Lemma 30. Under the same assumptions, the \({z}\)-DA Assumption 26 is satisfied due to Corollary 34. Thus, Lemma 27 may be applied to the Assumption 28 identification sequence \(\{(G^k, {z}^k,w^k)\}_{k=0}^\infty \) to establish the stationarity properties which, after dereferencing the identifications back to the Algorithm 1 context, yield the intended results.

The cross-sectional optimality \((\overline{z},\overline{\varvec{w}}) \in \arg \min _{z,\varvec{w}} \phi ^{*}\left( z,\varvec{w} \mid \overline{\varvec{x}}_{\mathcal {I}},\overline{\varvec{y}}_{\mathcal {I}}\right) \) holds by Proposition 35 Part 1. The proof of the three implications follows, respectively, from implications 2–4 of Proposition 35. \(\square \)

From Theorem 36 we know that the GS limit points \((\overline{z},\overline{\varvec{w}})\) will be optimal for at least one cross-section \(\varphi ^{\lambda ,\rho ,\pi }\left( \cdot ,\cdot \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \) or \(\phi ^{\infty }\left( \cdot ,\cdot \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \).

A simple example demonstrates the possibility that the above GS procedure produces a limit point \((\overline{z},\overline{\varvec{w}})\) for which \(\widehat{\partial }\phi ^{\infty }(\overline{z},\overline{\varvec{w}})\) is empty.

Example 2

We revisit a rescaled version of the augmented Lagrangian problem of (20) defined for Example 1, where the objective function is rescaled by a factor of \(\frac{1}{\rho }\).

 

| Cross-section \(\varvec{x}_{\mathcal {I}}\) | \(\frac{1}{\rho ^n}\inf _w\varphi ^{n}\left( z,\varvec{w} \mid \varvec{x}_{\mathcal {I}}\right) \) | Gradient w.r.t. \(z\): \(\widehat{\partial }_z\varphi ^n(z,\varvec{w})\) | Value \(\frac{1}{\rho }\varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) | Loc. opt. over \(B_\delta (\overline{z},\overline{\varvec{w}})\): \((\widetilde{z}^n,\widetilde{\varvec{w}}^n)\) | \(\delta \) | \(\Phi ^{*}\): \((z^*,\varvec{w}^*)\) |
| --- | --- | --- | --- | --- | --- | --- |
| \([0,0]^\top \) | \(\frac{1}{\rho ^n}+\Vert z\Vert ^2\) | \(2z\) | \(\frac{1}{\rho ^n}\) | \(\left( 0,[0,1]^\top \right) \) | \(\frac{1}{2}-\frac{1}{\rho ^n}\) | \(\{0\} \times \mathbb {R}^2\) |
| \([1,0]^\top \) | \(\frac{2}{\rho ^n}+\frac{1}{2}\left( \Vert z\Vert ^2 + \Vert z-1\Vert ^2\right) \) | \(2z-1\) | \(\frac{2}{\rho ^n}+\frac{1}{4}\) | \(\left( \frac{1}{2},[1,1]^\top \right) \) | \(0\) | \(\left\{ \frac{1}{2}\right\} \times \mathbb {R}^2\) |
| \([0,1]^\top \) | \(\frac{1}{2}\left( \Vert z\Vert ^2 + \Vert z-1\Vert ^2\right) \) | \(2z-1\) | \(\frac{1}{4}\) | \(\left( \frac{1}{2},[0,0]^\top \right) \) | \(\frac{1}{\rho ^n}\) | \(\left\{ \frac{1}{2}\right\} \times \mathbb {R}^2\) |
| \([1,1]^\top \) | \(\frac{1}{\rho ^n}+\Vert z-1\Vert ^2\) | \(2z-2\) | \(\frac{1}{\rho ^n}\) | \(\left( 1,[1,0]^\top \right) \) | \(\frac{1}{2}-\frac{1}{\rho ^n}\) | \(\left\{ 1\right\} \times \mathbb {R}^2\) |

Of note is the locally optimal solution \((\widetilde{z}^n,\widetilde{\varvec{w}}^n) = \left( \frac{1}{2}, [0,0]^T\right) \), which for \(0< \rho ^n< \infty \) is clearly a local minimum for \(\varphi ^n\) over \(|\overline{z}-z|< 1/\rho ^n\). Furthermore for \(\rho ^n< \infty \), \(\widehat{\partial }\varphi ^n(\widetilde{z}^n,\widetilde{\varvec{w}}^n) =\left\{ \left( 0,[0,0]^T \right) \right\} \) is non-empty. However, in the limit as \(\rho \rightarrow \infty \), we have clearly that \((\overline{z},\overline{\varvec{w}}) = \left( \frac{1}{2},[0,0]^T\right) \) realises the value \(\phi ^{\infty }(\overline{z},\overline{\varvec{w}}) = \frac{1}{4}\) over all cross-sections, but \(\widehat{\partial }_z\phi ^{\infty }\left( \overline{z},\overline{\varvec{w}} \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) \not \supset \{0\}\) for two of the four cross sections, and furthermore their intersection \(\bigcap _{(\varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}})} \widehat{\partial }_z\phi ^{\infty }\left( \overline{z},\overline{\varvec{w}} \mid \varvec{x}_{\mathcal {I}},\varvec{y}_{\mathcal {I}}\right) =\emptyset \) is empty and so \(\widehat{\partial }\phi ^{\infty }(\overline{z},\overline{\varvec{w}}) = \emptyset \). Also note that while this \((\overline{z},\overline{\varvec{w}})\) is a partial minimum for \(\phi ^{\infty }\), it is not (even) a local minimum over \((z,\varvec{w})\) jointly.
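The limiting (\(\rho \rightarrow \infty \)) cross-section values and gradients in the table can be checked numerically at \(\overline{z}= \frac{1}{2}\); the sketch below confirms that all four cross-sections attain the common value \(\frac{1}{4}\) while only two of them are stationary there:

```python
z = 0.5
# limiting cross-section objectives and gradients from the table (rho -> infinity)
sections = {
    "(0,0)": (lambda z: z**2,                  lambda z: 2*z),
    "(1,0)": (lambda z: 0.5*(z**2 + (z-1)**2), lambda z: 2*z - 1),
    "(0,1)": (lambda z: 0.5*(z**2 + (z-1)**2), lambda z: 2*z - 1),
    "(1,1)": (lambda z: (z-1)**2,              lambda z: 2*z - 2),
}

values = {k: f(z) for k, (f, _) in sections.items()}
grads  = {k: g(z) for k, (_, g) in sections.items()}

# all four cross-sections attain the same value 1/4 at z = 1/2 ...
assert all(abs(v - 0.25) < 1e-12 for v in values.values())
# ... but only two of them are stationary there, so the Frechet
# subdifferential of the pointwise minimum is empty at z = 1/2
assert grads["(1,0)"] == 0 and grads["(0,1)"] == 0
assert grads["(0,0)"] == 1.0 and grads["(1,1)"] == -1.0
```

This is precisely the mechanism by which \(\widehat{\partial }\phi ^{\infty }(\overline{z},\overline{\varvec{w}})\) empties out: the minimum is attained over several active cross-sections whose gradients do not share a common zero.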

We have demonstrated through an example (though never observed in our experiments) a pathological case where a partial minimum is encountered in the limit, but local optimality or feasibility for SMIP is not achieved. This lack of local optimality is due to a partial minimum being found for \(\phi ^{*}\) where Fréchet subdifferentiability fails, a problem foreshadowed in Lemma 21 (indeed, a non-empty subdifferential ensures stationarity, from which local optimality follows, Lemma 11). This failure of subdifferentiability occurs only when the solution minimises some (but not all) of the active cross-sections defining \(\phi ^{*}\), see Lemma 11. Furthermore, for any solution to problem (26) that satisfies consensus, this lack of Fréchet subdifferentiability is ruled out (see Theorem 36 Part 2), and we are then assured of obtaining a persistent local minimum. A partial converse may be found in Proposition 12, where integer consensus is ensured for a persistent minimum. Such pathological limit points are unstable in the sense that they are mere partial minima and not even locally minimal jointly in \((z,\varvec{w})\) for \(\phi ^{*}\). Consequently, an apt minor perturbation of \((\overline{z},\overline{\varvec{w}})\) (suggested by Corollary 18) may be employed to get the iterative FPPH approach unstuck.

6 Computational results

6.1 Algorithm

Algorithm 3 presents a modified version of Algorithm 1, in which we explicitly consider the initialisation steps and the rules for updating Lagrange and penalty parameters between successive iterations. We use an algorithm formulation consistent with the typical presentation of Progressive Hedging but allowing for differences in how (and when) Lagrange and penalty parameters are updated. Also, since the second-stage discrepancies are always zero in the context of a two-stage SMIP, we omit the second component of the penalty function (the \(v\) component in Assumption 2).

Algorithm 3: FPPH algorithm for SMIP

The properties of the penalty function required by the ICRF Assumption 2 give us flexibility in choosing \(\psi \). As described next, we compute weights for a weighted squared 2-norm form of the penalty function \(\psi \) during the initialisation with the aim of accelerating convergence to a reasonably high-quality feasible solution. Subsequently, we describe the update schemes and the termination condition used in our computational experiments.

6.1.1 Initialisation

The weights determining the weighted squared 2-norm penalty function \(\psi \) are denoted \(\bar{\mu }_i\), \(i=1,\dots ,n\); they do not change between iterations. The iteration-\(n\) penalty coefficients \(\rho _s^n=\pi _s^n\rho ^n\) denote the weighting of the iteration-\(n\) penalty magnitude \(\rho ^n\) by penalty weight \(\pi _s^n\). Initially, \(\rho _s^0 = \pi _s^0 = p_s\). Once the PenaltyUpdateCondition is satisfied, the \(\rho _s\) terms are increased by the PenaltyUpdate function to modify the penalty applied to each scenario. These functions are defined in Sects. 6.1.2 and 6.1.3. The initial \(z^0\) of Algorithm 3 Line 3 may be computed by \(z^{0}_i =\sum _{s \in S} p_s x^{0}_{s,i}\text { for all }i \in 1,\dots ,n.\) Having \(z^0\), the values \(\mu _i\), \(i=1,\dots ,n\), required to form the penalty weights initialising the penalty function \(\psi \) are computed as given in [11] for applying Progressive Hedging to SMIPs:

$$\begin{aligned} \mu _{i} = \frac{c_i}{\max \left\{ \sum _{s' \in S} p_{s'} \left|x^0_{{s'},i} - z^0_{i} \right|, 1\right\} } \quad \text {for each }i \in 1,\dots ,n \end{aligned}$$

when variable i is continuous, and

$$\begin{aligned} \mu _{i} = \frac{c_i}{\left( \max _{s' \in S} x^0_{{s'},i}\right) - \left( \min _{s' \in S} x^0_{{s'},i}\right) + 1} \quad \text {for each }i \in 1,\dots ,n \end{aligned}$$

when variable i is discrete.

We employ a slightly modified version of this scheme, where penalty parameters that would be set to zero by this rule are instead set to the smallest non-zero value \(\mu '\) among all other penalty parameters \(\mu _i\), \(i=1,\dots ,n\). We denote the modified penalty function weights as \(\bar{\mu }_i:= \max \{\mu _i,\mu '\}\) for each \(i=1,\dots ,n\). This modification did not materially affect the performance of Progressive Hedging and provides a guarantee that in the penalty-updating algorithms, all penalties can grow to be arbitrarily large, as required by Assumption 10.
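The weight computation above, including our zero-replacement modification, can be sketched as follows (hypothetical data; `penalty_weights` is an illustrative helper name, not from [11]):

```python
import numpy as np

def penalty_weights(c, x0, p, is_discrete):
    """Sketch of the initial weight computation for the weighted squared
    2-norm penalty, following the scheme described above."""
    S, n = x0.shape
    z0 = p @ x0                              # z0_i = sum_s p_s x0_{s,i}
    mu = np.empty(n)
    for i in range(n):
        if is_discrete[i]:
            spread = x0[:, i].max() - x0[:, i].min() + 1.0
        else:
            spread = max(p @ np.abs(x0[:, i] - z0[i]), 1.0)
        mu[i] = c[i] / spread
    # modification: replace zero weights by the smallest non-zero weight mu'
    nz = mu[mu > 0]
    mu_prime = nz.min() if nz.size else 1.0  # fallback if all c_i = 0
    return np.maximum(mu, mu_prime)

c = np.array([2.0, 0.0, 4.0])                # c_2 = 0 would give a zero weight
p = np.array([0.5, 0.5])
x0 = np.array([[0.0, 1.0, 3.0],
               [2.0, 1.0, 1.0]])
mu_bar = penalty_weights(c, x0, p, is_discrete=[True, True, False])
assert np.all(mu_bar > 0)                    # all weights strictly positive
```

Strict positivity of every \(\bar{\mu }_i\) is what allows all penalties to grow arbitrarily large under the penalty-updating schemes, as required by Assumption 10.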

This choice of penalty initialisation has been made to allow as direct a comparison as possible between Progressive Hedging (using a set of parameters established to be reasonable for that algorithm) and the penalty-updating variations of Algorithm 3. Having computed \(\bar{\mu }_i\) for all \(i=1,\dots ,n\), we set the penalty function for our computational experiments as \(\psi (u) = \frac{1}{2}\sum _{i=1}^n \bar{\mu }_iu_i^2\) and initialise \(\rho _s^0 = \pi _s^0 = p_s\) for all \(s \in S\). (Note that \(\rho _s^n:= \pi _s^n\rho ^n\) for \(n\ge 0\).) With the definition of \(\psi \), the z update step on Line 14 of Algorithm 3 can be written in the form

$$\begin{aligned} z^{n}_i \leftarrow \sum _{s \in S} \pi _s^{n-1} x^{n}_{s,i} \quad \text {for all }i \in 1,\dots ,n, \end{aligned}$$

where \(\pi _s^{n} = \frac{\rho _s^n}{\sum _{s' \in S} \rho _{s'}^n}\) for each \(s \in S\), \(n\ge 0\). (Note that \(\bar{\mu }_i\) does not influence this update step.) Furthermore, the dual multiplier update based on this definition of \(\psi \) is given at each iteration \(n\ge 0\) as

$$\begin{aligned} \lambda _{s,i}^{n} \leftarrow \lambda _{s,i}^{n-1} - \rho _s^{n-1}\bar{\mu }_i(z_i^n-x_{s,i}^n) \quad \text {for all}\; s \in S\;\text {and}\;i=1,\dots ,n. \end{aligned}$$

One may verify that this dual multiplier update maintains the feasibility condition \(\sum _{s \in S} \lambda _s^n= 0\) for each \(n\ge 0\).
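With \(\psi \) fixed as above, the \(z\) update reduces to a \(\pi \)-weighted average of the scenario solutions and the dual update to a coordinate-wise scaled correction. A minimal sketch (variable names ours) assuming NumPy; when the initial multipliers sum to zero across scenarios, the update preserves \(\sum _{s\in S}\lambda _s = 0\) because \(\sum _s \rho _s (z - x_s) = 0\) by the definition of \(z\):

```python
import numpy as np

def z_and_dual_update(x, lam, rho, mu_bar):
    """One z update and dual multiplier update.
    x, lam: |S| x n arrays; rho: length-|S| penalty coefficients."""
    pi = rho / rho.sum()              # pi_s = rho_s / sum_{s'} rho_{s'}
    z = pi @ x                        # z_i = sum_s pi_s x_{s,i}
    # lambda_{s,i} <- lambda_{s,i} - rho_s * mu_bar_i * (z_i - x_{s,i})
    lam_new = lam - rho[:, None] * mu_bar * (z - x)
    return z, lam_new
```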

6.1.2 Penalty update condition

We consider three update-type conditions:

  1. PenaltyUpdateCondition always returns False, meaning that the algorithm performs dual updates at every iteration and never increases the penalty parameters. This is equivalent to the Progressive Hedging algorithm for SMIP. This update condition does not satisfy Assumption 10, since the penalty parameters do not become arbitrarily large.

  2. PenaltyUpdateCondition always returns True, meaning that the algorithm does not update the dual variables again after the initialisation step and instead increases the penalty parameters. This is designated as the Penalty Only variant of FPPH.

  3. Track the degree of change in the dual variables

    $$\begin{aligned} \Delta _k = \sum _{s \in S} \sum _{i=1}^{n} |\lambda _{s,i}^{k-1} - \lambda _{s,i}^k|\end{aligned}$$

    at each iteration k. If in the current iteration \(k'\) the condition

    $$\begin{aligned} \Delta _{k'} < \beta \frac{\Delta _1 + \max _k \Delta _{k}}{2} - \gamma \end{aligned}$$
    (27)

    is satisfied, PenaltyUpdateCondition returns True so that no further dual updates are performed; otherwise it returns False. We set the parameters \(\beta \) and \(\gamma \) to 0.5 and \(10^{-3}\) respectively. This is designated as the Dual Step Length variant of FPPH.

As a simple guarantee that the Dual Step Length method satisfies Assumption 10, we could impose a maximum number of iterations after which PenaltyUpdateCondition must return True. However, in our computational tests with this update condition, either (27) or integer-variable consensus was always satisfied after a reasonably small number of iterations.
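Condition (27) can be sketched as a short predicate over the history of dual-step sizes \(\Delta _1,\dots ,\Delta _{k'}\) (function and parameter names ours, for illustration):

```python
def dual_step_length_condition(deltas, beta=0.5, gamma=1e-3):
    """Condition (27): True once the latest dual-step size falls below
    beta * (Delta_1 + max_k Delta_k) / 2 - gamma.
    `deltas` holds Delta_1, ..., Delta_{k'} and is assumed non-empty."""
    threshold = beta * (deltas[0] + max(deltas)) / 2.0 - gamma
    return deltas[-1] < threshold
```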

6.1.3 Penalty update scheme

We gradually increase the penalty parameter for the scenario whose first-stage variables are furthest from consensus with the following method. For each scenario \(s\in S\), we calculate its distance from consensus \(D_s^n= \left\| z^n- x_s^n \right\| _2\). Then, update the penalty multipliers as follows:

$$\begin{aligned} \rho _{s}^{n} \leftarrow \left( 1 + \alpha |S |\frac{D_s^n}{\sum _{s' \in S} D_{s'}^n} \right) \rho _{s}^{n-1} \quad \text {for all }s\in S. \end{aligned}$$

We set the parameter \(\alpha \) to 0.1. This rule is intended to prioritise increasing the penalty parameters corresponding to the scenarios whose first-stage variables are furthest from consensus. Assuming that PenaltyUpdateCondition returns True after a finite number of iterations, this update scheme satisfies Assumption 10.
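This penalty update can be sketched as follows (names ours; NumPy assumed), guarding against the full-consensus case \(\sum _s D_s^n = 0\), in which no update is needed:

```python
import numpy as np

def penalty_update(rho, x, z, alpha=0.1):
    """Increase rho_s in proportion to scenario s's distance from consensus."""
    D = np.linalg.norm(z - x, axis=1)   # D_s = ||z - x_s||_2
    total = D.sum()
    if total == 0.0:                    # full consensus: leave rho unchanged
        return rho
    return (1.0 + alpha * len(rho) * D / total) * rho
```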

6.1.4 Termination condition and \(n_{max}\)

Termination of each computational test is conditioned on attaining consensus \(z_{\mathcal {I}}^n= x_{s,{\mathcal {I}}}^n\), for all \(s \in S\), in all integer variables. For the instances with pure integer first-stage variables, this condition is the same as requiring first-stage consensus. For the instances with mixed integer first-stage variables, we generally do not have full consensus in the continuous variables at this point. To obtain feasible solutions, we take each unique first-stage solution \(x_s\) and find the corresponding optimal second-stage decisions y. We then report the best solution value found among these candidate solutions.
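The integer-consensus termination test can be sketched as (names ours; `int_mask` flags the integer coordinates \(\mathcal {I}\), and a small tolerance absorbs floating-point noise in the subproblem solutions):

```python
import numpy as np

def integer_consensus(x, z, int_mask, tol=1e-6):
    """True when z_I = x_{s,I} for every scenario s (integer coords only)."""
    return bool(np.all(np.abs(x[:, int_mask] - z[int_mask]) <= tol))
```

The second-stage re-solve used to recover feasible solutions is omitted here, since it simply fixes each distinct first-stage candidate and optimises the remaining variables.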

Our motivation for applying this convergence criterion to mixed integer first-stage instances is that when allowed to run beyond achieving integer consensus, the FPPH variants typically satisfied the convergence criterion \(\sqrt{\sum _{s \in S} p_s \left\| x^n_s - z^n \right\| ^2_2} < 10^{-3}\) within 100 iterations but with very poor solution quality, whereas PH failed to satisfy this criterion given even 200 iterations. Any potential method for finding a high-quality solution quickly given a fixed value for the first-stage integer variables could be applied to the solutions produced by both PH and FPPH, but implementing and tuning such a method is outside the scope of this paper.

We set \(n_{max} = 100\) since both FPPH variants generally converge well within this iteration limit, and in cases where PH does not it is already clearly slower than FPPH in terms of both runtime and iteration count.

6.2 Computational environment

The experiments in this section were conducted with a C++ implementation of Algorithm 3 using CPLEX 22.1 [28] as the solver. For reading SMPS files into scenario-specific subproblems and for their interface with CPLEX, we used modified versions of the COIN-OR [29] Smi and Osi libraries to instantiate appropriate C++ class instances of the subproblems directly.

The computing environment is the Gadi cluster maintained by Australia’s National Computing Infrastructure (NCI) and supported by the Australian government [30]. To maintain a comparable environment, experiments were performed on a single CPU using one thread per CPLEX solve for both algorithms.

The PH and FPPH algorithms are deterministic in terms of the solutions produced, but the time required for CPLEX to solve the subproblems at each iteration has some variation. Therefore, for each test, we ran each algorithm three times on each instance and report the average runtime.

6.3 Computational experiments: Pure integer first-stage instances

We first consider the application of FPPH and our implementation of Progressive Hedging to the CAP instance set [31], using the first 250 scenarios for each instance, and the SSLP instance set [32]. To evaluate algorithm performance, we compare the solutions found with the known integer optimal solution of each instance. To obtain the integer feasible optimal solutions for the CAP instances, we used CPLEX to directly solve the MIP reformulation of each instance. The integer feasible optimal solutions for the SSLP instances are provided by SIPLIB [9].

The computational results are summarised in Fig. 1, which compares both the wall-clock time required for convergence (relative to the slowest algorithm to achieve convergence) and the quality of the feasible solutions obtained at termination. A more detailed summary of our results, including absolute runtimes and solution values, is provided in the supplementary material (Table B1).

When applied to the SSLP instances, all three algorithms typically find the same solution, and it is often optimal. The Dual Step Length variant of FPPH outperforms PH in terms of runtime for all instances except for SSLP-15-45-5, where they require an equal (and small) amount of time. The Penalty Only variant of FPPH often outperforms both the Dual Step Length variant and PH, but fails to find the optimal solution of SSLP-15-45-10 and is a little slower when applied to SSLP-5-25-50 and SSLP-15-45-5. PH fails to converge to a feasible solution within 100 iterations when applied to SSLP-15-45-15.

When applied to the CAP instances, PH fails to converge to a feasible solution within 100 iterations for four of the eight instances and is again consistently outperformed in terms of runtime and matched in solution quality by the Dual Step Length variant of FPPH even when it does converge. There is not a clear favourite between the Penalty Only and Dual Step Length variants when applied to the CAP instances; each variant finds a higher-quality solution than the other variant for at least one instance, and converges faster than the other variant for several instances.

6.4 Computational experiments: Mixed integer first-stage instances

We also compared the performance of FPPH and our implementation of Progressive Hedging applied to the DCAP instance set [33, 34]. In this case, we compare with the known upper bounds given by SIPLIB [9]. These results are summarised in Fig. 2, with further detail in the supplementary material (Table B2).

For these instances, PH consistently obtains consensus in the integer variables within 100 iterations and generally outperforms the Dual Step Length variant of FPPH in terms of runtime and solution quality. The Penalty Only variant of FPPH obtains better solution quality than PH when applied to DCAP342 (with 200, 300 and 500 scenarios) but finds considerably worse solutions when applied to the other DCAP instances.

Fig. 1

Comparison between our implementation of Progressive Hedging and variants of FPPH, applied to instances with pure integer first stage (SSLP and CAP). Bar height indicates the time required for convergence compared to the slowest converging algorithm. Solid bars indicate the best quality solution found among the three algorithms. Tinted bars indicate convergence to a lower-quality solution. Suboptimal solutions are indicated by a percentage optimality gap. A solid bar with no percentage gap indicates the optimal solution was found. Empty bars indicate non-convergence within 100 iterations; the arrow signifies that these 100 iterations took much longer than the slowest converging algorithm

Fig. 2

Comparison between our implementation of Progressive Hedging and variants of FPPH, applied to instances with mixed integer first stage (DCAP). Bar height indicates the time required for integer variable consensus compared to the slowest converging algorithm. Solid bars indicate the best quality solution found among the three algorithms. Tinted bars indicate lower-quality solutions. The percentage gap between found solution and the known upper bound is given in each case

7 Conclusions

We have shown that the tools and techniques of variational analysis are well suited to the analysis of the progressive hedging algorithm as applied to SMIP. Indeed, the analysis interfaces well with the “just MIP it” approach to the development of heuristics in this field. It allows for a new study of augmented Lagrangians and Gauss–Seidel methods, specifically recognising where the presence of smoothness is essential for the success of algorithmic approaches. The theory is able to shed light on how critical parameters need to be updated to ensure convergence.

Our computational results demonstrate that the FPPH algorithm, which is motivated by the above theory, has the potential to outperform PH in terms of quickly and reliably converging to high-quality feasible solutions for SMIP instances, particularly those with pure integer first-stage variables. By contrast, PH tended to outperform FPPH when applied to the DCAP instances which have mixed-integer first-stage variables, though the variant of FPPH performing no dual variable updates found higher-quality solutions than PH for the DCAP342 subclass. Further testing on a wider variety of instance classes is needed for a deeper understanding of how the structure of SMIP instances influences the relative performance of PH and FPPH, and how to set the penalty update rules of FPPH for the best performance on a given class of instances.