A study of progressive hedging for stochastic integer programming

Motivated by recent literature demonstrating the surprising effectiveness of the heuristic application of progressive hedging (PH) to stochastic mixed-integer programming (SMIP) problems, we provide theoretical support for the inclusion of integer variables, bridging the gap between theory and practice. We provide greater insight into the phenomenon, observed when PH is applied to SMIP, of convergence to optimal or at least feasible solutions. We analyse a modified PH algorithm from a different viewpoint, drawing on the interleaving of (split) proximal-point methods (including PH), Gauss-Seidel methods, and the tools of variational analysis. Through this analysis, we show that under mild conditions, convergence to a feasible solution should be expected. In terms of convergence analysis, we provide two main contributions. First, we contribute insight into the convergence of proximal-point-like methods in the presence of integer variables via the introduction of the notion of persistent local minima. Second, we contribute an enhanced Gauss-Seidel convergence analysis that accommodates the variation of the objective function under mild assumptions. We provide a practical implementation of a modified PH and demonstrate its convergent behaviour with computational experiments in line with the provided analysis.


Introduction
Stochastic mixed integer programming (SMIP) models are, in essence, large-scale mixed-integer programming (MIP) models in which the uncertain nature of the input parameters is modelled by means of a finite set of discrete scenarios [1]. This general framework allows one to model a broad class of decision problems, as attested by the wealth of publications from diverse areas of science and engineering. Important applications employing SMIP models include unit commitment [2], hydro-thermal generation scheduling [3], military operations [4], vaccination planning [5], air traffic flow management [6], forestry management and forest fire response [7], supply chain and logistics planning [8], and other applications referred to on the SIPLIB website [9]. The practical and theoretical development of stochastic programs (SP) (without integer variables) preceded SMIP and has influenced its development. The Progressive Hedging (PH) algorithm [10] for solving SP problems is well studied and theoretically supported for convex problems with no integer-constrained variables. Even without this theoretical support in the setting with integer-constrained variables, PH as a heuristic often demonstrates effectiveness in providing both upper and lower bounds [11] and often feasible solutions. Motivated by the limited theoretical support of PH for its application to SMIP and the observed success of PH heuristics for SMIP, our objective is to develop a theoretical framework and demonstrate convergence in numerical experiments.
The large scale of the deterministic equivalent of SMIP models proves to be challenging for off-the-shelf solvers that do not utilise the decomposable structure inherent in the extensive deterministic forms of SMIP models. By contrast, more promising solution methods utilise the SMIP's decomposable structure. The PH algorithm [10] addresses the decomposable structure as a variant of the alternating direction method of multipliers (ADMM) [12], where the non-anticipativity constraint is relaxed into an augmented Lagrangian (AL) reformulation. One of the earliest detailed treatments of its convergence was based on variational analysis techniques [10], where nonsmooth analysis also provided important tools for the study of convergence with respect to the satisfaction of optimality conditions. Augmented Lagrangian duality, which plays a fundamental role in this work, also appeared in subsequent works to provide duality theorems for very general nonconvex problems, including cases encompassing integer constraints [13, Chapter 11, Section K], [14]. These publications have attracted the attention of the integer programming community and resulted in a body of literature focussing specifically on the application of augmented Lagrangian duality to mixed-integer programming (MIP) [15, 16]. This in turn motivated researchers to state, analyse, and test a version of the PH method containing nonsmooth augmentations [17]. Concurrently, researchers have also explored the combination of PH with the Frank-Wolfe algorithm [18] to obtain provably convergent dual bounding methods for SMIP based on the Lagrangian relaxation of the non-anticipativity constraint [19]. Other researchers produced primal (heuristic) methods in which the quadratic subproblems of PH were replaced by mixed-integer quadratic programs (MIQP) [11]. These approaches were shown to produce excellent solutions as long as the penalty parameter was chosen judiciously, and so have remained an enigma, lacking any theoretical convergence result. In this paper, we show that variational analysis techniques will further draw back the curtain on this enigma and explain what actually underpins the success of PH with MIQP subproblems when applied to SMIP.
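For concreteness, the PH heuristic with MIQP subproblems discussed above can be sketched on a toy problem. Everything below is an illustrative assumption rather than the algorithm analysed in this paper: a two-scenario, single-integer-variable instance, a quadratic penalty, and enumeration over a small feasible set standing in for a MIQP solver.

```python
import numpy as np

def ph_heuristic(costs, probs, X_feasible, rho=1.0, iters=50):
    """Toy sketch of PH as a primal heuristic for a 1-D pure-integer SMIP.

    Each scenario subproblem min_x f_s(x) + lam_s*x + (rho/2)*(x - z)^2
    is solved by enumeration over the finite set X_feasible (standing in
    for the MIQP solve); z is the probability-weighted consensus value.
    """
    S = len(probs)
    lam = np.zeros(S)                      # multipliers; sum_s p_s*lam_s stays 0
    z = 0.0
    for _ in range(iters):
        xs = np.empty(S)
        for s in range(S):
            vals = [costs[s](x) + lam[s] * x + 0.5 * rho * (x - z) ** 2
                    for x in X_feasible]
            xs[s] = X_feasible[int(np.argmin(vals))]
        z = float(np.dot(probs, xs))       # consensus: weighted average
        lam += rho * (xs - z)              # classical PH multiplier update
        if np.allclose(xs, xs[0]):         # consensus reached across scenarios
            break
    return xs, z
```

On two scenarios pulling towards 1 and 3 respectively, the iterates reach consensus at the compromise value 2.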
Under reasonable assumptions, we analyse the convergence of PH applied to SMIPs, where we allow the penalty parameters to vary in a less restricted fashion than is typically required for PH and related approaches, while also applying Lagrangian multiplier updates requiring a special rule due to convergence/boundedness criteria that may not be automatically satisfied when PH is applied to SMIPs. Furthermore, our approach allows for more generality in the type of augmented Lagrangian terms. In our analysis, we view the PH method as an application of Gauss-Seidel iterations with penalty and Lagrange terms allowed to vary between iterations. In this setting, we contribute insight into when PH generates a sequence of solutions that converges to a feasible point of the SMIP. Our approach may also be viewed as interfacing some seemingly distinct solution methodologies found in proximal point methods such as PH, Gauss-Seidel (GS) methods, and (mixed-integer) augmented Lagrangian duality [15-17, 20]. Furthermore, the connection with feasibility pump (FP) primal heuristics is evident, in the same spirit as contributed in [21].
Some of the conditions assumed for the penalty and/or augmented Lagrangian term that are required to achieve an exact penalty effect in [15, 16] (e.g., [16, Theorem 5]) require the penalty functions to be non-differentiable, which can impede the analysis of Gauss-Seidel methods [22]. Thus, in this paper, we set out to develop this theory from another direction that allows for a differentiable penalty term, in line with that typically used for analysing progressive hedging-like methods. To compensate for the loss of the exact penalty effect shown in [15, 16], we provide an analysis describing the effect of (potentially) letting the penalty coefficient go to infinity in order to achieve feasibility. In particular, we analyse a SMIP solution method inspired by the FP, PH, and Gauss-Seidel convergence analyses, which, for short, will be denoted FPPH; in practice it is similar to the use of PH as a heuristic [11], except that we allow for greater generality in the updating of Lagrange multipliers, the changing of penalty coefficients, and the allowable forms of the augmented Lagrangian penalty function itself. Successful convergence of the method allows for (but is not predicated on) the unbounded increase of penalty parameters. To be clear, our analysis does not promise both primal and dual optimal convergence as is provided for PH in the convex, continuous setting. Rather, we address convergence goals similar to those of feasibility pump methods, where high-quality feasible solutions are sought, and the main challenge is avoiding either non-feasible convergence or cycling.
Our experimental results will demonstrate the effectiveness of FPPH. As with all FP approaches, one needs to develop heuristics for updating the penalty parameters to encourage the methods to locate the best possible feasible solution and hence the strongest primal bound. As a general conclusion, FPPH presents promising performance relative to Progressive Hedging in terms of quickly obtaining good feasible solutions for SMIPs with pure integer first-stage variables.
This paper is structured as follows. In Sect. 2, we set up the assumptions on the regularisation and the conceptual framework on which the analysis rests. In Sect. 3, further results are developed on how we may decompose the regularisation into its "cross-sections" where integer variables are fixed, which provides a foundation for insight into the local minima of the (whole) regularisation. Section 4 introduces the concept of persistent local minima and their relationship to feasibility for SMIP (1). The convergence analysis of the associated Gauss-Seidel algorithm is carried out in Sect. 5. In Sect. 6, we present computational results illustrating the employment of variants of FPPH to find high-quality feasible solutions to SMIP instances. In Sect. 7, we provide concluding remarks and directions for future developments.

Fundamental concepts and conceptual algorithmic framework
Denote x = (x_s)_{s∈S}, where x_s ∈ X_d := R^{n−q} × Z^q ⊆ X := R^n. Similarly, y = (y_s)_{s∈S}, where y_s ∈ Y_d := R^{m−r} × Z^r ⊆ Y := R^m. We state the SMIP in the following split-variable deterministic formulation (see, e.g., [1]). Note that the constraints x_s ∈ X that hold only for the first-stage decision variables x_s are identical for all s ∈ S.
We denote the extended reals by R_{+∞} := R ∪ {+∞}. For each scenario s ∈ S copy of the first-stage variables x_s, and separately for each scenario s ∈ S of the second-stage variables y_s, we assume that the integer variable component indices (I) always follow the real variable component indices (R). That is, x_s := (x_{s,R}, x_{s,I}) and y_s := (y_{s,R}, y_{s,I}) for each s ∈ S. Define the projection proj_{X,I} : X_d → Z^q by proj_{X,I}((x_R, x_I)) = x_I (with a similar definition for the y_I projection proj_{Y,I}). As the first-stage consensus variable z components should match those of each x_s, due to the non-anticipativity constraints z − x_s = 0, s ∈ S, the same first-stage distinction between real and integer components z := (z_R, z_I) applies. Corresponding distinctions for the second-stage consensus w_s := (w_{s,R}, w_{s,I}), s ∈ S, apply as well. Note that z is not explicitly constrained to lie within the discrete feasible set X. Nor are w_s, s ∈ S, explicitly constrained to lie within Y_s(x_s) or Y_s(z). Thus, strictly speaking, z ∈ X and w_s ∈ Y vary freely within their respective spaces. Denote w := (w_1, . . ., w_{|S|}) and similarly for (x, y), and when needed we denote z := (z, . . ., z) ∈ R^{n×|S|}.
Since the second-stage non-anticipativity variables are independent for each outcome scenario and otherwise unconstrained, the non-anticipativity constraints w_s − y_s = 0, s ∈ S, have no practical effect on the feasibility of the second-stage decisions y_s in SMIP (1). Nevertheless, this formulation aids the subsequent analysis by allowing the incorporation of all variables (regardless of stage) into regularisation terms in a symmetric fashion. The second-stage feasibility is propagated from y to w via this constraint, while w remains unconstrained. The formulation (1) is also conducive to generalising our results for two-stage SMIPs to multi-stage problems in which all stages except the last have active non-anticipativity constraints. In the practical application of the developed algorithms to two-stage problems, the use of w may be suppressed, as it is in the description of the computational experiments of Sect. 6.
Throughout our developments, we assume the following to hold regarding our SMIP (1). We explicitly assume the existence of an optimal solution, which could be replaced by the standard assumption of rationality of the data defining the problem.

Assumption 1
We make the following standard SMIP assumptions:
1. Stochasticity of p_s: for each s ∈ S, we have p_s > 0 and Σ_{s∈S} p_s = 1.
2. Non-emptiness: K_s, s ∈ S, is a non-empty set of feasible decisions constructed with linear constraints and integrality constraints on the x_s and y_s variables. (This also implies that K_s is closed.)
3. Boundedness and optimality: The optimal value of the SMIP (1) is bounded from below. Also, the feasible sets K_s, s ∈ S, are bounded. Furthermore, SMIP (1) is feasible and possesses an optimal solution with value ζ_{SMIP}.
4. Relatively complete recourse: The SMIP model has relatively complete recourse: ∀x ∈ X, ∀s ∈ S, we have Y_s(x) ≠ ∅; that is, first-stage decisions x that satisfy the first-stage-specific constraints x ∈ X have at least one second-stage decision (y_s)_{s∈S} for which (x, y_s) ∈ K_s for all scenarios s ∈ S.

Of interest is the dual function
where, for each s ∈ S, we assume that K_s is defined as in (1d), and that the usual dual feasibility λ ∈ Λ := {λ | Σ_{s∈S} λ_s = 0} holds. For each scenario s ∈ S, the penalty function ψ output value is scaled by a penalty scaling parameter ρ > 0 and scenario-specific penalty weighting parameters π_s > 0 (for which Σ_{s∈S} π_s = 1) to be specified. Note that under the assumption that λ ∈ Λ, the summation Σ_{s∈S} λ_s^⊤ z conveniently vanishes, and so these terms may be dropped in subsequent developments. Each instance of problem (2) is a continuous optimisation problem over the space X × Y^{|S|}, and for nontrivial instances of SMIP (1), ϕ^{λ,ρ,π} is nonconvex with multiple isolated local minima. Under the assumptions in [15, 16], we have ζ_{SMIP} = ζ(λ, π, ρ) for sufficiently large but finite ρ. Properties of locally optimal solutions to the minimisation of ϕ^{λ,ρ,π}, and how these local minimisers relate to solutions of the original SMIP (1), are of special interest in this paper's subsequent analysis.
As mentioned earlier, the nonsmoothness of penalty functions ψ that support the exact penalty properties discussed in [15, 16] precludes the convergence theory provided by Gauss-Seidel approaches. For this reason, we modify the properties assumed in [15, 16] for the penalty function ψ to the conditions stated in Assumption 2. In particular, we assume that the penalty is strongly convex and differentiable from the outset (departing markedly from [15, 16]), as this is required for a Gauss-Seidel approach to be applied with desirable convergence properties (see Lemma 21).
Assumption 2 For our smooth penalty function ψ : X × Y → R, we make the following integer-compatible regularisation function (ICRF) assumptions: We note that Assumption 2 implies ∇ψ(0, 0) = (0, 0), and thus (4) implies the minorisation (5).

Remark 1
In the theoretical development, we partition the discrepancies into u and v components to correspond to the special treatment of early-stage variables against late-stage variables. For a two-stage problem, u corresponds to first-stage discrepancies, and v corresponds to second-stage discrepancies. To allow for versatility in how the theoretical development informs algorithmic approaches, especially for application to multi-stage problems, we carry the development with the distinction between u and v discrepancies through Sect. 5.

Remark 2
In our computational developments in Sect. 6, we use a weighted squared 2-norm penalty function ψ(u, v) = (1/2)(Σ_{i=1}^n μ_i u_i² + ‖v‖²) with weights μ_i > 0, i = 1, . . ., n. In general, strong convexity with modulus m is equivalent to convexity of the function (u, v) → ψ(u, v) − (m/2)‖(u, v)‖². For the algorithmic manifestation as presented in Sect. 6, v may furthermore be set identically to zero.
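The strong-convexity characterisation in Remark 2 can be checked numerically. The sketch below assumes the standard fact that this weighted quadratic has strong-convexity modulus min{min_i μ_i, 1}; the specific weights are illustrative.

```python
import numpy as np

def psi(u, v, mu):
    """Weighted squared 2-norm penalty of Remark 2:
    psi(u, v) = (1/2) * (sum_i mu_i * u_i^2 + ||v||^2)."""
    return 0.5 * (np.sum(mu * np.asarray(u) ** 2) + np.sum(np.asarray(v) ** 2))

def strong_convexity_holds(mu, m, trials=200, seed=0):
    """Check, at random points, the strong-convexity inequality
    psi(t*a + (1-t)*b) <= t*psi(a) + (1-t)*psi(b) - (m/2)*t*(1-t)*||a - b||^2,
    which is equivalent to convexity of psi - (m/2)*||.||^2."""
    rng = np.random.default_rng(seed)
    n, k = len(mu), 2
    for _ in range(trials):
        ua, ub = rng.normal(size=n), rng.normal(size=n)
        va, vb = rng.normal(size=k), rng.normal(size=k)
        t = rng.uniform()
        lhs = psi(t * ua + (1 - t) * ub, t * va + (1 - t) * vb, mu)
        gap = 0.5 * m * t * (1 - t) * (np.sum((ua - ub) ** 2)
                                       + np.sum((va - vb) ** 2))
        rhs = t * psi(ua, va, mu) + (1 - t) * psi(ub, vb, mu) - gap
        if lhs > rhs + 1e-9:
            return False
    return True
```

For example, with weights μ = (0.5, 2, 1) the check passes with modulus m = 0.5 but fails with an overstated modulus such as m = 10.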

Preliminary application of Gauss-Seidel iterations
We define the following notation based on the assumption that the Lagrange multipliers λ^n ∈ Λ and the penalty parameters ρ^n > 0 and π_s^n > 0, with Σ_{s∈S} π_s^n = 1, vary with each iteration n ≥ 0: One iterative solution approach for finding locally optimal solutions of SMIP (1), starting with initial z^0 ∈ X, is based on Gauss-Seidel (GS) iterations n ≥ 0 of the form (7). The z update (7b) is not easily computable, but the w update (7a) is, as demonstrated in the following proposition.) such that which would contradict the optimality in (8).
Computing the update z^{n+1} ∈ arg min_z ϕ^n(z, w^{n+1}) given fixed w^{n+1} corresponds to an infimal convolution of (x_s, y_s) → f_s(x_s, y_s) + δ_{K_s}(x_s, y_s) + (λ_s^n)^⊤ x_s and (u, v) → ρ^n π_s^n ψ(u, v), for each s ∈ S, where we denote by δ_{K_s}(x, y) the indicator function of the set K_s, taking the value zero if (x, y) ∈ K_s and +∞ otherwise. The infimal convolution is well studied [13, Chapter 1, Section H], and later we make use of certain convex "cross-sections" of this infimal convolution. However, the calculation culminating in z^{n+1} ∈ arg min_z ϕ^n(z, w^{n+1}) is still not easily computable, as it requires the solution of a MIP of comparable difficulty to the original SMIP (1). Nevertheless, this problem is useful from a theoretical standpoint, as it links the consensus problem to the Gauss-Seidel step of the continuous regularisation.
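For reference, the infimal convolution invoked above is the standard construction (g □ h)(·) = inf over decompositions of the argument; in the notation of this section, our reading of the scenario-s marginal entering the z update is:

```latex
% Infimal convolution: (g \,\square\, h)(z) = \inf_{x} \{ g(x) + h(z - x) \}.
% With g_s(x_s, y_s) = f_s(x_s, y_s) + \delta_{K_s}(x_s, y_s) + (\lambda^n_s)^{\top} x_s
% and h(u, v) = \rho^n \pi^n_s \, \psi(u, v), the scenario-s marginal at (z, w_s^{n+1}) is
\inf_{(x_s, y_s) \in K_s}
  \Big\{ f_s(x_s, y_s) + (\lambda^n_s)^{\top} x_s
       + \rho^n \pi^n_s \, \psi\big(z - x_s,\; w_s^{n+1} - y_s\big) \Big\}.
```

The mixed-integer set K_s is what makes this marginal a MIP in general, even though each fixed-integer cross-section is convex.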
A more practical approach to the z update takes the form of descent steps using the usual consensus update (9), i.e.
where w^{n+1} − y^{n+1} = 0 follows from Proposition 3. From Assumption 2(3), with v_s^0 = v_s = 0 for all s ∈ S, and the associated optimality condition, the z^{n+1} update (9) must be unique.
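In the special case of the squared 2-norm penalty with unit weights (an illustrative assumption; the paper allows general weighted ICRF penalties), the consensus update has the familiar closed form of a weighted average, as the following sketch verifies numerically.

```python
import numpy as np

def consensus_z(xs, pi):
    """Closed-form consensus update for psi(u, 0) = ||u||^2 / 2:
    z = sum_s pi_s * x_s minimises sum_s pi_s * psi(z - x_s, 0)."""
    return np.tensordot(np.asarray(pi), np.asarray(xs), axes=1)
```

The minimiser is unique because the objective is strongly convex in z, consistent with the uniqueness noted above.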
Using this observation, we can devise a Gauss-Seidel algorithm that is guaranteed to produce non-ascent steps while the stabilisation (z^n, w^{n+1}) = (z^{n+1}, w^{n+1}) is not achieved, which is given in Algorithm 1. Algorithm 1 describes a two-block Gauss-Seidel iterative approach on the two blocks (x_s, y_s, w_s)_{s∈S} and z, where the mixed-integer constraints appear only in the (x_s, y_s, w_s)_{s∈S} block subproblem implicitly referred to in Lines 4-5 of Algorithm 1. In the following sections, we analyse the convergence properties of certain embedded subsequences of (mid-)iterations (x^{n_k+1}, y^{n_k+1}, w^{n_k+1}, z^{n_k}) generated by Algorithm 1 for penalty coefficient values ρ^n > 0, penalty weights π_s^n, s ∈ S, and Lagrangian multipliers λ_s^n, s ∈ S, that vary with iteration n ≥ 1. (It is convenient to maintain that Σ_{s∈S} λ_s^n = 0 for all n ≥ 0.) We must also assume the solution (x^{n_k+1}, y^{n_k+1}, w^{n_k+1}) is globally optimal in order to carry out our convergence analysis in Sect. 5.1. Here we assume the existence of Fréchet subdifferentials at the minimising points, and this is assured for any global minimum. Furthermore, when ψ is a quadratic form, a global minimum may be computed in practice using a MIQP solver.

Remark 3
The classical progressive hedging algorithm is realised by taking π_s^n = p_s, so that ρ^n = Σ_{s∈S} p_s ρ^n (as Σ_{s∈S} p_s = 1), together with ψ(•, 0) = ½‖•‖².
Next, we build on that development, where Algorithm 1 is viewed as an approximate two-block GS iterative approach within the continuous optimisation framework of successively minimising ϕ^n in z (approximately) and w (globally and exactly), with Lagrange multipliers λ^n ∈ Λ, penalty coefficient values ρ^n > 0, and penalty weights π_s^n, s ∈ S, varying between iterations n ≥ 0 under certain assumptions. We conclude this section by noting that the above algorithm is essentially that of [11], with alterations to the Lagrangian multiplier and penalty parameter updates. In particular, we consider what happens when the Lagrange multipliers λ^n ∈ Λ and penalty weights π^n stop changing after a finite number of iterations, while the penalty parameters {ρ^n} may increase without bound. The latter feature requires us to consider the limiting behaviour of the regularisation ϕ^n as ρ^n → ∞. Such an analysis is facilitated by analysing the level sets of the sequence of functions, denoted by lev_c ϕ^{λ,ρ,π} := {(z, w) | ϕ^{λ,ρ,π}(z, w) ≤ c}, prompting the use of epi-convergence as a tool in our analysis, as it is associated with the convergence of level sets.

Properties of the SMIP regularisation ϕ^{λ,ρ,π}
The continuous regularisation ϕ^{λ,ρ,π} of SMIP (1) has properties that allow feasible points of SMIP (1) to be associated with certain local minima of ϕ^{λ,ρ,π}. To gain insight into these properties, we first note some additional properties of ψ that follow from those listed in Assumption 2.

The function (z, w_s) → ϕ^{λ,ρ,π}_s(z, w_s) is Lipschitz continuous
with modulus π_s ρ L_s, where L_s depends on the diameter of conv(K_s) + B(0, 0). Taking L := max_s{L_s}, we also have that ϕ^{λ,ρ,π} is Lipschitz continuous with modulus ρL.
Note that this corresponds to a polyhedral subset of K_s once we have removed the integrality constraint by fixing the integer variables at a specific integer value. We now consider the behaviour of ϕ^{λ,ρ,π} within neighbourhoods having progressively additional structure imposed. In preparation thereof, we introduce notation that facilitates the view of ϕ^{λ,ρ,π} in terms of its finitely many specific cross-sections. Definition 1 induces the following notation for proximal cross-sections: for each, the proximal cross-sectional values are defined by, and the set of arguments realising the proximal cross-sectional values are defined by. For each s ∈ S and (x_{s,I}, y_{s,I}) ∈ proj_I(K_s), the properties of Assumption 2 for ψ allow the following properties of the cross-sections to be established.
Proof This function can be represented as the infimal convolution of two closed, convex functions, which in turn ensures that the infimal convolution is bounded away from −∞. As the strict epigraph of an infimal convolution equals the sum of the strict epigraphs of the constituent functions, convexity follows [13, Exercise 1.28].
Note that ϕ^{λ,ρ,π}_s(z, w_s) = min_{(x_{s,I}, y_{s,I}) ∈ proj_I(K_s)} ϕ^{λ,ρ,π}_s(z, w_s | x_{s,I}, y_{s,I}), for each s ∈ S, is a minimum of a finite number of convex functions, but ϕ^{λ,ρ,π} itself is not guaranteed to be convex or differentiable on its entire domain X × Y^{|S|}. Nevertheless, ϕ^{λ,ρ,π}_s is locally convex and differentiable on open neighbourhoods N where, for all (z, w_s) ∈ N,

Lemma 6 Assume ψ satisfies Assumption 2, with parameter m as in Assumption 2(3). For each fixed D > 0, there exists a δ > 0 such that if a discrepancy 2 and any fixed D > 0, we may identify. This observation will have practical value in terms of separating values for different cross-sections of ϕ.
Proposition 7 Assume ψ satisfies Assumption 2 and (z^0, w^0_s) ∈ K_s for all s ∈ S. If there is at least one scenario s ∈ S such that (x_{s,I}, y_{s,I}) ∈ proj_I(K_s) with (z^0_I, w^0_{s,I}) ≠ (x_{s,I}, y_{s,I}), then there exists a finite threshold penalty coefficient ρ̄ > 0 and a threshold δ̄ > 0 such that for all ρ > ρ̄ and 0 < δ < δ̄, the strict inequality holds.
Proof Assuming for some s ∈ S that we have, for all (x_s, y_s) ∈ K, given that (z^0_I, w^0_{s,I}) ≠ (x_{s,I}, y_{s,I}). Due to the compactness of K, when ρ > ρ̄ > 0 is sufficiently large. Thus, for ‖(z − z^0, w_s − w^0_s)‖ < δ and ρ > ρ̄ > 0, we have for all ρ > ρ̄ and (z, We note that for each (z, w_s) ∈ B_δ(z^0, w^0), with s ∈ S, we have s(z, w_s) > 0, and so the gap between the left- and right-hand sides of the inequality in (16) can only grow with increasing ρ. Recall that we seek elements of the set of feasible (non-anticipative) solutions F := {(z, w) | (z, w_s) ∈ K_s for all s ∈ S}, which is distinct from the set K. The next result follows immediately.

The theory of persistent local minima
In this section, we consider sequences {(x̄^n, ȳ^n, w̄^n, z̄^n)}_{n=0}^∞ where we have (x̄^n, ȳ^n) ∈ ^n(z̄^n, w̄^n). Furthermore, we assume lim_{n→∞} ρ^n = ∞, and we single out a specific class of sequences of local minima {(z̄^n, w̄^n)}_{n=0}^∞ of ϕ^n, which we call persistent, and which we show to be closely related to the feasible points of the underlying SMIP (1). We assume the following.
We also consider the following assumption on penalty weights separately.
Assumption 10 Penalty Weight Assumptions (PWA): We assume that Σ_{s∈S} π^n_s = 1 and π^n_s > 0, s ∈ S. We assume in addition that the applied update rule for generating penalty weights over iterations n ≥ 0 ensures that π^n_s ≥ ξ, for some fixed ξ > 0, for all but a finite number of iterations n ≥ 0, and for each s ∈ S such that x̄^n_{s,I} ≠ z̄^n_I holds infinitely often in n. Furthermore, for n ≥ 0 for which the set S_n := {s ∈ S | x̄^n_{s,I} ≠ z̄^n_I} is empty, we assume the penalty weight update rule also ensures that π^{n+1}_s = π^n_s for all s ∈ S.
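One concrete update rule compatible with the PWA can be sketched as follows. The flooring constant ξ = 1/(2|S|) and the floor-then-renormalise scheme are our own illustrative choices, not prescribed by the paper: the rule keeps the weights summing to one, leaves them unchanged when S_n is empty, and keeps every weight (in particular that of any disagreeing scenario) at least ξ.

```python
def update_weights(pi, xs_I, z_I, xi=None):
    """Hedged sketch of a penalty-weight update satisfying Assumption 10 (PWA).

    pi   : current weights (positive, summing to 1), one per scenario
    xs_I : integer components x_{s,I} of the scenario solutions
    z_I  : integer components of the consensus point
    xi   : weight floor; defaults to 1/(2|S|) (an illustrative choice)
    """
    S = len(pi)
    xi = 1.0 / (2 * S) if xi is None else xi
    if all(x == z_I for x in xs_I):          # S_n empty: weights unchanged
        return list(pi)
    floored = [max(p, 2 * xi) for p in pi]   # bound every weight away from zero
    total = sum(floored)                     # total <= 1 + 2*xi*|S| <= 2
    return [p / total for p in floored]      # renormalised weights stay >= xi
```

Since each floored weight is at least 2ξ and the normalising total is at most 2, every returned weight is at least ξ, so the lower-bound requirement of the PWA holds at every iteration where it is invoked.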
If S_n is empty for all but a finite number of iterations n, then consensus x̄^n_{s,I} = z̄^n_I has been reached and the above assumption is trivially satisfied. When S_n = ∅ occurs, then z̄^n ∈ X and, by relatively complete recourse, there exists y_s ∈ Y_s(z̄^n) for all s ∈ S, so that (z̄^n, y) is feasible for SMIP (1).
In the context of Assumptions 9 and 10, we examine when it holds that z̄ ∈ X. Under Assumptions 9 and 10, local minimisers z̄^n of z → inf_w ϕ^n(z, w) can be peculiar in the sense that inf_w ϕ^n(z̄^n, w) can increase without bound as ρ^n → ∞, while the maximal neighbourhoods of the local minimiser z̄ verifying local optimality of z̄^n for z → inf_w ϕ^n(z, w) vanish in measure as n → ∞. The local minimisers z̄^n that do not suffer from these issues are those we wish to isolate, in that inf_w ϕ^n(z̄^n, w) remains bounded at the local minimum z̄^n despite ρ^n → ∞.
w), we say that (z, w) is a persistent limit.

Remark 4
Clearly, when we have a convergent sequence of local minima z̄^n → z̄ for z → inf_w ϕ^n(z, w), n ≥ 0, then for any w̄^n with inf_w ϕ^n(z̄^n, w) = ϕ^n(z̄^n, w̄^n), n ≥ 0, any convergent subsequence {(z̄^{n_k}, w̄^{n_k})}_{k=0}^∞ converges to a persistent limit (z̄, w̄).
The subdifferential analysis of ϕ^{λ,ρ,π} requires addressing its nonconvexity and non-differentiability. A notion of differentiation suitable for this purpose is Fréchet subdifferentiability, as defined in [13].
The following motivates our set of assumptions on the penalty parameter update.
Proposition 12 Assume that SMIP (1) satisfies Assumption 1 and let ψ satisfy Assumption 2. Suppose we have a persistent local minimum sequence (z̄^n, w̄^n) → (z̄, w̄) for ρ^n → ∞ (and hence Fréchet stationarity 0 ∈ ∂ϕ^n(z̄^n, w̄^n) for each n). If the PWA Assumption 10 holds, then for n sufficiently large and for all (x̄^n, ȳ^n) ∈ ^n(z̄^n, w̄^n), we have z̄^n_I = x̄^n_{s,I} for all s ∈ S, i.e., consensus holds in the integral components at a fixed value z̄^n_I = z̄_I, and furthermore, the Fréchet stationarity
Proof In general, the Fréchet stationarity 0 ∈ ∂ϕ^n(z̄^n, w̄^n), n ≥ 0, is a much stronger notion in that it allows us to deduce the Fréchet stationarity 0 ∈ ∂f^n(x̄^n, ȳ^n, w̄^n, z̄^n) via standard subderivative inclusions for marginal mappings (see [13, Theorem 10.13], Lemma 11, and elsewhere). Hence the Fréchet stationarity 0 ∈ ∂ϕ^n(z̄^n, w̄^n) implies the cross-sectional Fréchet stationarity 0 ∈ ∂ϕ^n(z̄^n, w̄^n | x_I, y_I) for all optimal cross-sections (x_I, y_I) ∈ proj_I(^n(z̄^n, w̄^n)). Identifying, in terms of Lemma 11 (Part 2), ϕ with ϕ^n, the ϕ_i, i ∈ I, with the finite number of cross-sections ϕ^n(•, • | x_I, y_I) with (x_I, y_I) ∈ proj_I(K), and (z, w) with (z̄^n, w̄^n), we have a local minimum of ϕ^n at (z̄^n, w̄^n), and by the definition of ϕ^n we have the Fréchet stationarity 0 ∈ ∂f^n(x̄^n, ȳ^n, w̄^n, z̄^n). From this, we show that all such solutions have a common set of integral values for n sufficiently large. As {(z̄^n, w̄^n)}_{n=0}^∞ is persistent, there exists κ > 0 such that ϕ^n(z̄^n, w̄^n) ≤ κ for all n sufficiently large. Hence for each optimal cross-section (x̄^n_I, ȳ^n_I) ∈ proj_I(^n(z̄^n, w̄^n)) we have, using (5), the inequality (18). The left-hand side of (18) tends to zero as ρ^n → ∞ and π^n_s ≥ ξ for all s ∈ S_n. After choosing a small 0 < δ < 1/(2|S|), we conclude that ‖x̄^n_{s,I} − z̄^n_I‖ < 1/2 for all s ∈ S_n, and so x̄^n_{s',I} = x̄^n_{s,I} = z̄^n_I for all s, s' ∈ S_n. As z̄^n_I = x̄^n_{s,I} for all s ∉ S_n already, we have equality for all s ∈ S, and the values x̄^n_{s,I} = x_{s,I} = z̄_I are fixed independent of ρ^n for n sufficiently large.
Feasibility may also be shown to hold, as stated in Lemma 13. Furthermore, in Proposition 14, we state the relationships between persistency and feasibility.

Lemma 13 Let the problem SMIP (1) satisfy the SMIP Assumption 1, and let the penalty function ψ satisfy Assumption 2. If a sequence
with integer index n ≥ 0 satisfies Assumption 9, then ȳ^n_s = w̄^n_s ∈ Y_s(x̄^n_s), s ∈ S, and z̄^n ∈ arg min_z Σ_{s∈S} π^n_s ψ(z − x̄^n_s, 0). Furthermore, for each n ≥ 0 for which z̄^n ∈ X holds, we have ϕ^n(z̄^n, w̄^n) = inf_w ϕ^n(z̄^n, w) bounded from above independently of the specific value of ρ^n.
Proof It follows from Lemma 4 that for all n ≥ 0 we have the existence of (x̄^n, ȳ^n) ∈ K that attains the infimum in the definition of ϕ^n(z̄^n, w̄^n). Because w̄^n_s, s ∈ S, is a global optimum for w_s → ϕ^n_s(z̄^n, w), the claim ȳ^n_s = w̄^n_s, s ∈ S, follows readily from Proposition 3.
To establish that z̄^n ∈ arg min_z Σ_{s∈S} π^n_s ψ(z − x̄^n_s, 0), assume to the contrary that z̄^n ∉ arg min_z Σ_{s∈S} π^n_s ψ(z − x̄^n_s, 0). For any z̃^n ∈ arg min_z Σ_{s∈S} π^n_s ψ(z − x̄^n_s, 0), using the convexity of ψ and η ∈ (0, 1), we obtain a contradiction with the local optimality of z̄^n for z → inf_w ϕ^n(z, w). Thus, z̄^n ∈ arg min_z Σ_{s∈S} π^n_s ψ(z − x̄^n_s, 0) for all n ≥ 0. Furthermore, it also follows that, when z̄^n ∈ X (a compact set), where, after noting that Σ_{s∈S} (λ^n_s)^⊤ z̄^n = 0 vanishes due to λ ∈ Λ, we have that a finite upper bound can be chosen to hold regardless of the specific realisations of ρ^n > 0 and z̄^n ∈ X, due to the boundedness properties of the SMIP Assumptions; the finiteness of this bound also requires the assumed relatively complete recourse.
Proposition 14 Assume ψ satisfies Assumption 2. If (z̄, w̄) is a persistent limit of a persistent sequence {(z̄^n, w̄^n)}_{n=0}^∞, then: 1. (z̄, w̄) ∈ F; namely, z̄ ∈ X and w̄_s ∈ Y_s(z̄); 2. there is a fixed neighbourhood B_δ(z̄, w̄) with δ > 0 on which (z̄^n, w̄^n) is locally optimal for ϕ^n for all n large enough (i.e., for all ρ^n larger than some threshold ρ̄).
If we furthermore assume that the PWA Assumption 10 holds, then we have z̄^n_I = x̄^n_{s,I} for all s ∈ S and all (x̄^n, ȳ^n) ∈ ^n(z̄^n, w̄^n) for n large enough.
Proof To prove (1), suppose (z̄, w̄) ∉ F. Then there exists δ > 0 such that inf_{(x_s,y_s)∈K_s} ‖(z̄ − x_s, w̄_s − y_s)‖² ≥ 2δ for at least one scenario s ∈ S. As (z̄^n, w̄^n) → (z̄, w̄), we have, for n (and thus ρ^n) large enough, that inf_{(x,y)∈K_s} ‖(z̄^n − x_s, w̄^n_s − y_s)‖² ≥ δ for some s ∈ S_n (in which case z̄^n ≠ x̄^n_s, since by Proposition 3 we have w̄^n = ȳ^n). We now use Assumption 2(3) to bound the penalty values below. By the differentiability assumed in Assumption 2, we apply the inequality (5) for each s ∈ S to get ψ(z̄^n − x_s, w̄^n_s − y_s) ≥ (m/2)‖(z̄^n − x_s, w̄^n_s − y_s)‖². It follows that, as lim_{n→∞} ρ^n = ∞ and lim inf_n π^n_s ≥ ξ, the corresponding penalty term is unbounded. This contradicts the assumption that {ϕ^n(z̄^n, w̄^n)}_{n=0}^∞ is bounded above, as required by the persistency assumption on {z̄^n}_{n=0}^∞. Thus (z̄, w̄) ∈ F. Having shown (z̄, w̄) ∈ F, it follows from the definitions that w̄_s ∈ Y_s(z̄), and so claim (2) follows readily from Corollary 8 using the same critical ρ̄ and δ that apply regardless of the choice of z ∈ X; it is thus established that (z, w) → ϕ^n(z, w) is convex over B_δ(z̄, w̄) for all ρ^n > ρ̄ with n sufficiently large. By Remark 4, the same neighbourhood associated with the local minimum at (z̄, w̄) is also associated with a persistent local minimum at some (z̄, w̄), and thus B_δ(z̄, w̄) serves as the fixed neighbourhood verifying local optimality of (z̄^n, w̄^n) for (z, w) → ϕ^n(z, w) for each n large enough. The last claim follows from Proposition 12.
The following contains a version of the strong duality result for the augmented Lagrangian. Notice that this result is more general than those in [15], in that it allows for the consideration of an inexact penalty that may be differentiable everywhere. When we have pure integer variables, we see that all feasible solutions are persistent and we have a stronger form of duality.

1. If problem SMIP (1) has pure integer variables in both stages and a feasible point $(z, w) \in F$ satisfies $\limsup_n \varphi^n(z, w) < +\infty$ for $\lim_{n \to \infty} \rho^n = \infty$, then $(z, w) \in F$ is a local minimum of $\varphi^n$ for $n$ large enough.
2. For any sequence $\{(z^n, w^n)\}_{n=0}^\infty$ of global minimisers of $\varphi^n$ with $\lim_{n \to \infty} \rho^n = \infty$ and $\lambda^n$, $n \ge 0$, satisfying Assumption 9(2), and $\pi_s > 0$, $s \in S$, its limit points $(z, w)$ are globally optimal solutions to SMIP (1). That is, there exists at least one globally optimal solution $(z, w)$ to SMIP (1) that is a persistent limit. Moreover, for any $\{\rho^n\}_{n=0}^\infty$ with $\rho^n \to \infty$ there exists a persistent local minimum sequence, and for any dual feasible $\lambda$ and $\pi_s > 0$, $s \in S$, a pure integer SMIP admits a finite value $\bar\rho > 0$ for which $\min_{(z, w) \in X \times Y^{|S|}} \varphi_{\lambda, \rho, \pi}(z, w) = \zeta_{SMIP}$ for $\rho \ge \bar\rho$.
The sequence $\{(z^n, w^n)\}_{n=0}^\infty$ is a persistent sequence. Its limit points $(z, w)$ thus satisfy $(z, w) \in F$ by Proposition 14. Furthermore, $(x^n_s, y^n_s)$ lies in the subproblem solution set at $(z^n, w^n_s)$, and as the limiting solution set at $(z, w)$ is $\{(z, w)\}$, we have (after passing to the subsequence) $\lim_{n \to \infty} (x^n_s, y^n_s) = (z, w_s) \in K_s$. (For if not, the boundedness of $\{\lambda^n\}_{n=0}^\infty$, $\rho^n \to \infty$, $\pi_s > 0$, $s \in S$, and the minorisation (5) would imply that $\limsup_{n \to \infty} \varphi^n(z^n, w^n) = \infty$.) Furthermore, since $\varphi^n(z^n, w^n) \le \zeta_{SMIP}$, we must also have $\sum_{s \in S} f_s(x^n_s, y^n_s) + (\lambda^n_s)^\top x^n_s \le \zeta_{SMIP}$, and so $\sum_{s \in S} f_s(z, w_s) = \zeta_{SMIP}$ (due to boundedness, the dual feasibility of $\lambda^n$, and $x^n_s \to z$ for all $s \in S$, we have $\limsup_{n \to \infty} \sum_{s \in S} (\lambda^n_s)^\top x^n_s = 0$); thus, $(z, w)$ must be optimal for the original SMIP (1).
When we have a pure integer SMIP then, by Part 1, we have the existence of a $\bar\rho > 0$ such that $\varphi_{\lambda, \rho, \pi}(z^\rho, w^\rho) = \varphi_{\lambda, \rho, \pi}(z, w)$ for all $\rho \ge \bar\rho$, where $(z, w)$ is a global minimum of the SMIP. Hence, a global minimum is achieved for a finite $\rho$.
We now investigate the role of the fixed neighbourhoods verifying the local minima for ϕ n , n ≥ 0. Indeed, for limiting points that are not persistent, we show that such a neighbourhood does not exist.
Proposition 17 Suppose the sequence $\{(z^n, w^n)\}_{n=0}^\infty$ satisfies the joint PHA Assumption 16. If the radii $\delta^n$, $n \ge 0$, on which the $(z^n, w^n)$ are locally optimal for $\varphi^n$ satisfy $\liminf_{n \to \infty} \delta^n = \delta$ for some $\delta > 0$, then:
1. $\lim_{n \to \infty} \|z^n - x^n_s\| = 0$ for all $s \in S$ for which $\limsup_{n \to \infty} \pi^n_s > 0$. Thus $z \in X$, $\sum_{s \in S} \pi^n_s x^n_s \to z$, and for $n$ sufficiently large we have $z^n_I = x^n_{s,I}$ for all $s \in S$ and $\sum_{s \in S} \pi_s x^n_s \in X$. (When $\psi = \frac{1}{2}\|\cdot\|^2$ we have $z^n = \sum_{s \in S} \pi_s x^n_s \in X$ for all $n \ge 0$.)
2. For all $n$ we have $w^n_s = y^n_s \in Y_s(x^n_s)$, and limit points $w_s$ of $\{w^n_s\}_{n=0}^\infty$ satisfy $w_s \in Y_s(z)$ for all $s \in S$ with $\limsup_{n \to \infty} \pi_s > 0$.
Proof See Appendix A.
The previous analysis allows us to pose the following result, which confirms that the "basin of attraction" of non-persistent local minima has no interior in the limit. The next result follows immediately as a contrapositive of Proposition 17.

Corollary 18 Suppose the sequence $\{(z^n, w^n)\}_{n=0}^\infty$ satisfies the joint PH Assumption 16. If any one of the following is true:
1. $z \notin X$;
2. there exist arbitrarily large $n$ such that $z^n_I \ne x^n_{s,I}$ for at least one $s \in S$;
3. $\lim_{n \to \infty} x^n_s \ne z$, or $\lim_{n \to \infty} x^n_s$ does not exist, for at least one $s \in S$ for which $\limsup_{n \to \infty} \pi^n_s > 0$; or
4. when $\psi = \frac{1}{2}\|\cdot\|^2$, $z^n \not\to z$;
then $\lim_{n \to \infty} \delta^n = 0$ for the radii $\delta^n > 0$, $n \ge 0$, on which the local optimality of each $z^n$ for $z \mapsto \inf_w \varphi^n(z, w)$ is verified.

Example 1 Consider an augmented Lagrangian reformulation of a simple split-variable extensive form of a two-stage SMIP, given as problem (20), with penalty coefficient $\rho > 0$, where $\psi = \frac{1}{2}\|\cdot\|^2$ and $\lambda = 0$ throughout this example. For any $\{\rho^n\}_{n=0}^\infty$ with $\rho^n > 0$, $n \ge 0$, one may verify that the local minimisers $(z^n, w^n)$ and local optimal values, as parameterised by $\rho^n$, $n \ge 0$, are as follows. The locally optimal solutions $(z^n, w^n)$ are the same for all $\rho > 0$, so that $(z^n, w^n) = (z, w)$ for $n \ge 0$ for each locally optimal $(z, w)$. Here we see that the two globally optimal solutions for $\varphi_{\lambda, \rho, \pi}$ are the persistent solutions with either $z = x_1 = x_2 = 0$ or $z = x_1 = x_2 = 1$, both of which satisfy non-anticipativity. The non-persistent solution has $z = 0.5$ with $0 = x_1 \ne x_2 = 1$; it only stays optimal over an ever-shrinking neighbourhood $B_\delta(z, w)$ with radius $\delta = 1/\rho^n$ vanishing as $\rho^n \to \infty$.
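The shrinking optimality neighbourhood of the non-persistent solution can be checked numerically. The sketch below is our own reconstruction of a problem with the qualitative features described above, not the exact instance (20): we assume illustrative scenario objectives $f_1(x) = x$ and $f_2(x) = 1 - x$ with $x_s \in \{0, 1\}$, $\pi_1 = \pi_2 = \frac{1}{2}$, $\lambda = 0$, and $\psi = \frac{1}{2}u^2$.

```python
def phi(z, rho):
    # Augmented Lagrangian value at consensus z, with each scenario's x in {0, 1}
    # minimised out (lambda = 0, pi_s = 1/2, psi = 0.5 u^2). The scenario
    # objectives f1(x) = x and f2(x) = 1 - x are our own illustrative choice.
    s1 = min(0.0 + 0.5 * rho * z ** 2, 1.0 + 0.5 * rho * (z - 1) ** 2)
    s2 = min(1.0 + 0.5 * rho * z ** 2, 0.0 + 0.5 * rho * (z - 1) ** 2)
    return 0.5 * s1 + 0.5 * s2

def local_opt_radius(rho, steps=100001):
    # Largest delta such that phi(0.5) <= phi(z) for all |z - 0.5| <= delta.
    base = phi(0.5, rho)
    radius = 0.5
    for i in range(steps):
        z = i / (steps - 1)
        if phi(z, rho) < base - 1e-12:
            radius = min(radius, abs(z - 0.5))
    return radius

for rho in (10, 100, 1000):
    print(rho, local_opt_radius(rho))  # the radius shrinks as rho grows
```

For large $\rho$ the measured radius behaves like $1/\rho$, in line with the $\delta = 1/\rho^n$ behaviour noted above.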

[Table: locally optimal solutions, their values, and the neighbourhoods $B_\delta(z, w)$ over which each is locally optimal.]

Analysis of the block Gauss–Seidel sequence
Block Gauss–Seidel iterations are most easily analysed for differentiable optimisation problems. However, we need to perform Gauss–Seidel iterations on nonsmooth functions with varying parameterisations and, hence, we develop the necessary theory to facilitate this analysis. We start with statements of elementary definitions and properties of Gauss–Seidel iterations that apply under general assumptions on the functions and their domain sets.

Definition 5 Let
For general non-smooth $G$, partial minimality does not imply (joint) minimality. Under suitable assumptions of convexity and (additive) separability of the non-smoothness in $G$, we may recover joint minimality, as described in Lemma 21.
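A standard illustration (our own, not drawn from the text): when the non-smooth part couples the blocks, coordinate-wise minimality can hold at a point that is not a joint minimum. The convex function $G(z, w) = |z + w| + 0.1|z - w|$ is minimised jointly at the origin, yet $(1, -1)$ is a partial minimum in each block, so Gauss–Seidel sweeps never leave it:

```python
def G(z, w):
    # Convex, but the non-smooth term |z + w| couples the blocks,
    # violating the additive-separability requirement of SCA.
    return abs(z + w) + 0.1 * abs(z - w)

def block_min(f, lo=-2.0, hi=2.0, steps=40001):
    # Exact-enough 1-D minimisation by fine grid search.
    pts = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(pts, key=f)

# Gauss-Seidel sweeps from (1, -1): each block update leaves the point fixed.
z, w = 1.0, -1.0
for _ in range(5):
    z = block_min(lambda t: G(t, w))
    w = block_min(lambda t: G(z, t))
print((z, w), G(z, w))  # remains near (1, -1) with value ~0.2, while G(0, 0) = 0
```

Here the stuck value $0.2$ strictly exceeds the joint minimum value $0$, exactly the gap that the SCA separability assumption rules out.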

Assumption 19 Separability and Convexity Assumptions (SCA) on $G$:
1. G is bounded from below and its level sets are bounded.

On the stationarity of Gauss–Seidel limit points
Two ways of framing the Gauss–Seidel step of Sect. 2.1 are now apparent: 1) via continuous block-$z$ and block-$w$ partial minimisation updates of the continuous "regularisation function" $\varphi_{\lambda, \rho, \pi}$, and 2) via continuous consensus block-$z$ minimisation updates and mixed-integer block-$(x, y, w)$ minimisation updates applied directly to augmented Lagrangian reformulations of SMIP (1). The former approach still requires an analysis of the $(x, y)$ update, but in a hidden form. The latter, on the other hand, relies on the fact that the iterates will eventually fall into a region where the integer variables become fixed in value, so that subproblem optimisations are local and associated with continuous (and convex) parts of the problem. Motivated by the use of Lagrangian- and penalty-based solution approaches, we furthermore assume that $G = G^k$ varies across iterations $k \ge 0$, subject to the following assumptions.

Assumption 22 Structural Assumptions (SA):
In Sect. 5.2, we identify the sequences $\{G^k, (z^k, w^k)\}_{k=0}^\infty$ with a subsequence of GS (mid-)iterations associated with the application of Algorithm 1. For now, we deliberately detach the analysis of $\{G^k, (z^k, w^k)\}_{k=0}^\infty$ from its intended algorithmic identification. The convexity of $Q^k$ and $h^k$ in Assumptions 19 and 22 allows $\{\partial G^k\}_{k=0}^\infty$ to converge in graph [13, Theorem 12.35]. This assumption will not prove restrictive in the integration of the present analysis with the convergence properties of Algorithm 1, even though the underlying problem has mixed-integer constraints.
Proof Given that $0 \notin \partial G(z, w)$, it follows from [13, Theorem 12.35(b)] that the sequence $\{\partial G^k(z^k, w^k)\}_{k=0}^\infty$ must be strictly bounded away from zero in the liminf sense.

Assumption 24 Stationarity of $w$ ($w$-stat) on the (sub)sequence indexed by $k$.

Proof Under Assumption 24 ($w$-stat), we must have for the $w$ subgradient components $0 \in \{\nabla_w Q^k(z, w) + \partial_w h^k(w)\} \ne \emptyset$ for $k \ge 0$, and so the hypothesis $0 \notin \partial G(z, w)$ and the calculus rules of Lemma 20 and Lemma 23 imply the intended result.
For the following results, we introduce an Armijo descent step rule for the z step to aid in the convergence analysis.
The $z$-DA Assumption 26 itself makes no assumption on how the sequence $\{(z^k, w^k), d^k\}_{k=0}^\infty$ is constructed. Subsequently stated identifications of the sequence $\{(z^k, w^k)\}_{k=0}^\infty$ with subsequences generated by Algorithm 1 will guarantee the satisfaction of the $z$-DA Assumption 26 under mild assumptions on the implementation of Algorithm 1. The Armijo step of Algorithm 2 is not actually used in our implementation of Algorithm 1; rather, it is merely a theoretical tool in what follows.
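The Armijo rule invoked above can be sketched as standard backtracking (a generic version with an illustrative quadratic of our own choosing; $\beta, \sigma \in (0, 1)$ as in the text):

```python
def armijo_step(f, grad, x, d, beta=0.5, sigma=0.1, alpha0=1.0):
    """Backtrack alpha until the sufficient-decrease condition
    f(x + alpha d) <= f(x) + sigma * alpha * <grad f(x), d> holds."""
    fx, g = f(x), grad(x)
    slope = sum(gi * di for gi, di in zip(g, d))  # directional derivative
    alpha = alpha0
    while f([xi + alpha * di for xi, di in zip(x, d)]) > fx + sigma * alpha * slope:
        alpha *= beta
    return alpha

# Illustration on f(z) = z1^2 + 10 z2^2 with the steepest-descent direction.
f = lambda z: z[0] ** 2 + 10 * z[1] ** 2
grad = lambda z: [2 * z[0], 20 * z[1]]
x = [1.0, 1.0]
d = [-gi for gi in grad(x)]
a = armijo_step(f, grad, x, d)
x_new = [xi + a * di for xi, di in zip(x, d)]
assert f(x_new) < f(x)  # descent is guaranteed for a descent direction d
```

The proof of Lemma 27 below turns on exactly the two cases of this rule: either the accepted step sizes stay bounded away from zero, or the penultimate (rejected) step sizes certify that the directional derivative vanishes.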
The proof of the following lemma is based on ideas from [24, Proposition 1].
Proof From (22), the satisfaction of the $w$-stat Assumption 24, and Lemma 21, we only need to show that $\lim_{k \to \infty} \nabla_z Q^k(z^k, w^k) = 0$. Due to the SA Assumption 22 and the $z$-DA Assumption 26, we consider two cases: 1) $\limsup_{k \to \infty} \alpha^k > 0$, and 2) $\limsup_{k \to \infty} \alpha^k = 0$. In the first case, the assumed continuity of $\nabla_z Q^k$ yields $\lim_{k \to \infty} \nabla_z Q^k(z^k, w^k) = 0$. Otherwise, assuming that $\limsup_{k \to \infty} \alpha^k = \lim_{k \to \infty} \alpha^k = 0$, we have for some large enough $k \ge \bar{k} \ge 0$ that $\alpha^k \le \beta < 1$, and so $\bar\alpha^k = \alpha^k / \beta \le 1$ (that is, the state of $\alpha^k$ at the penultimate iteration of Algorithm 2), for which the sufficient-decrease condition fails. Applying the Mean Value Theorem at each $k$, the failed condition holds at some $\tilde\alpha^k \in [0, \bar\alpha^k]$. Using the continuity of $\nabla_z Q^k$ (and the Cauchy–Schwarz inequality), we have for arbitrarily small $\epsilon > 0$ that there exists $\delta > 0$ such that, for large enough $k$, $\tilde\alpha^k \le \bar\alpha^k < \delta$. In the limit, this yields a contradiction, since $(1 - \sigma) > 0$ and $\|\nabla_z Q(z, w)\|^2 > 0$, as established from Lemma 25 and the SA Assumption 22. Thus $0 \in \hat\partial G(z, w)$, and since $G$ is regular, $0 \in \partial G(z, w)$ holds also.
In order to apply Lemma 27 to a convergence analysis of Algorithm 1, we need to establish the satisfaction of the SA Assumption 22, w-stat Assumption 24, and especially the z-DA Assumption 26 requiring lim k→∞ G k (z k +α k d k , w k )−G k (z k , w k ) = 0 given an appropriate identification with Algorithm 1 subsequence iterations {n k } ∞ k=0 .

Interleaving analysis and algorithm
We analyse subsequences $\{(x^{n_k+1}, y^{n_k+1}, w^{n_k+1}, z^{n_k})\}_{k=0}^\infty$ from the iterations generated by the application of Algorithm 1 to problem (1) that converge to $(x, y, w, z)$. (Such limit points with respect to the entire sequence in $n$ exist due to the inf-compactness of (1), which will be demonstrated in this subsection.) This analysis depends on establishing that the Sect. 5.1 assumptions hold under the appropriate identifications with Algorithm 1 (i.e., the SA Assumption 22, the $w$-stat Assumption 24, and the $z$-DA Assumption 26).
Given the assumed subsequence convergence, we may take the subsequence $\{(x^{n_k+1}, y^{n_k+1}, w^{n_k+1}, z^{n_k})\}_{k=0}^\infty$ so that the integer component values $x_I$ and $y_I$ are fixed. With respect to Algorithm 1, we apply the following identifications, with GS iterations indexed by $n \ge 1$ and subsequence iterations indexed by $n_k$, $k \ge 1$.

Assumption 28 Algorithm 1 Identifications: with (17), the identified quantities are defined for each $n \ge 0$, where the last equality is by construction (fixed integral values) of the subsequence.
Furthermore, to guarantee the set of assumptions SA, $w$-stat, and $z$-DA, we assume the following of the GS (sub)sequences.

Assumption 29 Algorithm Assumptions: In the application of Algorithm 1 to problem (1), the following hold.
1. SMIP assumptions: Assumption 1 holds for problem (1).
2. Penalty function assumptions: $\psi$ satisfies Assumption 2. Furthermore, we subsequently note special implications that hold in the cases where $\psi$ takes the weighted squared 2-norm form with weights $\mu_i > 0$.
3. Global optimality: Each $(x^{n+1}, y^{n+1}, w^{n+1})$ is globally optimal given fixed $z^n$, in that $(x^{n+1}, y^{n+1}, w^{n+1}) \in \arg\min_{x, y, w} f^n(x, y, w, z^n)$ (hence limit points $(x, y, w)$ are globally optimal given the fixed limit point $z$ by known results [27, Propositions 1.3.5 and 1.3.6]). Also, $z^{n+1} \in \arg\min_z \sum_{s \in S} \pi^n_s \psi(z - x^{n+1}_s, 0)$, $n \ge 1$, are globally optimal. Furthermore, under the additional assumption that $\psi$ is of the weighted squared 2-norm form (24), this holds independently of the weights $\mu_i > 0$, $i = 1, \ldots, n$.
4. Generation of Lagrange multipliers: $\lambda^0_s = 0$ for $s \in S$. If $\psi$ is not of the weighted squared 2-norm form (24), then we assume $\lambda^n_s \equiv 0$, $n \ge 1$, identically. Otherwise, if $\psi$ is of the weighted squared 2-norm form (24), then for each subsequent iteration $n$, either $\lambda^n_s$ is left unchanged via $\lambda^{n+1}_s \leftarrow \lambda^n_s$ for all $s \in S$, or it is updated for all $s \in S$.
Under these assumptions on $\lambda$, it follows that for $n \ge 0$ we have the vanishing sums $\sum_{s \in S} \lambda^n_s = 0$ (i.e., dual feasibility is maintained, and the absence of the $\sum_{s \in S} (\lambda^n_s)^\top z$ terms from the Lagrangian is thus justified). Non-trivial $\lambda$ updates between iterations are suppressed as necessary to ensure in the limit that $\sum_{n=1}^\infty \|\lambda^n_s - \lambda^{n+1}_s\| < \infty$ holds. (In practice this will usually entail only a finite number of non-trivial updates.)
5. Update of penalty parameters: We assume the following.
(a) Penalty coefficients are nondecreasing. (c) For each $n \ge 0$ and $s \in S$, we have $\pi^n_s > 0$ and $\sum_{s \in S} \pi^n_s = 1$. The algorithm does not necessarily adjust the $\pi^n_s$ and $\rho^n$ parameters separately. Instead, it may apply penalty updates in a scenario-specific manner to $\rho^n_s := \pi^n_s \rho^n$ with $\rho^n = \sum_{s \in S} \rho^n_s$, under the condition that $s \in S^n := \{s \in S \mid z_I = x_{s,I}\}$.

Lemma 30 Under the Algorithmic Identifications 28 with $G^k = h^k + Q^k$ and Assumption 29, we have $0 \in \partial_w G(z, w)$.

Proof By regularity of $G$ due to its convexity, the (limiting) Mordukhovich and Fréchet subdifferentials coincide.
We argue that we have epi-convergence of $\{h^k\}_{k=0}^\infty$: each $h^k$ is convex and converges both monotonically pointwise and uniformly to $\delta_{K(x_I, y_I)}$. This is because we have uniform convergence to zero of the $x_s$ terms on the compact and convex polyhedral set $K(x_I, y_I)$. Thus we have epi-convergence of $\{h^k\}_{k=0}^\infty$ to $\delta_{K(x_I, y_I)}$. Whenever $\{(\pi^{n_k}_s)_{s \in S}\}_{k=0}^\infty$ converges to $\pi_s$ for $s \in S$, we then have a family of convex functions $\{Q^k := \sum_{s \in S} \pi^{n_k}_s \psi\}_{k=0}^\infty$ uniformly converging on compact sets, and hence also epi-convergent to $Q = \sum_{s \in S} \pi_s \psi_s(\cdot, \cdot)$. Applying [27, Theorem 7.1.5] or [13, Theorem 7.46], we know that the sum of a uniformly convergent sequence and an epi-convergent sequence must epi-converge. Thus we may deduce that $\{G^k\}_{k=0}^\infty$ epi-converges to $G = h + Q$. Now we may apply [13, Theorem 12.35] to deduce that the convex subdifferentials $\{\partial_w G^k\}_{k=0}^\infty$ converge in graph to $\partial_w G$. As $w^k$ is a global minimiser of $w \mapsto G^k(z^k, w)$, we have (by definition) that $0 \in \partial_w G^k(z^k, w^k)$, and hence also over the subsequence indexed by $k$. Thus, by graphical convergence, $0 \in \partial_w G(z, w)$.
Proof See Appendix A.

Corollary 32
Given n k , k ≥ 0, a subsequence index, and j k the positive integer such that n k Proof By the construction of the subsequence, δ

Lemma 33 Under Assumption 28 and Assumption 29, we have
Corollary 34 Assume the Algorithm Identifications 28. If the Algorithm Assumptions 29 hold, then the $z$-DA Assumption 26 holds under any allowable realisation of its assumptions on $\beta$, $\sigma$, $d^k$, etc. (Thus, the intended $z$-DA condition will hold for any convergent subsequence and $\alpha^k$ computed with the Armijo rule for any $\beta, \sigma \in (0, 1)$.)

Proof Since the sufficient-decrease condition already holds per the Armijo step, the satisfaction of the $z$-DA Assumption 26 follows from Lemma 33 and Corollary 32, once it is noted that $\lim_{k \to \infty} G^{k+1}(z^{k+1}, w^{k+1}) - G^k(z^k, w^k) = 0$ follows from the SA epi-convergence of Assumption 22 [13, Theorem 12.35].

Definition 7
Under the epi-convergence of the SA Assumption 22, we define the limiting regularisation. We now state one of our main results; before doing so, we introduce the following notation.

Definition 8 To accommodate both possibilities $\lim_{n \to \infty} \rho^n = \rho < \infty$ and $\lim_{n \to \infty} \rho^n = \infty$ disjunctively, we define (recalling definition (23)) the function $\phi^*$. From (25) we obtain the following.

Proposition 35 Let $(x, y, w, z)$ satisfy $(x, y) \in K$. The following implications hold.
Proof Part 1: Given $0 \in \partial g^*(x, y, w, z)$ and the structure of $g^*$ as the sum of a linear function and an indicator function of a polyhedral set with integer cross-sections, we have that $g^*(x', y', w', z') \ge g^*(x, y, w, z)$ for all $(x', y') \in K(x_I, y_I)$ and $(z', w') \in X \times Y$, and, more particularly, for all $(x', y') \in K(x_I, y_I)$. The persistency follows from the fact that inequality (11) implies a bound independent of $\rho$.
Part 4: Knowing that $(z, w)$ is a local minimum for $\phi^\infty$, we form the cleared instance of SMIP (1) by clearing the first- and second-stage coefficients $c = d = 0$ and, for all $n \ge 0$, clearing $\lambda^n = 0$ and setting $\pi^n = \pi$. Thus, for all $\rho > 0$, we have that $\frac{1}{\rho} \varphi_{\lambda, \rho, \pi} \equiv \phi^\infty$, and so $(z^n, w^n) \equiv (z, w)$, $n \ge 0$, forms a sequence with limit $(z, w)$. Each $(z^n, w^n) \equiv (z, w)$ is a local minimum of $\varphi^n$ over a fixed neighbourhood $B_\delta(z, w)$ for some fixed $\delta > 0$, and since the SSA Assumption 9 and the PWA Assumption 10 therefore hold, by Proposition 17 applied to this sequence associated with the cleared instance of SMIP (1), we have $z \in X$ and $w \in Y(z)$, which applies with respect to the original (non-cleared) instance of SMIP (1) as well.

Theorem 36 Assume that problem (1) satisfies the SMIP Assumption 1, and that Algorithm 1 is applied to it. If the Algorithm Assumptions 29 are satisfied, then there exists a limit point $(x, y, w, z)$ of the mid-iteration sequence $\{(x^{n+1}, y^{n+1}, w^{n+1}, z^n)\}_{n=0}^\infty$, each such limit point $(x, y, w, z)$ is a Fréchet stationary point for the problem $\min_{z, x, y, w} g^*(x, y, w, z)$ (26), and in either limiting case the cross-sectional optimality $(z, w) \in \arg\min_{z, w} \phi^*(z, w \mid x_I, y_I)$ holds. Thus, the following implications hold:
2. If $z = x_s$ for all $s \in S$ (so that $(x, y)$ is feasible and locally optimal for SMIP (1)), then $(z, w)$ is a (persistent) local minimum of $\phi^*$.
3. In the specific case where $\lim_{n \to \infty} \rho^n = \infty$ (so that $\phi^* = \phi^\infty$), the reverse of the previous implication also holds: $(z, w)$ being a local minimum of $\phi^*$ implies that $z = x_s$, $s \in S$, so that $(x, y)$ is feasible and locally optimal for SMIP (1).
Proof Under the SMIP Assumption that the $K_s$, $s \in S$, are compact and the penalty satisfies Assumption 2, it follows that the level sets of $g$ are compact, and so the sequence $\{(x^{n+1}, y^{n+1}, w^{n+1}, z^n)\}_{n=0}^\infty$ will have limit points $(x, y, w, z)$, with an associated subsequence whose continuous components $(x^{n_k+1}_R, y^{n_k+1}_R, w^{n_k+1}_R)$ are still changing throughout the (sub)sequence tail. After passing to a convergent subsequence with integer components fixed, the required SA Assumption 22 and $w$-stat Assumption 24 apply to the Assumption 28 identification with $\{(G^k, z^k, w^k)\}_{k=0}^\infty$ by Lemma 30. Under the same assumptions, the $z$-DA Assumption 26 is satisfied due to Corollary 34. Thus, Lemma 27 may be applied to the Assumption 28 identification sequence.

From Theorem 36 we know that the GS limit points $(z, w)$ will be optimal for at least one cross-section $\varphi_{\lambda, \rho, \pi}(\cdot, \cdot \mid x_I, y_I)$ or $\phi^\infty(\cdot, \cdot \mid x_I, y_I)$.
A simple example demonstrates the possibility that the above GS procedure produces a limit point (z, w) for which ∂φ ∞ (z, w) is empty.

Example 2
We revisit a rescaled version of the augmented Lagrangian problem (20) defined for Example 1, where the objective function is rescaled by a factor of $\frac{1}{\rho}$. Of note is the locally optimal solution $(z^n, w^n) = (\frac{1}{2}, [0, 0]^\top)$, which for $0 < \rho^n < \infty$ is clearly a local minimum for $\varphi^n$ over $|z - z^n| < 1/\rho^n$. Furthermore, for $\rho^n < \infty$, $\partial \varphi^n(z^n, w^n) = \{(0, [0, 0]^\top)\}$ is non-empty. However, in the limit as $\rho \to \infty$, we clearly have that $(z, w) = (\frac{1}{2}, [0, 0]^\top)$ realises the value $\phi^\infty(z, w) = \frac{1}{4}$ over all cross-sections, but $\partial_z \phi^\infty(z, w \mid x_I, y_I) \supset \{0\}$ for only two of the four cross-sections, and furthermore the intersection $\bigcap_{(x_I, y_I)} \partial_z \phi^\infty(z, w \mid x_I, y_I) = \emptyset$ is empty, and so $\partial \phi^\infty(z, w) = \emptyset$. Also note that while this $(z, w)$ is a partial minimum for $\phi^\infty$, it is not (even) a local minimum over $(z, w)$ jointly. We have thus demonstrated through example (but never observed in our experiments) a pathological case where a partial minimum is encountered in the limit, yet local optimality or feasibility for SMIP is not achieved. This lack of local optimality is due to a partial minimum being found for $\phi^*$ where Fréchet subdifferentiability fails, a problem foreshadowed in Lemma 21 (indeed, the existence of a non-empty subdifferential ensures stationarity, from which local optimality follows; Lemma 11). This failure of subdifferentiability occurs only when our solution minimises some (but not all) of the active sections defining $\phi^*$; see Lemma 11. Furthermore, for any solution to problem (26) that satisfies consensus, this lack of Fréchet subdifferentiability is ruled out (see Theorem 36(2)), and we are then assured of obtaining a persistent local minimum.
A partial converse of this may be found in Proposition 12, where integer consensus is ensured for a persistent minimum. Such pathological limit points are unstable in the sense that they are mere partial minima, not even locally minimal jointly in $(z, w)$ for $\phi^*$. Consequently, an apt minor perturbation of $(z, w)$ (suggested by Corollary 18) may be employed to get the iterative FPPH approach unstuck.
6 Computational results

Algorithm
Algorithm 3 presents a modified version of Algorithm 1, in which we explicitly consider the initialisation steps and the rules for updating the Lagrange and penalty parameters between successive iterations. We use an algorithm formulation consistent with the typical presentation of Progressive Hedging, but allowing for differences in how (and when) the Lagrange and penalty parameters are updated. Also, since the second-stage discrepancies are always zero in the context of a two-stage SMIP, we omit the second component of the penalty function (the $v$ component in Assumption 2). The required properties of the penalty function specified in the ICRF Assumption 2 give us flexibility in choosing $\psi$. As described next, we compute weights for a weighted squared 2-norm form of the penalty function $\psi$ during the initialisation, with the aim of accelerating convergence to a reasonably high-quality feasible solution. Subsequently, we describe the update schemes and the termination condition used in our computational experiments.
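The overall loop structure can be sketched as follows. This is our own toy schematic, not Algorithm 3 itself: the two scenario objectives, the always-increase penalty behaviour (the Penalty Only variant), and the doubling update are illustrative placeholders for the subproblem solver, PenaltyUpdateCondition, and PenaltyUpdate defined below.

```python
def fpph(scenarios, probs, n_max=100):
    """Toy sketch of the Algorithm 3 loop: each scenario s has a cost f_s(x)
    over x in {0, 1}. Penalty Only behaviour is hard-coded for illustration
    (penalties always increase; the dual variables stay at zero)."""
    lam = [0.0] * len(scenarios)
    rho = list(probs)                              # rho_s^0 = p_s
    xs = [min((0, 1), key=f) for f in scenarios]   # scenario-wise warm start
    z = sum(p * x for p, x in zip(probs, xs))
    for n in range(n_max):
        # Scenario subproblems given fixed z (augmented Lagrangian terms).
        xs = [min((0, 1), key=lambda x: f(x) + l * x + 0.5 * r * (x - z) ** 2)
              for f, l, r in zip(scenarios, lam, rho)]
        z = sum(p * x for p, x in zip(probs, xs))  # consensus step (psi quadratic)
        if all(x == xs[0] for x in xs):            # integer consensus reached
            return z, xs, n
        rho = [2 * r for r in rho]                 # illustrative doubling update
    return z, xs, n_max

# Two scenarios disagreeing on the preferred binary decision.
z, xs, iters = fpph([lambda x: 1.0 * x, lambda x: 2.0 * (1 - x)], [0.6, 0.4])
print(z, xs, iters)  # a feasible consensus decision is reached
```

Note that, consistent with the analysis above, the growing penalties drive the iterates to a feasible consensus decision, which need not be globally optimal.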

Initialisation
The penalty function weights determining the weighted squared 2-norm penalty function $\psi$ are denoted $\bar\mu_i$, $i = 1, \ldots, n$; they do not change between iterations $n \ge 0$. The iteration-$n$ penalty coefficients $\rho^n_s = \pi^n_s \rho^n$ denote the weighting of the iteration-$n$ penalty magnitude $\rho^n$ by the penalty weight $\pi^n_s$. Initially, $\rho^0_s = \pi^0_s = p_s$. Once the PenaltyUpdateCondition is satisfied, the $\rho_s$ terms are increased by the PenaltyUpdate function to modify the penalty applied to each scenario. These functions are defined in Sects. 6.1.2 and 6.1.3. The initial $z^0$ of Algorithm 3, Line 3, may be computed by $z^0_i = \sum_{s \in S} p_s x^0_{s,i}$ for all $i \in \{1, \ldots, n\}$. Having $z^0$, the values $\mu_i$, $i = 1, \ldots, n$, required to form the penalty weights for initialising the penalty function $\psi$ are computed as given in [11] for applying Progressive Hedging to SMIPs, with one formula when variable $i$ is continuous and another when variable $i$ is discrete.
We employ a slightly modified version of this scheme: penalty parameters that would be set to zero by this rule are instead set to the smallest non-zero value $\underline{\mu}$ among all the other penalty parameters $\mu_i$, $i = 1, \ldots, n$. We denote the modified penalty function weights by $\bar\mu_i := \max\{\mu_i, \underline{\mu}\}$ for each $i = 1, \ldots, n$. This modification did not materially affect the performance of Progressive Hedging, and it provides a guarantee that in the penalty-updating algorithms all penalties can grow arbitrarily large, as required by Assumption 10.
This choice of penalty initialisation has been made to allow as direct a comparison as possible between Progressive Hedging (using a set of parameters established to be reasonable for that algorithm) and the penalty-updating variations of Algorithm 3. Having computed $\bar\mu_i$ for all $i = 1, \ldots, n$, we set the penalty function for our computational experiments to the weighted squared 2-norm with weights $\bar\mu_i$. One may verify that the dual multiplier update maintains the feasibility condition $\sum_{s \in S} \lambda^n_s = 0$ for each $n \ge 0$.
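The weight initialisation can be sketched as below. The cost-proportional rule of [11] is reproduced here only under our own reading (the spread measures used for the two variable types, and the helper name, are illustrative assumptions; the exact formulas are those of [11]), combined with the flooring modification just described.

```python
def penalty_weights(c, xs0, probs, integer):
    """Cost-proportional weights in the spirit of [11] (our reconstruction):
    mu_i ~ |c_i| scaled by the initial spread of x_{s,i} across scenarios.
    Zero weights are then floored at the smallest non-zero weight, as in the
    modification described above."""
    n = len(c)
    z0 = [sum(p * x[i] for p, x in zip(probs, xs0)) for i in range(n)]
    mu = []
    for i in range(n):
        if integer[i]:
            # Discrete variable: spread of scenario solutions, plus one.
            spread = max(x[i] for x in xs0) - min(x[i] for x in xs0) + 1
        else:
            # Continuous variable: mean absolute deviation from z0, floored at 1.
            spread = max(sum(p * abs(x[i] - z0[i]) for p, x in zip(probs, xs0)), 1.0)
        mu.append(abs(c[i]) / spread)
    floor = min((m for m in mu if m > 0), default=1.0)
    return [max(m, floor) for m in mu]
```

The floor guarantees every weight is strictly positive, so each penalty term can be driven arbitrarily large by the penalty updates, as Assumption 10 requires.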

Penalty update condition
We consider three update-type conditions:
1. PenaltyUpdateCondition always returns False, meaning that the algorithm performs dual updates at every iteration and never increases the penalty parameters. This is equivalent to the Progressive Hedging algorithm for SMIP. This update condition does not satisfy Assumption 10, since the penalty parameters do not become arbitrarily large.
2. PenaltyUpdateCondition always returns True, meaning that the algorithm does not update the dual variables again after the initialisation step and instead increases the penalty parameters. This is designated as the Penalty Only variant of FPPH.

3. Track the degree of change in the dual variables: once condition (27) is satisfied, PenaltyUpdateCondition returns True, so that no further dual updates are performed; otherwise it returns False. We set the parameters $\beta$ and $\gamma$ to $0.5$ and $10^{-3}$, respectively. This is designated as the Dual Step Length variant of FPPH.
As a simple guarantee that the Dual Step Length method satisfies Assumption 10, we could specify a fixed number of iterations after which PenaltyUpdateCondition must return True. However, in our computational tests with this update condition, either (27) or integer-variable consensus was always satisfied after a reasonably small number of iterations.

Penalty update scheme
We gradually increase the penalty parameters for the scenarios whose first-stage variables are furthest from consensus, with the following method. For each scenario $s \in S$, we calculate its distance from consensus $D^n_s = \|z^n - x^n_s\|_2$ and then update the penalty multipliers accordingly, with the parameter $\alpha$ set to $0.1$. This rule is intended to prioritise increasing the penalty parameters corresponding to the scenarios whose first-stage variables are furthest from consensus. Assuming that PenaltyUpdateCondition returns True after a finite number of iterations, this update scheme satisfies Assumption 10.
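One way to realise the described prioritisation is a distance-proportional increase; the proportional form below is our own illustrative assumption rather than necessarily the exact rule, with $\alpha = 0.1$ as above.

```python
import math

def penalty_update(rho, z, xs, alpha=0.1):
    """Distance-proportional penalty increase (illustrative assumption):
    scenarios furthest from consensus receive the largest boost, and no
    penalty ever decreases."""
    dists = [math.dist(z, x) for x in xs]          # D_s = ||z - x_s||_2
    dmax = max(dists) or 1.0                       # guard against all-zero distances
    return [r * (1.0 + alpha * d / dmax) for r, d in zip(rho, dists)]
```

For example, with two scenarios at distances $0.5$ and $0$ from consensus, only the first scenario's penalty grows (by the full factor $1 + \alpha$), which is the prioritisation the text describes.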

Termination condition and n max
Termination of each computational test is conditioned on attaining consensus $z^n_I = x^n_{s,I}$ for all $s \in S$ in all integer variables. For the instances with pure integer first-stage variables, this condition is the same as requiring first-stage consensus. For the instances with mixed-integer first-stage variables, we generally do not have full consensus in the continuous variables at this point. To obtain feasible solutions, we take each unique first-stage solution $x_s$ and find the corresponding optimal second-stage decisions $y$. We then report the best solution value found among these candidate solutions.
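The termination test and the feasibility-recovery step just described can be sketched as follows (`evaluate` stands in for re-solving the second stage at a fixed first-stage candidate; both helper names are ours):

```python
def integer_consensus(z, xs, int_idx, tol=1e-6):
    # Consensus is required only on the integer coordinates in int_idx.
    return all(abs(z[i] - x[i]) <= tol for x in xs for i in int_idx)

def recover_best(xs, evaluate):
    """Evaluate each unique first-stage candidate (second stage re-optimised
    by `evaluate`) and keep the best objective value found."""
    unique = {tuple(x) for x in xs}
    return min((evaluate(list(x)), list(x)) for x in unique)
```

Because only the integer coordinates are tested, mixed-integer instances can terminate while continuous first-stage coordinates still disagree, which is why the recovery pass over the unique candidates is needed.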
Our motivation for applying this convergence criterion to mixed-integer first-stage instances is that, when allowed to run beyond achieving integer consensus, the FPPH variants typically satisfied the convergence criterion $\sum_{s \in S} p_s \|x^n_s - z^n\|_2^2 < 10^{-3}$ within 100 iterations but with very poor solution quality, whereas PH failed to satisfy this criterion even given 200 iterations. Any potential method for finding a high-quality solution quickly, given a fixed value for the first-stage integer variables, could be applied to the solutions produced by both PH and FPPH, but implementing and tuning such a method is outside the scope of this paper.
We set n max = 100 since both FPPH variants generally converge well within this iteration limit, and in cases where PH does not it is already clearly slower than FPPH in terms of both runtime and iteration count.

Computation environment
The experiments in this section were conducted with a C++ implementation of Algorithm 3 using CPLEX 22.1 [28] as the solver.For reading SMPS files into scenario-specific subproblems and for their interface with CPLEX, we used modified versions of the COIN-OR [29] Smi and Osi libraries to instantiate appropriate C++ class instances of the subproblems directly.
The computing environment is the Gadi cluster maintained by Australia's National Computing Infrastructure (NCI) and supported by the Australian government [30]. The PH and FPPH algorithms are deterministic in terms of the solutions produced, but the time required for CPLEX to solve the subproblems at each iteration has some variation. Therefore, for each test, we ran each algorithm three times on each instance and report the average runtime.

Computational experiments: Pure integer first-stage instances
We first consider the application of FPPH and our implementation of Progressive Hedging to the CAP instance set [31] using the first 250 scenarios for each instance, and the SSLP instance set [32].To evaluate algorithm performance we compare it to the known IP optimal solution.To obtain the integer feasible optimal solutions for the CAP instances we used CPLEX to directly solve the MIP reformulation of each instance.The integer feasible optimal solutions for the SSLP instances are provided by SIPLIB [9].
The computational results are summarised in Fig. 1.These figures compare both the wall-clock time required for convergence (compared to the slowest algorithm to achieve convergence) and the quality of the feasible solutions obtained at termination.A more detailed summary of our results, including absolute runtime and solution values, is provided in the supplementary material (Table B1).
When applied to the SSLP instances, all three algorithms typically find the same solution, and it is often optimal. The Dual Step Length variant of FPPH outperforms PH in terms of runtime for all instances except SSLP-15-45-5, where the two require similar runtimes. When applied to the CAP instances, PH fails to converge to a feasible solution within 100 iterations for four of the eight instances, and even when it does converge it is consistently outperformed in terms of runtime, and matched in solution quality, by the Dual Step Length variant of FPPH. There is no clear favourite between the Penalty Only and Dual Step Length variants when applied to the CAP instances; each variant finds a higher-quality solution than the other for at least one instance, and converges faster than the other for several instances.

Computational experiments: Mixed-integer first-stage instances
We also compared the performance of FPPH and our implementation of Progressive Hedging applied to the DCAP instance set [33,34].In this case, we compare with the known upper bounds given by SIPLIB [9].These results are summarised in Fig. 2, with further detail in the supplementary material (Table B2).
For these instances, PH consistently obtains consensus in the integer variables within 100 iterations and generally outperforms the Dual Step Length variant of FPPH in terms of runtime and solution quality.The Penalty Only variant of FPPH obtains better solution quality than PH when applied to DCAP342 (with 200, 300 and 500 scenarios) but finds considerably worse solutions when applied to the other DCAP instances.

A.1 Proof of Proposition 4
Proof The proofs of all four claims are the same for each $s \in S$. The first three claims are obvious from the compactness of the constraint sets $K_s$, the coercivity of $\psi$ and its other properties implied by Assumption 2, and the role of $\rho$ in the objective function $(x_s, y_s, w_s, z) \mapsto f_s(x_s, y_s) + \lambda_s^\top x_s + \rho \pi_s \psi(z - x_s, w_s - y_s)$. To show claim (4), let $(z, w_s), (z^1, w^1_s) \in K_s$; the Lipschitz property of $\psi$ then establishes the Lipschitz modulus $\pi_s \rho L$. It readily follows by summation that $\varphi_{\lambda, \rho, \pi}$ is Lipschitz continuous with modulus $\rho L$ for $(z, w_s), (z^1, w^1_s) \in K_s$, $s \in S$.

A.3 Proof of Proposition 17
Proof of 1: Assume, for the sake of contradiction, that $\limsup_{n \to \infty} \|z^n - x^n_s\| > 0$ for at least one $s \in S$. Since $z^n \to z$, $\delta > 0$, and $X$ is bounded, there exists a fixed $0 < \eta < 1$ such that for all sufficiently large $n \ge 0$ and any $x^n \in X$ we have $z^{\eta, n} := \eta z^n + (1 - \eta) x^n \in B_\delta(z^n)$ (indeed, there exists a fixed $\kappa > 0$ such that $\kappa B_\delta(z^n) \supseteq X$ for $n$ large).
For any given $n$, there exists $s \in \arg\min_{s \in S} \psi(\eta(z^n - x^n_s), 0)$. Set $x^n := x^n_s \in X$. The gradient term $\langle \nabla\psi_z(\eta(z^n - x^n_s), 0), (1 - \eta)(z^n - x^n_s) \rangle$ appearing in the ICRF strong convexity Assumption 2(3) is guaranteed to be positive, due to the convexity of $\psi$ and the strict increasing property in ICRF Assumption 2(2). The last inequality follows from the assumption that $\limsup_{n \to \infty} \|z^n - x^n_s\| > 0$ for at least one $s \in S$ and the assumption that the corresponding $\pi^n_s$ values are bounded away from zero, due to the PWA Assumption 10(2). But since $z^{\eta, n} \in B_\delta(z)$ for $n$ large enough, we have a contradiction of the local optimality of $z^n$ for $z \mapsto \inf_w \varphi^n(z, w)$ for some arbitrarily large $n$. Therefore $\lim_{n \to \infty} \|z^n - x^n_s\| = 0$ for all $s \in S$ whose $\pi^n_s$ do not vanish, as in the PWA Assumption 10(2), and hence $x^n_s \to z$. In particular, $\sum_{s \in S} \pi^n_s x^n_s \to z$. Taking into account the definition of $S^n$, we have $x^n_{s,I} = z^n_I$ for all $s \in S$ and $n$ sufficiently large. (When $\psi = \frac{1}{2}\|\cdot\|^2$, we have from Lemma 13 that $z^n \in \arg\min_z \frac{1}{2} \sum_{s \in S} \|z - x^n_s\|^2$, so $z^n = \sum_{s \in S} \pi_s x^n_s$. Moreover, in general, $\sum_{s \in S} \pi_s x^n_s \in X$ for $n$ large enough, due to the fact that all $x^n_s \in X$, $z^n_I = x^n_{s,I}$ for all $s \in S$, and $\sum_{s \in S} \pi_s = 1$.)
Proof of 2: By Proposition 3, the first assertion holds. The second follows from the closed graph of $x_s \mapsto Y_s(x_s)$.

Proof Proof of 1: Suppose (z̄, w̄) ∈ F. Using Lemma 7 and Corollary 8, we have a locally convex function (z, w) ↦ ϕ^n(z, w) = ϕ^n(z, w | z̄_I, w̄_I) for all (z, w) ∈ B_δ(z̄, w̄), for some fixed δ̄ > δ > 0 and ρ_n > ρ̄ > 0 with n large enough. Moreover, for all (z′, w′) ∈ F with (z′_I, w′_I) ≠ (z̄_I, w̄_I), we have ϕ^n(z, w | z′_I, w′_I) > ϕ^n(z, w | z̄_I, w̄_I) for (z, w) ∈ B_δ(z̄, w̄), for some fixed δ̄ > δ > 0 and ρ_n > ρ̄ > 0 with n large enough. Supposing (z̄, w̄) ∈ F is pure integer, we have (z̄_I, w̄_I) = (z̄, w̄), and hence (z̄, w̄) is a local minimum of ϕ^n with ϕ^n(z̄, w̄) ≤ Σ_{s∈S} [c z̄ + d_s w̄_s] < +∞, due to the boundedness assumptions for the SMIP (and the dual feasibility of any sequence {λ^n}). If (z̄, w̄) is a persistent limit, Lemma 14 implies (z̄, w̄) ∈ F, and by Proposition 11, ϕ_{λ,ρ,π}(z̄, w̄) ≤ Σ_{s∈S} p_s [c z̄ + d_s w̄_s]. It follows that inf{ Σ_{s∈S} p_s [c z̄ + d_s w̄_s] | (z̄, w̄) are persistent limits } ≤ ζ_SMIP, where the last inequality follows from the existence of global solutions that are limits of persistent local minima. Let ρ_n → ∞ and let {(z̄^n, w̄^n)}_{n=0}^∞ be a sequence of persistent local minima globally minimising ϕ^n with ϕ^n(z̄^n, w̄^n) → ζ_SMIP. Via global optimality, ϕ^n(z̄^n, w̄^n) ≤ ξ^SMIP_{ρ_n}, from which it follows that sup_{ρ>0} ξ^SMIP_ρ = ζ_SMIP. As all global minima are eventually persistent, we are finished.
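The relationship sup_{ρ>0} ξ^SMIP_ρ = ζ_SMIP can be seen on a toy instance: two equally weighted scenarios, a scalar first stage restricted to {0, 1, 2}, and the quadratic penalty ψ = ½|·|². The data below are invented purely for illustration, and ξ(ρ) is computed by brute-force enumeration over a grid for z; this is a sketch of the limiting behaviour, not the paper's algorithm.

```python
import itertools
import numpy as np

# Toy two-scenario problem (invented): equal weights,
# f_1(x) = x, f_2(x) = (x - 2)^2, first stage restricted to {0, 1, 2}.
pi = [0.5, 0.5]
f = [lambda x: x, lambda x: (x - 2) ** 2]
X = [0, 1, 2]
zeta = min(sum(p * fs(x) for p, fs in zip(pi, f)) for x in X)   # true SMIP value

def xi(rho, grid=np.linspace(-1, 3, 401)):
    """Penalised relaxation value: scenarios may disagree, paying rho * psi."""
    best = np.inf
    for z in grid:
        for xs in itertools.product(X, repeat=2):
            val = sum(p * (fs(x) + rho * 0.5 * (z - x) ** 2)
                      for p, fs, x in zip(pi, f, xs))
            best = min(best, val)
    return best

vals = [xi(r) for r in (0.1, 1.0, 10.0, 100.0)]
print([round(v, 3) for v in vals], round(zeta, 3))  # xi(rho) increases towards zeta
```

For small ρ the scenarios "hedge" by disagreeing cheaply; as ρ grows, disagreement is priced out and the penalised value climbs to the true SMIP optimum.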

2. G has the form G(z, w) = Q(z, w) + h(w), where (a) Q : X × Y → R is convex and continuously differentiable over X × Y; (b) h : Y → R_{+∞} is proper, lower semicontinuous, and convex. The following properties follow immediately from Assumption 19.
Lemma 20 Let G : X × Y → R_{+∞} satisfy the SCA given in Assumption 19.
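The SCA splitting G = Q + h, with Q smooth convex and h proper, lower semicontinuous, and convex, is exactly the structure under which a forward-backward (proximal-gradient) step on the nonsmooth block is well defined. The following generic sketch uses an invented quadratic Q and h = λ₁‖·‖₁ (whose prox is soft-thresholding); none of this data comes from the paper.

```python
import numpy as np

# Illustrative instance of the SCA structure G = Q + h:
# Q(w) = 0.5 * ||A w - b||^2 (smooth, convex), h = lam1 * ||w||_1 (proper, lsc, convex).
rng = np.random.default_rng(2)
A = rng.normal(size=(8, 5))
b = rng.normal(size=8)
lam1 = 0.5

def grad_Q(w):
    return A.T @ (A @ w - b)

def prox_h(w, t):
    """Proximal map of t * lam1 * ||.||_1, i.e. soft-thresholding."""
    return np.sign(w) * np.maximum(np.abs(w) - t * lam1, 0.0)

t = 1.0 / np.linalg.norm(A.T @ A, 2)    # step size 1/L for the smooth part
w = np.zeros(5)
for _ in range(1000):                   # forward-backward iterations
    w = prox_h(w - t * grad_Q(w), t)

# Optimality check: a minimiser is a fixed point of the forward-backward map.
print(np.allclose(w, prox_h(w - t * grad_Q(w), t), atol=1e-6))
```

The fixed-point identity at the end is the standard first-order optimality condition for a sum of a smooth and a prox-friendly convex term, which is the property Lemma 20 exploits.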
y_I, and so by the convexity of φ*(·, · | x_I, y_I) we also have (0, 0) ∈ ∂φ*(z̄, w̄ | x_I, y_I). Part 2: We have (0, 0) ∈ ∂φ*(z̄, w̄ | x̄_I, ȳ_I) (in both the Fréchet and classical sense) due to (z̄, w̄) ∈ argmin_{z,w} φ*(z, w | x̄_I, ȳ_I), and as φ*(z, w) = min_{(x_I, y_I) ∈ proj_I(K)} φ*(z, w | x_I, y_I), where each (z, w) ↦ φ*(z, w | x_I, y_I) is convex, we may invoke Lemma 11 Part 2 to obtain both directions of the implication after identifying ϕ with φ* and ϕ_i, i ∈ I, with φ*(·, · | x_I, y_I) for (x_I, y_I) ∈ proj_I( *(z̄, w̄)). Part 3: The fact that z̄ = x̄_s for all s ∈ S implies by Corollary 8 that φ*(z̄, w̄) = φ*(z̄, w̄ | x_I, y_I) for only one choice (x_I, y_I) = (z̄_I, w̄_I), and so the claim follows.
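The mechanism behind Part 3, namely that when the min-function agrees with a single convex cross-section near the point of interest, stationarity of that piece yields local minimality of the min, can be illustrated in one dimension with two invented quadratic pieces:

```python
import numpy as np

# phi(x) = min(phi1, phi2) with convex pieces (invented for illustration).
# Near x* = 0 only phi1 is active and phi1'(0) = 0, so x* is a local
# minimum of the min-function even though phi itself is nonconvex.
phi1 = lambda x: x ** 2
phi2 = lambda x: (x - 2) ** 2 + 0.5
phi = lambda x: np.minimum(phi1(x), phi2(x))

xs = np.linspace(-0.5, 0.5, 1001)
print(bool(np.all(phi(xs) >= phi(0.0))))   # True: x* = 0 is a local min of phi
```

On the sampled neighbourhood phi coincides with the active piece phi1, so the stationary point of the convex cross-section is indeed a local minimiser of the min-function.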
For k large enough, we have (x^{n_k+1}_I, y^{n_k+1}_I, w^{n_k+1}_I) = (x̄_I, ȳ_I, w̄_I) becoming fixed. Therefore, it is only the z^{n_k} and the real-valued components (x^{n_k+1}, y^{n_k+1}, w^{n_k+1}) that establish the stationarity properties which, after dereferencing the identifications back to the Algorithm 1 context, yield the intended results. The cross-sectional optimality (z̄, w̄) ∈ argmin_{z,w} φ*(z, w | x̄_I, ȳ_I) holds by Proposition 35 Part 1. The proof of the three implications follows, respectively, from implications 2–4 of Proposition 35.

Fig. 1 Comparison between our implementation of Progressive Hedging and variants of FPPH, applied to instances with a pure-integer first stage (SSLP and CAP). Bar height indicates the time required for convergence relative to the slowest-converging algorithm. Solid bars indicate the best-quality solution found among the three algorithms. Tinted bars indicate convergence to a lower-quality solution. Suboptimal solutions are annotated with a percentage optimality gap; a solid bar with no percentage gap indicates that the optimal solution was found. Empty bars indicate non-convergence within 100 iterations; the arrow signifies that these 100 iterations took much longer than the slowest-converging algorithm

M := sup{ Σ_{s∈S} [λ_s (x_s − x′_s) + d_s (y_s − y′_s)] | (x_s, y_s), (x′_s, y′_s) ∈ K_s, s ∈ S },
which is guaranteed to be finite due to the boundedness of K_s, s ∈ S, and the PWA Assumption 10 boundedness of {λ^n}_{n=0}^∞. We have, for each n ≥ 0 and ρ_n sufficiently large, that
Σ_{s∈S} [inf_{y∈Y_s(x^n)} {d_s y} + ρ_n π^n_s ψ(z^{η,n} − x^n, 0)]
≤ Σ_{s∈S} [inf_{y∈Y_s(x^n)} {d_s y} + ρ_n π^n_s ψ(z̄^n − x^n_s, 0)] − m(1 − η)² (ρ_n/2) Σ_{s∈S} π^n_s ‖z̄^n − x^n_s‖²
≤ Σ_{s∈S} inf_w {ϕ^n_s(z̄^n, w_s)} + M − m(1 − η)² (ρ_n/2) Σ_{s∈S} π^n_s ‖z̄^n − x^n_s‖²
< Σ_{s∈S} inf_w ϕ^n_s(z̄^n, w_s).