Partially Distributed Outer Approximation

This paper presents a novel partially distributed outer approximation algorithm, named PaDOA, for solving a class of structured mixed integer convex programming (MICP) problems to global optimality. The proposed scheme uses an iterative outer approximation method for coupled mixed integer optimization problems with separable convex objective functions, affine coupling constraints, and compact domain. PaDOA proceeds by alternating between solving large-scale structured mixed-integer linear programming problems and partially decoupled mixed-integer nonlinear programming subproblems that comprise much fewer integer variables. We establish conditions under which PaDOA converges to global minimizers after a finite number of iterations and verify these properties with an application to thermostatically controlled loads.

design of multi-product batch plants [41], to obstacle avoidance and robotic motion planning problems [31].
Although MICPs are NP-hard in general, there exist a variety of algorithms for solving MICPs to global optimality [7]. State-of-the-art MICP solvers are based on tailored methods that exploit the fact that the integrality constraints are discrete while all other constraints are convex. Early attempts to develop tailored branch & bound methods for MICP have been proposed in [22], mostly focussing on computational experiments and heuristics for selecting the branching variables and nodes. Improved versions of these early branch & bound methods for MICP can be found in [8,32]. Other early methods for solving MICP include generalized Benders decomposition methods [4,20], which are, however, less frequently used in state-of-the-art MICP solvers. 1 Modern MICP implementations are often, in one or the other way, based on or related to outer approximation (OA), which goes back to Duran and Grossmann [14]. In contrast to branch & bound, OA alternates between solving nonlinear programs (NLP) with fixed integer values as well as mixed integer linear programs (MILP), which are constructed by linearizing the objective and constraint functions at the solutions of the NLP and which are used to update the integer variables. A notable extension of OA has been developed by Fletcher and Leyffer [17], who suggest to include curvature information in the relaxed integer program leading to a quadratic outer approximation method. Moreover, Kesavan and co-workers [26] have studied variants of OA for solving non-convex mixed integer problems. Another class of MICP methods are based on (extended) cutting plane methods [50] or combination of OA and branchand-cut [40]; see also [47] for a general overview of polyhedral branch-and-cut methods. In recent years, there has been considerable progress in lift-andproject methods for MICP. An excellent overview and discussion of the stateof-the-art of such lift-and-project methods can be found in a recent article by Kilinç, Linderoth, and Luedtke [27].
Another recent trend in MICP solver development is the exploitation of separable structures by so-called extended formulations [23]. Here, the main idea is to introduce auxiliary variables in order to bound decoupled summands in additive expressions separately [49], which can lead to tighter polyhedral outer approximations. Such extended formulations have not only found their way into OA methods, as discussed in [23], but they can also be used to increase the performance of lift-and-project methods for MICP [27]. However, extended formulations exploit the separability for the construction of tighter outer approximations only, but neither existing OA methods nor state-of-the-art liftand-project methods ever attempt to break a large-scale MICP into decoupled MICPs with fewer integer variables. This is in contrast to distributed continuous convex optimization methods, such as dual decomposition [16,36], alternating direction method of multipliers (ADMM) [9,15,18], or augmented lagrangian based alternating direction inexact newton (ALADIN) methods [24], which can all be used to solve large-scale convex optimization problems to global optimality by alternating between solving small-scale convex optimization problems and sparse linear algebra operations. These methods typically require communication of the solutions of the decoupled problems between neighbors or to a central coordinator [9]. Although some researchers, [46,34], have attempted to apply these distributed local optimization methods in a heuristic manner, these methods cannot find global minimizers of non-convex problems reliably. This is due to the fact that ADMM, ALADIN, or similar distributed convex optimization method typically rely on strong duality results for augmented Lagrangians [45,43], which fail to hold in the presence of integrality constraints.
After reviewing Hijazi's extended formulations and related existing outer approximation methods in Section 2, the main contribution of this paper is presented in Section 3, which introduces a partially distributed outer approximation (PaDOA) method for MICPs with separable objective functions. In contrast to existing algorithm for structured MICP, PaDOA alternates between solving MICPs with fewer integer variables and large-scale MILPs for which efficient algorithms exist. Section 3.5 discusses the global convergence properties of PaDOA, as summarized in Theorem 3. In this context, we additionally establish the fact that global optimality of a given feasible point of an MICP with N separable objectives and N n optimization variables can be computationally verified by solving N partially-decoupled MICPs, each comprising at most n local integer variables, and one MILP with N n integer variables. This result is summarized in Theorem 2, which analyzes one-step convergence conditions for PaDOA. In the sense that both MICPs as well as MILPs are NP hard in general [19,35], this result is not in conflict with existing complexity results for mixed integer optimization problems. However, there are solvers such as CPLEX [25], Gurobi [38], and many others [11], which can solve practical MILPs within reasonable computational run-times. Thus, the fact that one can reduce the task of verifying global optimality of a feasible point of a separable MICP with coupled affine constraints to the task of solving one MILP of a comparable size and several smaller subproblems, isat least from a computational perspective-an important contribution. Last but not least, Section 4 illustrates the practical performance of PaDOA by applying the algorithm to MICP benchmark case studies. Section 5 concludes the paper.

Problem formulation
The present paper is concerned with mixed integer optimization problems of the form and separable constraint sets The coupling matrix A and the vector b are assumed to be given. In this context, the following blanket assumption is used.

Assumption 1
The sets X 1 , X 2 , . . . , X N ⊆ R n are non-empty convex polytopes, the sets Z 1 , Z 2 , . . . , Z N ⊆ Z m are non-empty and compact, and the functions f i are convex on the convex hull of X i × Z i .
The goal of this paper is to develop an efficient algorithm that finds εsuboptimal points of (1), which are defined as follows: Remark 1 Instead of (1), one could also consider more general optimization problems of the form However, under mild regularity assumptions [37], this problem is equivalent to with real-valued auxiliary variables y and L 1 -penalty parameters λ i 0. Here, conv (Z) denotes the convex hull of Z. Thus, for all theoretical purposes, it is sufficient to analyze problems of the form (1), where only the real-valued variables are coupled.
Remark 2 Modern MICP formulations, algorithms, and software can deal with rather general convex conic constraints [33]. Such constraints are left out for simplicity of presentation. Nevertheless all results in this paper can be easily extended for general conic constraints, as long as they are separable. From a purely theoretical perspective, one might argue that this can always be achieved by adding suitable convex penalty functions to the functions f i , because this paper makes no assumptions on the differentiability properties of f . However, more tailored, practical algorithms that could exploit the structures of particular conic constraints are beyond the scope of this paper.

Notation
We use the notation to denote the set of subgradients of a convex function g : R n → R with respect to the variable x.

Polyhedral relaxations
In order to construct polyhedral outer approximations of the epigraph of the objective function f of (1), we consider the auxiliary optimization problem for a fixed integer parameter z ∈ Z. Here, x and y are real valued primal optimization variables and the notation "x = y | λ" is used to say that λ denotes the dual variable that is associated with the constraint x = y.
Proposition 1 If Assumption 1 is satisfied, strong duality holds for (5), i.e., we have for all z ∈ Z.
Proof See Appendix A.1.
Let x (z), y (z), λ (z) denote any primal-dual solution of (5) in dependence on z. By writing out the stationarity condition of (5) with respect to x, we find that i.e., λ (z) must be a subgradient of f at the optimal solution of (5). In order to understand the developments given below it is helpful to keep in mind that the reverse statement is not correct, i.e., a subgradient of f at (x (z), z) is not necessarily a dual solution of (5). In contrast to the particular choice of the subgradient λ of f with respect to x, the construction of a subgradient of f with respect to z is less critical for the construction of outer approximation methods. In the following, we assume that a function µ (z) ∈ ∂ z f (x (z), z) , is given, which returns a subgradient of f with respect to z at the optimal solution of (5). Because f (x, z) = N i=1 f (x i , z i ) is separable, the i-th block components, λ i (z) and µ i (z), of the subgradients of f are subgradients of f i . Thus, the inequality holds for all x i ∈ X i , and z i ∈ Z i and allẑ ∈ Z. Here, the shorthand is used. In this context, it is important to notice that the function f i (ẑ) depends on the whole vectorẑ, not only on its i-th component,ẑ i , because the equality constraints in (6) introduce a non-trivial coupling. More generally, if Ξ ⊆ Z i denotes finite set of points in Z, we associate with Ξ a set of hyperplane coefficients Notice that this set of hyperplane coefficients defines a polyhedral outer approximation of the epigraph of f i . Thus, these coefficients can be used to construct a piecewise affine lower bound on f i , which is for all ( Finally, we can construct the function Φ(x, z, . This function is-by construction-a piecewise affine lower bound on f , Next, our particular choice of the subgradient λ (z) of f as the dual solution of (5) enables us to establish the following tightness property of the affine lower bound Φ.
Lemma 1 Let Ξ ⊆ Z be any finite set of points. If Assumption 1 holds, then the equation holds for all z ∈ Ξ.
Proof See Appendix A.2.

Remark 3
If the set Ξ consists of m points, the computational cost for constructing the lower bound (9) of f has order O(mN ). Notice that if we would have ignored the separable structure of f , the computational cost of computing the same lower bounding function would have been of order O(m N ). Thus, the construction of (9) as a sum of the lower bounds of the separable objective function is much cheaper than a direct construction of lower bounds of f . This reduction in complexity has for the first time been observed and exploited by Hijazi and co-workers [23]. By now, the expoitation of separability via extended formulations can be considered as a standard that has been adopted in many modern MICP algorithms and software tools [27,33].

Outer approximation algorithm
Algorithm 1 outlines the main steps of the outer approximation algorithm. Notice that this algorithm basically coincides with the original outer approximation algorithm that has been proposed in [14]. The only notable differences of Algorithm 1 compared to traditional OA are that the MILP in Step 3 uses the extended formulation based outer approximation variant from [23]. Moreover, because we do not assume that f is differentiable, we have to use the particular choice, λ (z), of the subgradient, which is found as the dual solution 2 of (5). Notice that Step 1 of Algorithm 1 solves (1) under the additional constraint that the integer z is fixed. This implies that is an upper bound on the optimal objective value V of (1). Thus, the current upper bound U can be updated in Step 2. Moreover, the MILP (21) is (by construction) equivalent to solving the relaxed optimization problem 3 min x∈X,z∈Z is a lower bound on V . Thus, the difference, U − N i=1 y + i , between the current upper and lower bounds can be used as a termination criterion, which is implemented in Step 3 of Algorithm 1. The following finite termination result for outer approximation is (at least in very similar versions) well-known in the literature [14,33].

Algorithm 1: Outer Approximation for MICP
Input: Initial guess z ∈ Z and a numerical tolerance ε > 0.

Repeat:
1. Solve the convex optimization problem 2. If (11) has no feasible solution, return a certificate of infeasibility. Otherwise, update Update z ← z + and go to Step 1.

Theorem 1 If Assumption 1 is satisfied, then Algorithm 1 terminates after a finite number of iterations.
Proof See Appendix A.3. [23] for constructing the MILPs (21), which arguably exploit separability of the objective function to some extent. However, Algorithm 1 is not a fully distributed algorithm. In fact, a major disadvantage of Algorithm 1 becomes apparent, if one considers the special case that the constraint Ax = b happens to be redundant. In this case, the optimal solution of (1) could have been found with much less effort by solving the separable MICPs,

Remark 4 Algorithm 1 uses Hijazi's extended formulation
which have much fewer integer variables. However, if Algorithm 1 is applied to such a problem with redundant equality constraint, this property is not detected and a large number of large-scale NLPs and large scale MILPs might have to be solved instead, until convergence is achieved. The goal of this paper is to mitigate this deficiency of Algorithm 1 by proposing a partially distributed outer approximation algorithm that exploits the structure of the separable objective in a better way.

Partially distributed outer approximation algorithm
This section introduces a partially distributed outer approximation optimization algorithm for finding ε-suboptimal solutions of (1).

Partially decoupled upper bounds
The main idea of many distributed convex and local optimization methods is to solve a set of smaller-scale decoupled optimization problems in place of a single large one [4,9,12]. Similarly, consider partially decoupled optimization problems of the form for k ∈ {1, . . . , N }. Here, the integer variable z ∈ Z is regarded as a fixed parameter and only the much smaller dimensional integer vector ζ k ∈ Z k is optimized. However, concerning the real-valued variables, the whole vector x ∈ X is kept as an optimization variable. In this context, the shorthands are introduced in order to keep the x-dependence of the remaining summands, i.e., all objective terms whose index is not equal to k. As in the previous section, λ denotes the dual solution that is associated with the consensus constraint "x = y". Because the constraint ζ k ∈ Z k enforces integrality, strong duality of (13) does not hold in general. However, if ζ k (z) denotes an optimal solution of (13) for the integer variable and if Assumption 1 holds, we still have The proof of this statement is completely analogous to Proposition 1, i.e., if the linear coupling constraint Ax = b has a solution in X, a maximizer of (15) exists and can be used to define a suitable subgradient. Also note that the functions V k yield upper bounds on the objective value of (1), At this point, it should be mentioned that one basic assumption of the algorithmic developments in this paper is that the complexity of the mixed integer optimization problems of interest depends mostly on the number of integer variables. This is in contrast to the number of real-valued variables, which may be assumed to have a negligible influence on the overall complexity of the mixed-integer optimization problem. In other words, we assume that (13) is much easier to solve than (1) in the sense that it contains much fewer integer variables, although both problems have the same number of real-valued variables. Here, it is important to keep in mind that, although the algorithmic developments in this paper are inspired by the field of distributed optimization, the algorithm in this paper is (at least in the form in which we present and analyze it) not fully distributed. This is because solving (13) requires the evaluation of the function Ψ k , which, in turn, requires the evaluation of all functions f j with j = k.

Partially decoupled lower bounds
In this paper, we suggest to solve the decoupled MICPs (13) by lower level solvers that implement the traditional outer approximation algorithm that has been reviewed in Section 2.2. Notice that if Assumption 1 is satisfied, strong duality holds, i.e., these lower level solvers will return piecewise affine models which must satisfy the condition upon termination. Here, L ≥ 0 denotes the numerical tolerance of the lower level OA solvers. Notice that the optimization problem on the right hand of (17) corresponds to the last MILP relaxation that is solved by the lower level OA solver. In practice the function Θ k can be stored by maintaining a set of hyperplane coefficients as explained in detail in the previous section. Moreover, in order to avoid the accumulation of too many hyperplanes, one can discard all hyperplanes that are inactive at the optimal solution of the last MILP relaxation, because this operation does not affect the right hand expression of (17). The main idea of partially distributed outer approximation is to communicate the piecewise lower bounding functions Θ k to a central coordinator, who constructs a piecewise affine lower bound on the function f , solves a master MILP problem, and updates z. Here, one option is use the maximum over the function Θ k in order to obtain the lower bound However, in order to arrive at a practical implementation, it is recommendable to further refine this bound. This can be done by maintaining a collection of integers, Π ⊆ Z, such that the function can be used as a piecewise affine lower bound on f . Recall that the function Φ, which has been introduced in the previous section, exploits the separability properties of f . The integer collection Π is then maintained by updating where ζ = [ζ 1 , ζ 2 , . . . , ζ N ] is an integer vector, whose components are optimal solutions for the integer variables of the partially decoupled problems (13).

Partially distributed outer approximation (PaDOA)
Algorithm 2 outlines a partially distributed algorithms for solving (1). There are four main steps. In the first step, the partially decoupled MICPs of the form (13) are solved by using a traditional outer approximation method. Under the assumption that the original MICP (1) is feasible, the partially decoupled MICPs are feasible, too. Thus, the outer approximation solvers will return optimal integer solutions ζ k and associated piecewise affine lower bounds Θ k such that (17) is satisfied. The second step of Algorithm 2 updates the associated upper bound U based on the inequality (16) as well as the piecewise affine lower bound. In practice, this step is implemented by storing the union of all supporting hyperplane coefficients that are needed to represent Θ. Finally, the third step of Algorithm 2 solves a large scale MILP problem. This MILP is constructed in analogy to the corresponding step in the traditional outer approximation algorithm. It yields a lower bound, on the objective value V of (1). Thus, the difference between the current upper and lower bounds, can be used as a termination criterion, which is implemented in the fourth step of Algorithm 2. If the termination is not successful, the integer variables z are updated, and the algorithm subsequently proceeds to the next iteration. Notice that the main difference between Algorithms 1 and 2 is the introduction of partially decoupled MICP problems that can be solved separately Input: Initial guess z ∈ Z and a numerical tolerance ε > 0.
2. Update the upper bound U ← min {U, V 1 (z), . . . , V N (z)} and construct the piecewise lower bounding function Φ(x, z, Π) as in (8). 3. Update the lower bound 5. If U − Θ(x + , z + ) ≤ ε, terminate. Otherwise, update z ← z + and go to Step 1. and which contain much fewer integer variables than the orginal MICP (1). The theoretical results in Section 3.5 will elaborate further on the benefits of this alternation strategy. Moreover, in Section 4 a numerical case study is examined, which illustrates the practical advantages of Algorithm 2.

Relation to distributed local optimization methods
The idea to "augment" the local objective functions f i with a suitable function Ψ i is frequently used in the context of distributed local optimization algorithms. For example, in the context of dual decomposition [16,36], one augments the separable functions f i with linear functions of the form 4 where σ is the current dual iterate. Similarly, in the context of ADMM or ALADIN, one uses augmented Lagrangians [3,39], as in where z and σ are the current primal and dual iterates; see [9,15,24]. In fact, the construction of Algorithm 2 is inspired by the distributed local nonlinear programming method ALADIN. Here, we recall that ALADIN alternates between solving small-scale decoupled NLPs that are augmented by terms of the form (22) and large scale equality constrained quadratic programming problems that update σ and y [24]. This is in analogy to Algorithm 2, which alternates between solving decoupled MICPs (Step 1) and large-scale coupled MILPs (Step 3). However, unlike ALADIN, augmented Lagrangians are not used in Algorithm 2 as Lagrange multipliers in integer programming are not related to sensitivity and generally not applicable. Also note that the construction of the functions Ψ i in Algorithm 2 also has similarities with Gauss-Seidel or more general block-coordinate descent methods [48,51] in the sense that a partial decoupling is obtained by fixing some of the integer variables while others are optimized. However, despite all these analogies and similarities of Algorithm 2 with methods from the field of local and convex optimization, we would like to highlight that all these existing distributed optimization methods are not reliably applicable [46].

Convergence Analysis
In this section we provide a concise overview of the convergence properties of Algorithm 2. The following theorem establishes one of the main results of this paper, namely, that Algorithm 2 converges after one iteration if the integer iterate, z, is initialized with an optimal solution of (1). This is contrast to Algorithm 1, which does not necessarily terminate after a small number of steps-not even if it is initialized at an optimal solution.
Theorem 2 Let Assumption 1 be satisfied and let (x , z ) be a minimizer of (1). If Algorithm 2 is initialized with z = z and if the termination tolerances of the lower level solvers satisfy L ≤ , then the termination criterion in Step 4 is satisfied. In other words, the algorithm terminates after one step.
Proof See Appendix A.4.
Notice that the statement of the above theorem is of fundamental relevance and a very favorable property of PaDOA. If we work with other global optimization methods, say branch-and-bound, an empirical observation is that such existing global optimization algorithm often find a global solution early on but then keep on iterating until the lower bound is accurate enough to prove global optimality. In contrast to this, PaDOA terminates as soon as a global minimizer is added to the collection Π. In fact, Theorem 2 implies that global optimality of a point z ∈ Z can be verified by solving the N instances of the partially decoupled MICPs and the master MILP (21). Notice that this result is not in conflict with existing results from the field of complexity theory, because the master MILP (21) remains NP-hard [19,35].

Remark 5
The result of Theorem 2 relies heavily on the convexity of the functions f i on the convex hull of X i × Z i , although this fact is not highlighted explicitly in the proof. This convexity assumption is first of all required implicitly by our assumption that the lower level solvers return piecewise level models Θ k , which satisfy the termination condition (17) (this assumption is only reasonable if strong duality holds) and which need to be global lower bounds on f . These properties are in general all not satisfied if one considers more general non-convex MINLPs.
The following theorem establishes the fact that Algorithm 2 converges after a finite number of iterations under exactly the same conditions under which convergence of Algorithm 1 can be established. Proof See Appendix A.5.

Implementation and Case Study
The Partially Distributed Outer Approximation method is implemented in MATLAB R2017b. The optimization subproblems are solved using Gurobi [38] implemented via CasADi v1.9.0 [2]. All numerical experiments were run on a 2.9GHz Intel Core i5-4460S CPU with 8GB of RAM.

Problem Description
An important problem in the planning and operation of a heating and/or cooling system is the scheduling of so-called Thermostatically Controlled Loads (TCLs) [30]. These are devices that are used to regulate the temperature of a room/building within a certain user-defined interval known as a "deadband". The optimal operation strategy is especially difficult to determine when a non-constant cost function is introduced for a population of heterogeneous TCLs [52]. The cost function may represent the cost of electricity or userdiscomfort from noise generation. Regardless, such devices typically only have an "on" and an "off" setting and thus the resulting scheduling problem can be formulated as a binary MIP as seen in (23) for R regions with a finite time horizon H. The equations in (23) are based on the formulation given in [28], but with dynamics modeling the interaction between each region and a linear cost function instead of a quadratic. min T (·),u(·) subject to ∀i ∈ {1, . . . , R}, ∀t ∈ {0, . . . , H}, where c(t) is the vector of device costs at time t, γ is a comfort parameter,

Results for MILP
If the comfort parameter γ is taken to be zero then Problem (23) is linear and separable but coupled in both its discrete and real-valued variables. Shown in Tables 1 and 2 are the simulation results for each configuration, respectively. The results of Algorithm 2 are compared with results obtained from a Branch and Bound approach as implemented in Bonmin with default settings [6] as well as the commercial MIQP solvers Gurobi and CPLEX [25]. An example solution for the 3 room case is depicted in Figure 4.2.  At first glance, the results from Tables 1 and 2 may seem surprising since the 4 room case has more space to keep cool but nonetheless is able to do so at a lower cost than the 3 room case. This is due to an insulation effect that the 4 room configuration enjoys. With the activation of two coolers in the first six time steps, the room temperatures can stay within their deadbands for the  entire 48 hour period. In contrast, the 3 room configuration is more susceptible to the ambient temperature and requires more use of the coolers. This also seems to have increased the computational complexity of the problem and requires more time for the 3 room case to be solved than the 4 room case. It should be noted that several initializations were tested and the solution times were not significantly affected, implying that this was not the cause of the runtime differences in the two cases.
One of the advantages of using a distributed method is the ability to solve problems that would be otherwsie intractable for a centralized solver. Tables  1 and 2 show results for cases containing up to 496 variables, but even larger problems may be considered. Table 3 shows results for a variety of time horizons and rooms. Here, the room configuration is instead arranged such that the rooms are in a line. While unrealistic for most buildings, this setup is realistic for the temperature control of a train or rooms next to a corridor. Mathematically, this example differs somewhat from the other two. While the other problems have a significant amount of coupling between the control variables, the at is not the case here. This sparsity allows for Algorithm 2 to outperform both Gurobi and CPLEX (applied to the centralized problem).

Results for MIQP
If the comfort parameter γ is larger than zero then Problem (23) is a convex MIQP. 5 As in Section 4.2, The results of Algorithm 2 for each room configuration are compared with those obtained from Bonmin, Gurobi, and CPLEX. The value of γ was chosen to be one to allow for an equal weighting of comfort and cost.
Shown in Figure 4.3 are the trajectories obtained for the three-room scenario with a temperature deviation penalization. In contrast to Figure 4.2, a quadratic penalty term is used to model discomfort caused by deviations from the set temperature. Indeed, the solution with γ = 1 yields a trajectory with a similar number of activations as when γ = 0 but with temperature trajectories that stay much closer to the middle of the deadband. Theoretically, the quadratic term should make the problem more computationally difficult, but in some cases both Gurobi and CPLEX actually require less time. In contrast, Algorithm 2 requires many more iterations than the MILP formulation. Future work could seek to use some of the cutting plane methods and other heuristics used by Gurobi and CPLEX to alleviate this issue. Furthermore, quadratic lower bounding functions could significantly reduce the number of iterations until convergence.

Higher Order Convex Problems
One of the advantages of the proposed algorithm is that it is applicable to a relatively large class of problems (namely, MICPs). While Section 4.3 shows favourable results for both Gurobi and CPLEX, if the problem were adjusted slightly such that it were no longer an MIQP then these solvers would no longer be applicable. For example, if the objective function of Problem (23) became min T (·),u(·) then this would still be solvable via PaDOA, but not Gurobi or CPLEX. However, Bonmin can still be applied. 6 The results for a variety of such problem configurations are shown below in Figure 7. Therein it can be observed that Algorithm 2 returns the same, global solution as Bonmin 7 , and does so in less time. The runtime difference is particularly striking for the 7 room scenarios as these contain the most variables and have the greatest potential for parallelization.

Outlook
The implementation of Algorithm 2 used in Section 4 was protypical and could be improved in a number of ways. Interestingly, as Table 8 shows, the majority of the time spent by the algorithm was typically on the coupling problems. However, more time was required for the solution of the MICP subproblems in the higher order version of the problem. An example of which is shown in Table 9, along with the relevant convergence plot. The reason why the linear approximations required so much time to be solved likely lies in the heuristics and presolving processes used by Gurobi. This is despite the fact that Gurobi was used to solve both the linear approximations and the full nonlinear program.

Conclusions
This paper has introduced the partially distributed outer approximation method PaDOA (Algorithm 2) for finding -suboptimal points of the struc-  tured MICP (1). PaDOA proceeds by alternating between solving partially decoupled MICPs that comprise n integer variables and large scale MILPs with nN variables. Finite termination conditions for PaDOA have been established in Theorem 3. Moreover, we have discussed the major theoretical and practical advantages of PaDOA compared to exising extended formulation based OA solvers. In particular, Theorem 2 states that PaDOA terminates after the first iteration, if it is initialized at a global minimizer-an important property that is neither shared by existing OA nor by existing branch-and-bound based methods for MICP.
In Section 4, first, second and fourth order mixed integer problems were used to demonstrate the practical performance of PaDOA compared to other state of the art solvers by application to a scheduling problem of thermostatically controlled loads. While the solution and runtime are competitive for each of the case studies considered, the best performance was observed for problems with sparse Hessians and coupling constraints. Furthermore, it was observed that PaDOA was able to return a solution in several cases where the centralized approach could not due to memory constraints.
Future work will investigate the use of piecewise linear underapproximations and/or cutting planes in the MILP step to extend PaDOA to non-convex MINLPs. Furthermore, Step 4 requires full constraint information in order to return a feasible solution. This restricts the applicability in terms of fully distributed settings and future work will focus on sidestepping this restriction.

A.3 Proof of Theorem 1
Notice that if the equation Ax = b has no solution in X, this will be detected immediately by Step 2 of Algorithm 1, which causes termination. Thus, we may assume that all optimization problems are feasible. Now, the main idea of the proof is to show that the cardinality of the set Π is strictly increasing in every iteration, if the algorithm does not terminate. For this aim, we first notice that any solution (x + , y + , z + ) of the MILP (12) satisfies the equation by construction. Moreover, because we have Ax + = b, the inequality min x,Ax=b Φ(x, z + , Π) ≤ Φ(x + , z + , Π) holds. If we further assume that the termination criterion is not satisfied, we must have Thus, if we had z + ∈ Ξ, then the result of Lemma 1 would imply that f (z + ) (10) = min x,Ax=b Φ(x, z + , Π) as well as U ≤ f (z + ), since z + has already been added to the collection Π. By substituting all the above relations we would then find that f (z + ) (28), (26) ≤ Φ(x + , z + , Π) (25), (27) < which is a contraction. Thus, either our assumption that the algorithm does not terminate or our assumption z + ∈ Ξ must be wrong. In other words, if the algorithm does not terminate in the current step, then the cardinality of the set Π increases by 1 in the next step, because z + is added to the collection Π ⊆ Z. But this is only possible for a finite number of steps, because the set Z contains only a finite number of points. Thus, Algorithm 1 must terminate after a finite number of iterations.

A.4 Proof of Theorem 2
Let V = N i=1 f i (x i , z i ) denote the optimal value of (1). Because we assume that such an optimal solution exists while Assumption 1 is satisfied, the partially decoupled optimization problems are all feasible and return piecewise affine lower bounds that satisfy the termination condition (17) with V k (z ) = V , i.e., we have for all k ∈ {1, . . . , N }. Because the function Θ is by construction an upper bound on Θ k (for any k), we further have min x∈X,ζ∈Z k ,Ax=b Θ k (x, ζ) ≤ min x∈X,z∈Z,Ax=b Θ(x, z) = Θ(x + , z + ) , where (x + , z + ) denotes the solution of the master MILP (21). By substituting the above inequalities we find that V − L ≤ Θ(x + , z + ) . Because we assume that L ≤ , this implies that Thus, the termination condition is satisfied and Algorithm 2 terminates after the first step.

A.5 Proof of Theorem 3
We may assume that the coupled equality constraint is feasible, as infeasibility would be detected immediately in Step 1 of Algorithm 2. Similar to the proof of Theorem 1, we need to keep track of the integer solutions of the master MILPs. For this aim, we introduce the following "artificial" additional step: Step 3 ): After solving (21), updateΠ =Π ∪ {z + }.
If the setΠ is initialized with the empty set and if Step 3 is inserted in Algorithm 2 immediately after Step 3, the iterates of this algorithm remain unaffected. The main idea of the proof is now to show that the cardinality of the setΠ increases in every iteration of Algorithm 1 under the assumption that the termination criterion is not satisfied. Let us assume that the solution z + satisfies z + ∈Π (beforeΠ is updated in Step 3 ). Then we have U ≤ V k (z + ) (17) ≤ L + min x∈X,ζ∈Z k ,Ax=b Because Θ is an upper bound on Θ k , this implies that we also have min x∈X,ζ∈Z k ,Ax=b Θ k (x, ζ) ≤ min x∈X,z∈Z,Ax=b Θ (x, ζ) = Φ(x + , z + ) , which yields U −Φ(x + , z + ) ≤ L ≤ . Thus, either the termination criterion is satisfied or we have z + / ∈Π. In the latter case, the cardinality of Π increases by 1 in the current iteration. As this is only possible for a finite number of steps, Algorithm 2 must terminate.