Penalty alternating direction methods for mixed-integer optimal control with combinatorial constraints

We consider mixed-integer optimal control problems with combinatorial constraints that couple over time such as minimum dwell times. We analyze a lifting and decomposition approach into a mixed-integer optimal control problem without combinatorial constraints and a mixed-integer problem for the combinatorial constraints in the control space. Both problems can be solved very efficiently with existing methods such as outer convexification with sum-up-rounding strategies and mixed-integer linear programming techniques. The coupling is handled using a penalty-approach. We provide an exactness result for the penalty which yields a solution approach that convergences to partial minima. We compare the quality of these dedicated points with those of other heuristics amongst an academic example and also for the optimization of electric transmission lines with switching of the network topology for flow reallocation in order to satisfy demands.


Introduction
Optimal control problems subject to integer restrictions on some part of the controls have recently received a lot of attention in the literature. This problem class is a convenient way to model, for instance, autonomous driving in case of vehicles with gear shift power units [19], contact problems such as robotic multiarm transport [4], or the operation of networked infrastructure systems such as gas pipelines [13], water canals [14], traffic roads [9], and power transmission lines [8] with switching of valves, gates, traffic lights and interconnectors, respectively. Often, the integer controls are additionally constrained in order to prevent certain switching configurations, to limit the number of switches or to enforce certain dwell or dead times after a switch. In this paper, we consider such combinatorial constraints with a focus on conditions which cannot be imposed pointwise and hence couple over time.
A discretization of such problems naturally leads to mixed-integer non-linear programs that often become computationally intractable when the discretization stepsizes tend to zero. Moreover, when passing to the limit, one may face convergence issues [16]. A computationally much more efficient alternative solution approach is based on decomposition techniques, splitting the problem into a continuous subproblem by partial outer convexification with relaxation of binary multipliers (POC) combined with a combinatorial integral approximation problem (CIAP) [15,17,18,21,26,28,29]. However, in the presence of combinatorial constraints that couple over time, this approach only yields feasible solutions with a priori lower bounds, but yet without any characterization of optimality in some reasonable sense on the mixed-integer level.
We consider here another approach based on the idea of alternating direction methods (ADM). This will provide feasible solutions which can be characterized as partially optimal in a lifted sense. The approach uses POC and CIAP in one direction and a mixed-integer linear problem (differing from the combinatorial integral approximation problem) in another direction. Both directions are weakly coupled using a penalty term with adapting an idea outlined in [7]. Based on exactness of this penalty, we provide a convergence result of this method. Our analysis applies to problems in the setting of abstract semilinear evolutions subject to control constraints. However, the methods can be extended to state constraints. In particular, the techniques apply to optimal control problems with ordinary differential equations. The method can be also seen as an adaptation of classical feasibility pump algorithms (for an overview, see [3]) with heavy structure exploitation for mixed-integer optimal control problems.
In a numerical study, we consider two problems from the mintOC.de library [27], which we augment by minimum dwell-time constraints. We compare the proposed approach with direct discretization and mixed-integer programming techniques in order to address local vs. global optimality and to the decomposition with POC and CIAP as a heuristic.
We note that continuous reformulations of such problems with switching time and mode insertion optimization can also be seen as an alternating direction method [24,25], but that the approach proposed here is different.
The article is organized as follows. In Section 2, we present the problem formulation. In Section 3, we extend the framework of alternating direction methods to partial ε-optimality. In Section 4, we apply the ε-ADM framework together with penalty techniques to mixed-integer optimal control problems as the main algorithm and develop the convergence theory. In Section 5, we provide our numerical results. In Section 6, we give concluding remarks.

Problem formulation
We consider a mixed-integer optimal control problem of the form where for some M ∈ N and p ∈ (0, ∞], Y and U are Banach spaces, representing state costs, Σ is a subset of U t representing constraints on the continuous control u, and Γ is a subset of V t representing combinatorial constraints (e. g., dwell-time constraints and switching order constraints). We say that the set of combinatorial constraints Γ has a uniform finiteness property, if there exists a constant n s ∈ N such that v ∈ Γ implies that v is piecewise constant with at most n s switching points.

Example 1 (Combinatorial constraints).
With v = (v 1 , . . . , v M ) being the componentwise respresentation of v ∈ V t and the total variation of the ith component on the interval (t 1 , t 2 ) ⊂ (0, T ) being denoted by we can for example enforce a minimal dwell-time τ min with the constraint or directly limit the total number of switches for the ith component to n max In both cases it is easy to see that any set Γ ⊂ V t containing one of these constraints has the uniform finiteness property. Further additional constraints are of course possible, for instance, a maximum dwell-time τ max for a subset of components Constraints of the form (2)-(4) or variants of it are significant in many applications that involve switching control, but they are typically extremely difficult to be treated in the context of mixed-integer optimal control because they are not defined pointwise in time.
Concerning the wellposedness of the problem, we make the following assumptions.

Assumption 1.
Suppose that A generates a strongly continuous semigroup e tA on Y , and that there exists a constant K > 0 such that, for all i = 1, We note that the conditions of Assumption 1 are sufficient for the state equation (1b) to admit a unique solution y(·; u, v) in C([0, T ]; Y ) given by Of course, other conditions are also possible, see e.g., [22]. Moreover, the uniform finiteness property is crucial for the existence of optimal solutions for the problem (1). For additional problem specific assumptions, appropriate wellposedness results can for example be obtained via parametric programming. The next theorem illustrates this for the case of generators of immediately compact semigroups.

Theorem 1.
Suppose that Assumption 1 holds. Moreover, assume that X is separable and reflexive, that f (y, Σ, v i ) is closed and convex in Y , e tA is compact for t > 0 and that Γ has the uniform finiteness property. Then the problem (1) has an optimal solution (y * , u * , v * ).
Proof. Under the uniform finiteness property, the problem (1) can be considered as a parametric two stage problem, where the inner problem consists of minimizing with respect to u and the outer problem is a minimization with respect to finitely many switching times τ k . Under the given assumptions, the inner problem has an optimal solution and the optimal value depends continuously on the initial data [5], and hence via (5) on the switching times τ k ∈ [0, T ]. The claim then follows from the extreme value theorem of Weierstrass.
For the solution approach considered below, we note that under the Assumption 1, we can consider the reduced problem and results for (6) can be carried over to the original problem (1) via (5).

ADM with ε-optimality
As a solution approach we extend here the idea of ADM. Suppose we were to minimize a nonlinear function f (u, v) over u ∈ U and v ∈ V subject to constraints (u, v) ∈ Ω for some given feasible set Ω. Further suppose that we can compute ε-optimal solutions for each of the partials u (with v fixed) and v (with u fixed). Then, given some ε ≥ 0 and some guess (u 0 , v 0 ), we can consider the following sequential approach to compute a solution candidate (u * , v * ): . v) Set l = l + 1 and continue with step i). This algorithm may not terminate. For classical ADM, there are well-known conditions under which we can ensure that the algorithm does not cycle, i.e. that the algorithm does not get stuck in a loop of different solutions; for a discussion, see [7]. However, if it terminates, we can conclude that (u * , v * ) satisfies This can be seen as follows: If the algorithms terminates in step ii), we have for somel from ii) Moreover, from step iii) with l =l − 1, we have If the algorithm terminates in step iv), we have for somel from iv) Hence, with ul +1 = u * , we get Moreover, from i), we have We shall call points (u * , v * ) satisfying (7a) and (7b) p-ε-optimal as a shorthand for partially ε-optimal.

ADM and p-minima for mixed-integer optimal control problems
Concerning the mixed-integer optimal control problem (1) or equivalently for the reduced form (6), a natural ADM splitting is using the directions u ∈ U and v ∈ V . However, in the direction of v, this still results in a mixed-integer nonlinear optimization problem subject to a differential equation. To avoid this, we will instead use that (1) and consider a splitting with respect to the directions (u, v) andṽ. This particular splitting is chosen deliberately in view of the fact that the two subproblems can be efficiently solved to ε-optimallity with existing techniques. Motivated by (7a) and (7b), we say that a point (u (8). Consistently, we say that a point (y * , u * , v * ) is p-ε-minimal for the original problem (1) if (u * , v * ) is a p-ε-minimum of (6) and y * is a solution of the state equation (1b) with u = u * and v = v * . We note that p-ε-minima are not necessarily global ε-minima. But any global minimum of (1) is p-ε-minimal with ε = 0. For brevity, we call p-ε-minima with ε = 0 just p-minima. The above discussion motivates to compute p-minima of good quality. To this end, we enforce the coupling of v andṽ in (8c) weakly with a suitable penalty term. The penalty parameter can then eventually be used to avoid getting stuck in p-ε-minima with too high objective. This idea was introduced recently in [7] for classical ADM in the context of feasibility pumps for MINLPs. Suitably adapted to our setting here, we are going to show an exactness result for the penalty problem.
We may consider the optimal value function of the reduced problem (6) partially as a function η : V t → R ∪ {−∞, +∞}. We will impose the following technical assumption on η.

Assumption 2.
Given an optimal solution (y * , u * , v * ) of (1), or equivalently an optimal solution (u * , v * ) of problem (6) the value function η defined in (9) is locally Lipschitz continuous in the sense that for all δ > 0 there exists a constant L such that Assumption 2 is typically satisfied if the optimal solution (y * , u * , v * ) satisfies a constraint qualification. For instance for mixed-integer linear quadratic optimal control problems the Lipschitz continuity of the optimal value function under a constraint qualification of a Slater-type is discussed in [10]. Now we consider the following auxiliary problem with a penalty parameter ρ ≥ 0. With (5) and (6a) we can reduce (11) to The following result shows the exactness of the penalty term in (11) and relates global minima of (6) to p-minima of (12). Proof. Let (u * , v * ) be an optimal solution of (6). We note that by construction We have to show that ([u * , v * ],ṽ * ) satisfies with ε = 0. It can be seen that condition (13a) holds with ε = 0 for all ρ ≥ 0. This follows directly from the definition of Ψ ρ .
To show condition (13b) with ε = 0, we assume we are given (u, v) and consider the triple (u, v, v * ). Without loss of generality, we may assume that u is chosen optimally in the sense of (9). By the definition of Ψ ρ , condition (13b) with ε = 0 is equivalent to We directly observe that if Ψ(u, v) ≥ Ψ(u * , v * ) holds, then claim (14) is true for all ρ ≥ 0. This is, for instance, the case if v = v * holds, i.e. if (u, v, v * ) is feasible for (8), because (u * , v * ) is a global minimum of (8). Hence, we only need to consider the case in which Ψ(u, v) ≤ Ψ(u * , v * ) and v = v * hold.
As we have chosen u optimally and because u * is also an optimal choice for v * in the sense of (9), we can rewrite (14) further to obtain Now, we may use Assumption 2 to obtain This shows that condition (15) is fulfilled for all ρ ≥ L. Hence, we have shown that condition (13b) holds with ε = 0 if we setρ = L.
The essential idea of the proposed method now is to solve (12) ε 2 -optimally alternately once with a fixedṽ and once with a fixed (u, v), respectively, for a fixed penalty parameter ρ until stagnation and then to repeat the process with an increased ρ. The algorithm is summarized in Algorithm 1.
Concerning the convergence of Algorithm 1, we can now make the following statements.

Theorem 3. Let ρ k ր ∞ and let
Proof. Let (u, v,ṽ k ) be feasible for (12). Then Letρ be a cluster point of the sequence ρ k |ρ k | k and (ρ l ) l be a subsequence for which ρ k |ρ k | k converges toρ. Then, dividing the inequality (16) by |ρ l | yields Taking the limit l → ∞ yields An analog inequality holds for any feasible (u k , v k ,ṽ).
Note that in the inner loop of Algorithm 1, we compute p-ε-minima, but that Corollary 1 says that a feasible limit of a converging sequence generated by ADM-SUR is a p-ε-minimum with ε = 0. Moreover, note that the two subproblems for (12) in Algorithm 1 can be solved efficiently.
For any fixedṽ the problem (12) is equivalent to the problem min y,z,u,w Φ(y(T )) + z(T ) (17a)  (17) with the relaxation w ∈ L ∞ (0, T ; [0, 1]M ) and w n ∈ V t be a sequence generated by the sumup rounding algorithm of [26,28] and y n , z n be the corresponding solutions of (17b) and (17c) with w = w n , then under Assumption 1 see [21] for details. Under additional assumptions on A and f , even error estimates are available [12,15,28]. In particular, (18) yields that this solution approach yields an ǫ 2 -optimal solution for a sufficiently fine control grid. We refer to this solution approach for subproblem (17) as the (POC)-step.
Further, for any fixed (u, v) the problem (12) reduces to Here, standard quadrature rules and mixed-integer linear programming techniques can be used to compute an ε 2 -optimal solution again for a sufficiently fine control grid. We refer to this solution approach for the subproblem (19) as the (MIP) step.
The sum-up rounding algorithm in the (POC)-step can be interpreted as the solution of the following combinatorial integral approximation problem (CIAP) for a piecewise constant function w n on a fixed grid, see [29]. We therefore refer to the penalty-ε method in Algorithm 1 as ADM-SUR. An interesting variant of this Algorithm is to skip SUR in the POC-step, i.e., doing the step in the relaxed direction of w and using the MIP-step to recover integer feasibility. We refer to this variant as ADM (without SUR). For comparison, we also consider the heuristic to apply POC to the original problem formulation without combinatorial constraints and to recover a feasible solution via the following mixed-integer problem again on a fixed grid for w n , see [15,29]. We refer to this approach as CIAP. Note that in contrast to (19), the cost function in (21) is not a norm on V t . A convergence result for this subproblem in an ADM framework such as for the ADM-SUR algorithm is therefore an open problem.

Numerical study
We test the proposed methods on two benchmark examples from the mintOC.de library [27], which we augment by minimum dwell-time constraints in order to prohibit infinitely many switching events. To model the dwell time condition we used the basic MIP constraints. These do not, however, form a complete description of the so-called min up/down polyhedron. One can use either use a complete description in the original variable space [20], where additional constraints are separated with cutting planes or use an extended formulation [23] instead. As the main focus of this article is the solution quality, we stick to the basic formulation above.
Our computations are based on CasADi [1] for the model equations and their derivatives and the solvers Ipopt [30] for nonlinear programming problems and Gurobi [11] for quadratic and linear mixed-integer programs.

Fuller's problem.
For our numerical study, we consider a variant of Fuller's problem augmented with minimum dwell-time constraints The problem is notoriously difficult, because the solution of the problem without dwell-time constraints (i.e., for τ min = 0) exhibits chattering [31].   Table 2. Single CPU runtimes in seconds on an Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz for the corresponding results in Table 1. The remaining MIP gap achieved by Gurobi at a timeout of 1 hour is given in parantheses. Even though the codes for the heuristics CIAP, ADM, and ADM-SUR have not been heavily optimized, the runtimes are much smaller than for the MIQP solver.
We compare our proposed ADM-based method (with and without SUR) with a direct global Mixed-Integer Quadratic Programming (MIQP) method and CIAP. The resulting objective values and corresponding single CPU runtimes are depicted in Tables 1 and 2. It appears that the ADM-based methods have some advantage both in quality and runtime over CIAP for the instances with larger τ min , which are harder for POC-based heuristics (but appear to be simpler for the MIQP approach). Exemplary for two selected values dwell-times τ min , the resulting state y 1 in problem 22 is shown in Figure 1.

Network of transmission lines.
This problem was described in [8]. The telegraph equations are based on a 2 × 2 hyperbolic system of partial differential equations and describe the voltage and current on electrical transmission lines in  Figure 2. Network topology for the subgrid scenario of the transmission lines example. The objective is to continuously control the power generation at the producers via u 1 (t) and u 2 (t) and to switch on or off the dashed connections (via v 1 (t)) and the dotted connection (via v 2 (t)) in order to minimize the quadratic deviation of the power supply from the power demand at the five consumer nodes.
where Λ is a diagonal matrix including the speed of propagation in each direction and B denotes a symmetric matrix with non-negative entries. The dynamics on the lines are coupled at nodes via the boundary condition The distribution matrices D ± (v) depend on binary-valued controls v(t) ∈ {0, 1}, which are used to switch off specified connections in the network while the continuous-valued controls u(t) denote the power generation at the producer nodes in the network, cf. Figure 2. The goal is to minimize the quadratic deviation of the accumulated power delivery C s (t, ξ) = r∈δS ξ + r (l r , t) (with δ S being the set of all lines adjacent to node s) from the demand Q s (t) at the consumer nodes V S , i.e., s.t. (23) and (24).
This problem can be written in abstract form aṡ with A and B(v i ), i = 1, . . . , M being unbounded linear operators on Hilbert spaces using abstract semigroup theory [2]. Though (26) is not of the form (1)  is still given by the variation of constants formula (5) with f (y, u, v) = B(v)u, see e.g., [6].
For the computational experiments, we use the publicly available 1 Python implementation. The minimum dwell-time constraints are set to τ min = 1.
The Figures 3-5 illustrate the results for a scenario, in which a small subgrid of the network can be islanded, see Figure 2. We observe that the binary decisions can be partly equalized by reactions in the power generation at the producer nodes.  Figure 3 and Figure 4. Due to the additional minimum dwell-time constraints, the deviation of power delivery from the demand at consumer nodes is raised in comparison to the lower bound given by POC. The ADM-based heuristical results are superior to both SUR (which does not satisfy the minimum dwell-time constraint) and CIAP.

Conclusion
We conclude that the proposed penalty-ADM method performs notably well for our benchmark problems within the class of mixed-integer optimal control problems with dwell-time constraints. The quality of the computed solutions outperforms the other considered heuristic solutions for large dwell-times. Moreover, the convergence theory shows that the proposed method computes partial minima in a lifted sense. The comparison with a global solution for a full discretization shows that these partial minima are in general not global minima. However, we note that this is not surprising because we used a local solver for the POC-step. The proposed methods can be extended in various directions such as considerations of state constraints, mixed-integer corrector steps from linearizations and of course more general problem classes.