Feasibility vs. Optimality in Distributed AC OPF - A Case Study Considering ADMM and ALADIN

This paper investigates the role of feasible initial guesses and large consensus-violation penalization in distributed optimization for Optimal Power Flow (OPF) problems. Specifically, we discuss the behavior of the Alternating Direction Method of Multipliers (ADMM). We show that, in case of large consensus-violation penalization, ADMM might exhibit slow progress. We support this observation by an analysis of the algorithmic properties of ADMM. Furthermore, we illustrate our findings for the IEEE 57-bus system and draw a comparison of ADMM and the Augmented Lagrangian Alternating Direction Inexact Newton (ALADIN) method.


Introduction
Distributed optimization algorithms for ac Optimal Power Flow (opf) have recently gained significant interest, as these problems are inherently non-convex and often large-scale, i.e., comprising up to several thousand buses [6]. Distributed optimization is considered to be helpful as it allows splitting large opf problems into several smaller subproblems, thus reducing complexity and avoiding the exchange of full grid models between subsystems. We refer to [11] for a recent overview of distributed optimization and control approaches in power systems.
One frequently discussed distributed optimization method is the Alternating Direction Method of Multipliers (admm) [2], which is also applied in the context of ac opf [6,5,10]. admm often yields promising results even for large-scale power systems [5]. However, admm sometimes requires a specific partitioning technique and/or a feasible initialization in combination with high consensus-violation penalization parameters to converge [6]. The requirement of feasible initialization seems quite limiting, as it requires solving a centralized inequality-constrained power flow problem with full topology and load information, leading to approximately the same complexity as the full opf problem.
In previous works [3,4,12] we suggested applying the Augmented Lagrangian Alternating Direction Inexact Newton (aladin) method to stochastic and deterministic opf problems ranging from 5 to 300 buses. In case a certain line search is applied [8], aladin provides global convergence guarantees to local minimizers for non-convex problems without the need for feasible initialization. The results in [3] underpin that aladin is able to outperform admm in many cases. This comes at the cost of a higher per-step information exchange compared with admm and a more complicated coordination step, cf. [3].
In this paper we investigate the interplay of feasible initialization with high penalization for consensus violation in admm for distributed ac opf. We illustrate our findings on the ieee 57-bus system. Furthermore, we compare the convergence behavior of admm to that of aladin, which does not suffer from the practical need for feasible initialization [8]. Finally, we provide theoretical results supporting our numerical observations on the convergence behavior of admm.
The paper is organized as follows: In Section 2 we briefly recap admm and aladin including their convergence properties. Section 3 shows numerical results for the ieee 57-bus system focusing on the influence of the penalization parameter ρ on the convergence behavior of admm. Section 4 presents an analysis of the interplay between high penalization and a feasible initialization.

ALADIN and ADMM
For distributed optimization, opf problems are often formulated in affinely coupled separable form

min_x Σ_{i∈R} f_i(x_i)   subject to   x_i ∈ X_i for all i ∈ R   and   Σ_{i∈R} A_i x_i = 0,   (1)

where the decision vector is divided into sub-vectors x = [x_1^⊤, ..., x_|R|^⊤]^⊤ ∈ R^{n_x}, R is the index set of subsystems usually representing geographical areas of a power system, and X_i := {x_i ∈ R^{n_{x_i}} | h_i(x_i) ≤ 0} are local nonlinear constraint sets. Throughout this work we assume that f_i and h_i are twice continuously differentiable and that all X_i are compact. Note that the objective functions f_i : R^{n_{x_i}} → R and nonlinear inequality constraints h_i : R^{n_{x_i}} → R^{n_{h_i}} depend only on x_i ∈ R^{n_{x_i}}, and that coupling between the subsystems takes place in the affine consensus constraint Σ_{i∈R} A_i x_i = 0 only. There are several ways of formulating opf problems in form of (1), differing in the coupling variables and the type of the power flow equations (polar or rectangular), cf. [3,5,9].
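To make the affine coupling concrete, the following numpy sketch (a hypothetical two-region example, not taken from the paper) shows how A_1 and A_2 can be chosen so that the consensus constraint Σ_{i∈R} A_i x_i = 0 enforces agreement on a duplicated boundary variable:

```python
import numpy as np

# Each region keeps a private copy of the shared boundary variable
# (here, by convention: the last entry of each local vector x_i).
x1 = np.array([0.30, 1.05])         # region 1: [interior, boundary copy]
x2 = np.array([0.70, -0.20, 1.05])  # region 2: [interior, interior, boundary copy]

# The A_i select the boundary copies with opposite signs, so that
# A1 @ x1 + A2 @ x2 = 0 holds exactly when both copies agree.
A1 = np.array([[0.0, 1.0]])
A2 = np.array([[0.0, 0.0, -1.0]])

gap = A1 @ x1 + A2 @ x2  # consensus residual, zero here since the copies agree
```

In an opf setting the duplicated boundary quantities are, e.g., active/reactive power and voltage angle/magnitude at the border between two regions.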
Here, we are interested in solving Problem (1) via admm and aladin, summarized in Algorithm 1 and Algorithm 2, respectively. At first glance it is apparent that admm and aladin share several features. For example, in Step 1) of

[Algorithm 1 (admm): 1) Parallelizable step: solve for all i ∈ R locally the subproblem (2). 2) Update the dual variables λ_i^{k+1} via (3). 3) Consensus step: solve the coordination problem (4) over Δx and set z^{k+1} = x^{k+1} + Δx^k.]

[Algorithm 2 (aladin): 1) Parallelizable step: solve for all i ∈ R locally. 2) Compute sensitivities: compute Hessian approximations B_i^k, gradients g_i^k and Jacobians of the active constraints C_i^k, cf. [8]. 3) Consensus step: solve the coordination QP (5). 4) Update the primal and dual variables.]
both algorithms, local augmented Lagrangians subject to the local nonlinear inequality constraints h_i are minimized in parallel. Observe that while admm maintains multiple local Lagrange multipliers λ_i, aladin considers one global Lagrange multiplier vector λ. In Step 2), aladin computes the sensitivities B_i, g_i and C_i (which often can be obtained directly from the local numerical solver without additional computation), whereas admm updates the multiplier vectors λ_i. In Step 3), both algorithms communicate certain information to a central entity, which then solves a (usually centralized) coordination quadratic program. However, aladin and admm differ in the amount of exchanged information: whereas admm only communicates the local primal and dual variables x_i and λ_i, aladin additionally communicates sensitivities. This is a considerable amount of extra information compared with admm. However, there exist methods to reduce the amount of exchanged information, and bounds on the information exchange are given in [3]. Another important difference is the computational complexity of the coordination step. In many cases, the coordination step in admm can be reduced to an averaging step based on neighborhood communication only [2], whereas in aladin the coordination step involves the centralized solution of an equality-constrained quadratic program.
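The centralized coordination step mentioned above, an equality-constrained QP, reduces to one linear solve of the associated KKT system. A generic numpy sketch (the matrices below are arbitrary stand-ins for illustration, not OPF data):

```python
import numpy as np

def solve_eqqp(B, g, A, b):
    """Solve min_dx 0.5*dx'B dx + g'dx  s.t.  A dx = b  via its KKT system."""
    n, m = B.shape[0], A.shape[0]
    # KKT system: [[B, A'], [A, 0]] [dx; mu] = [-g; b]
    K = np.block([[B, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-g, b])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]  # primal step dx, constraint multiplier mu

# Tiny stand-in data (hypothetical, for illustration only).
B = np.diag([2.0, 4.0])
g = np.array([1.0, -2.0])
A = np.array([[1.0, 1.0]])
b = np.array([0.0])
dx, mu = solve_eqqp(B, g, A, b)
```

The KKT matrix collects the Hessian blocks and coupling matrices of all subsystems, which is why a central entity is needed for this step in aladin.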
In the last step, admm updates the primal variables z_i, while aladin additionally updates the dual variables λ. Differences between aladin and admm also show up in the convergence speed and the theoretical convergence guarantees: whereas aladin guarantees global convergence and locally quadratic convergence for non-convex problems if a certain line search is applied [8], few results exist for admm in the non-convex setting. Recent works [14,7] investigate the convergence of admm for special classes of non-convex problems; however, to the best of our knowledge, opf problems do not belong to these classes.

Numerical Results
Next, we investigate the behavior of admm for large ρ and a feasible initialization to illustrate performance differences between admm and aladin. We consider power flow equations in polar form with coupling in active/reactive power and voltage angle and magnitude at the boundary between two neighboring regions [3]. As numerical test case we consider the ieee 57-bus system with data from matpower and partitioning as in [3]. Figures 1 and 2 show the convergence behavior of admm (with and without feasible initialization (f.)) and aladin for several convergence indicators and two different penalty parameters ρ = 10^4 and ρ = 10^6. Therein, the leftmost plot depicts the consensus gap ‖Ax^k‖_∞ representing the maximum mismatch of coupling variables (active/reactive power and voltage magnitude/angle) at the borders between two neighboring regions. The second plot shows the objective function value f^k and the third plot presents the distance to the minimizer ‖x^k − x*‖_∞ over the iteration index k. The rightmost plot shows the nonlinear constraint violation ‖g(z^k)‖_∞ after the consensus steps (4) and (5) of Algorithms 1 and 2 respectively, representing the maximum violation of the power flow equations. In case of small ρ = 10^4, both admm variants behave similarly and converge slowly towards the optimal solution, with slow decrease in consensus violation, nonlinear constraint violation and objective function value. If we increase ρ to ρ = 10^6, with results shown in Figure 2, the consensus violation ‖Ax^k‖_∞ gets smaller in admm with feasible initialization. The reason is that a large ρ forces x_i^k to stay close to z_i^k, leading to a small ‖Ax^k‖_∞ as we have Az^k = 0 from the consensus step. But, at the same time, this also leads to slower progress in optimality f^k compared with ρ = 10^4, cf. the second plot in Figures 1 and 2.
On the other hand, this statement does not hold for admm with infeasible initialization (blue lines in Figures 1 and 2), as the constraints in the local step and the consensus step of admm enforce an alternating projection between the consensus constraint and the local nonlinear constraints. The progress in the nonlinear constraint violation ‖g(z^k)‖_∞ supports this statement. In the extreme, this behavior can be observed when using ρ = 10^12, depicted in Figure 3. There, admm with feasible initialization produces very small consensus violations and nonlinear constraint violations at the cost of almost no progress in terms of optimality.
Here the crucial observation is that, in case of feasible initialization and a large penalization parameter ρ, admm produces almost feasible iterates at the cost of slow progress in the objective function values. From this, one is tempted to conclude that admm will likely converge also for infeasible initializations, cf. Figures 1-3. This conclusion is supported by the rather small 57-bus test system. However, it deserves to be noted that this conclusion is in general not valid, cf. [6].
For aladin, we use ρ = 10^6 and µ = 10^7. Comparing the results of admm with aladin, aladin shows superior, locally quadratic convergence also in case of infeasible initialization. This is in line with the known convergence properties of aladin [8]. However, the fast convergence comes at the cost of an increased communication overhead per step, cf. [3] for a more detailed discussion. Note that aladin involves a more complex coordination step, which is not straightforward to solve via neighborhood communication. Furthermore, tuning ρ and µ can be difficult.
In power systems, feasibility is usually preferred over optimality to ensure stable system operation. Hence, admm with feasible initialization and large ρ can in principle be used for opf, as in this case the violation of the power flow equations and generator bounds is expected to be small and one could at least expect certain progress towards an optimal solution. Following this reasoning, several papers consider ‖A(x^k − z^k)‖_∞ < ε as termination criterion [5,6]. However, as shown in the example above, if admm is initialized at a feasible point and ρ is chosen sufficiently large, this termination criterion can be satisfied in just one iteration. Consequently, it is unclear how to ensure a certain degree of optimality. An additional question with respect to admm is how to obtain a feasible initialization. To compute such an initial guess, one has to solve a constrained nonlinear least-squares problem, i.e., solve the power flow equations subject to box constraints. This is itself a problem of almost the same complexity as the full opf problem. Hence one would again require a computationally powerful central entity with full topology and parameter information. Arguably, this jeopardizes the initial motivation for using distributed optimization methods.
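To illustrate the kind of computation such an initialization requires, here is a minimal projected Gauss-Newton sketch for a toy nonlinear least-squares problem with box constraints. The residual is a hypothetical stand-in mimicking a power-flow-style balance equation, not an actual grid model:

```python
import numpy as np

def residual(x):
    # Toy "power-flow-like" balance equations (hypothetical stand-in):
    # x[0]**2 + x[1] = 2  and  x[0] = x[1]
    return np.array([x[0]**2 + x[1] - 2.0, x[0] - x[1]])

def jacobian(x):
    return np.array([[2.0 * x[0], 1.0],
                     [1.0, -1.0]])

def feasible_point(x0, lb, ub, iters=20):
    """Projected Gauss-Newton: drive the residual to zero inside the box."""
    x = np.clip(x0, lb, ub)
    for _ in range(iters):
        r, J = residual(x), jacobian(x)
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x = np.clip(x + step, lb, ub)  # project back onto the box
    return x

x = feasible_point(np.array([0.5, 0.5]), lb=0.0, ub=2.0)
```

For a real opf case the residuals are the ac power flow equations over the entire network, which is precisely why a central entity with full topology and parameter information is needed.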

Analysis of ADMM with feasible initialization for ρ → ∞
The results above indicate that large penalization parameters ρ in combination with a feasible initialization might lead to premature convergence to a suboptimal solution. Arguably, this behavior of admm might be straightforward to see from an optimization perspective. However, to the best of our knowledge, a mathematically rigorous analysis, which is very relevant for opf, is not available in the literature.
Proposition 1 (Feasibility and ρ → ∞ imply x_i^{k+1} − x_i^k ∈ null(A_i)) Consider the application of admm (Algorithm 1) to Problem (1). Suppose that, for all k ∈ N, the local problems (2) have unique regular minimizers x_i^k. For k̄ ∈ N, let λ_i^k̄ be bounded and, for all i ∈ R, z_i^k̄ ∈ X_i. Then, for ρ → ∞, the admm iterates satisfy x_i^{k̄+1} − x_i^k̄ ∈ null(A_i) for all i ∈ R.
Proof The proof is divided into four steps. Steps 1)-3) establish technical properties used to derive the above assertion in Step 4).
Step 1). At iteration k̄ the local steps of admm read

x_i^k̄(ρ) = argmin_{x_i ∈ X_i} f_i(x_i) + (λ_i^k̄)^⊤ A_i x_i + (ρ/2) ‖A_i (x_i − z_i^k̄)‖_2^2.   (6)

Now, by assumption, all f_i are twice continuously differentiable (hence bounded on the compact sets X_i), all λ_i^k̄ are bounded and all z_i^k̄ ∈ X_i. Thus, for all i ∈ R, lim_{ρ→∞} A_i (x_i^k̄(ρ) − z_i^k̄) = 0.

Step 2). The first-order stationarity condition of (6) can be written as

∇f_i(x_i^k̄) + A_i^⊤ λ_i^k̄ + ρ A_i^⊤ A_i (x_i^k̄ − z_i^k̄) + ∇h_i(x_i^k̄)^⊤ γ_i^k̄ = 0,   (7)

where γ_i^k̄ is the multiplier associated with h_i. Multiplying the multiplier update formula (3) with A_i^⊤ from the left and inserting (7), we obtain A_i^⊤ λ_i^{k̄+1} = −∇f_i(x_i^k̄) − ∇h_i(x_i^k̄)^⊤ γ_i^k̄. By differentiability of f_i and h_i, compactness of X_i and regularity of x_i^k̄, this implies boundedness of A_i^⊤ λ_i^{k̄+1}.
Step 3). Next, we show by contradiction that Δx_i^k̄ ∈ null(A_i) for all i ∈ R as ρ → ∞. Recall the coordination step (4b) of admm given by

min_{Δx} Σ_{i∈R} −(λ_i^{k̄+1})^⊤ A_i Δx_i + (ρ/2) ‖A_i Δx_i‖_2^2   s.t.   Σ_{i∈R} A_i (x_i^k̄ + Δx_i) = 0.   (8)

Observe that any Δx_i^k̄ ∈ null(A_i) is a feasible point of (8), as Σ_{i∈R} A_i x_i^k̄ = 0.
Consider a feasible candidate solution Δx_i ∉ null(A_i) for which Σ_{i∈R} A_i (x_i^k̄ + Δx_i) = 0. By Step 2), (λ_i^{k̄+1})^⊤ A_i Δx_i(ρ) will be bounded. Hence, for a sufficiently large value of ρ, the objective of (8) will be positive. However, for any Δx_i ∈ null(A_i) the objective of (8) is zero, which contradicts optimality of the candidate solution Δx_i ∉ null(A_i). Hence, choosing ρ sufficiently large ensures that any minimizer of (8) lies in null(A_i).
Step 4). It remains to show x_i^{k̄+1} = x_i^k̄. In the last step of admm we have z^{k̄+1} = x^k̄ + Δx^k̄. With v^k̄ := x^k̄ − z^k̄, Steps 1)-3) yield z^{k̄+1} = z^k̄ + v^k̄ + Δx^k̄ and hence lim_{ρ→∞} A_i z_i^{k̄+1} = A_i z_i^k̄, since lim_{ρ→∞} A_i v_i^k̄ = 0 and Δx_i^k̄ ∈ null(A_i). Observe that this implies that, for ρ → ∞, problem (6) does not change from step k̄ to step k̄ + 1, as z_i^k̄ enters (6) only through A_i z_i^k̄. By uniqueness of the local minimizers, this proves the assertion.
Corollary 1 (Deterministic solvers, feasibility and ρ → ∞ imply x_i^{k+1} = x_i^k) Assume that the local subproblems in admm are solved deterministically, i.e., the same problem data yields the same solution. Then, under the conditions of Proposition 1 and for ρ → ∞, once admm generates a feasible point x_i^k̄ to Problem (1), or whenever it is initialized at a feasible point, it will stay at this point for all subsequent k > k̄.
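The stalling described by Corollary 1 can be reproduced on a toy consensus problem. The sketch below is a simplified scalar variant of the admm structure of Algorithm 1 with closed-form local and coordination steps; the objectives f_i(x) = (x − c_i)^2/2, the coupling x_1 = x_2 (A_1 = +1, A_2 = −1) and the absence of inequality constraints are assumptions of this toy, not the paper's opf setup:

```python
import numpy as np

def admm(rho, iters, c=(1.0, 2.0), z0=1.0):
    """Scalar two-region consensus admm sketch:
    min (x1-c1)^2/2 + (x2-c2)^2/2  s.t.  x1 - x2 = 0  (A1 = +1, A2 = -1),
    started at the consensus-feasible point z1 = z2 = z0."""
    c1, c2 = c
    z1 = z2 = z0
    l1 = l2 = 0.0  # local dual variables lambda_i
    for _ in range(iters):
        # 1) local steps (closed form):
        #    x_i = argmin f_i(x) + l_i*A_i*x + (rho/2)*(A_i*(x - z_i))^2
        x1 = (c1 - l1 + rho * z1) / (1.0 + rho)
        x2 = (c2 + l2 + rho * z2) / (1.0 + rho)
        # 2) dual updates: l_i += rho * A_i * (x_i - z_i)
        l1 += rho * (x1 - z1)
        l2 -= rho * (x2 - z2)
        # 3) consensus step, solved in closed form:
        #    min -sum_i l_i*A_i*dx_i + (rho/2)*sum_i (A_i*dx_i)^2
        #    s.t. (x1 + dx1) - (x2 + dx2) = 0
        r = x1 - x2                     # consensus residual A1*x1 + A2*x2
        mu = (l1 + l2 + rho * r) / 2.0  # multiplier of the consensus constraint
        z1 = x1 + (l1 - mu) / rho
        z2 = x2 + (mu - l2) / rho
    return z1
```

With a moderate ρ the iterates reach the minimizer (c_1 + c_2)/2 = 1.5, while for ρ = 10^8 and the feasible start z_0 = 1.0 the iterates barely move away from the initial point, mirroring the stalling discussed above.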
The above corollary explains the behavior of admm for large ρ in combination with the feasible initialization often used in power systems [6,5]. Although feasible iterates are desirable from a power-systems point of view, the findings above imply that high values of ρ limit progress in terms of minimizing the objective.
Remark 1 (Behavior of aladin for ρ → ∞) Note that for ρ → ∞, aladin behaves differently from admm. While the local problems in aladin behave similarly to those of admm, the coordination step in aladin is equivalent to a sequential quadratic programming step. This helps avoid premature convergence and ensures decrease of f in the coordination step [8].

Conclusions
This method-oriented work investigated the interplay of the penalization of consensus violation and feasible initialization in admm. We found that, despite often working reasonably well with a good choice of ρ and an infeasible initialization, admm with feasible initialization combined with large values of ρ typically stays feasible yet may stall at a suboptimal solution. We provided analytical results supporting this observation. However, computing a feasible initialization is itself a problem of almost the same complexity as the full opf problem, in some sense partially jeopardizing the advantages of distributed optimization methods. Thus, distributed methods providing rigorous convergence guarantees while allowing for infeasible initialization are of interest. One such alternative is aladin [8], exhibiting favorable convergence properties at the cost of an enlarged communication overhead and a more complex coordination step [3].