1 Introduction

Distributed optimization algorithms for ac Optimal Power Flow (opf) have recently gained significant interest, as these problems are inherently non-convex and often large-scale, comprising up to several thousand buses [1]. Distributed optimization is considered helpful as it allows splitting large opf problems into several smaller subproblems, thus reducing complexity and avoiding the exchange of full grid models between subsystems. We refer to [2] for a recent overview of distributed optimization and control approaches in power systems.

One frequently discussed method from convex distributed optimization is the Alternating Direction Method of Multipliers (admm) [3], which is also applied in the context of ac opf [1, 4, 5]. admm often yields promising results even for large-scale power systems [4]. However, admm sometimes requires a specific partitioning technique and/or a feasible initialization in combination with high consensus-violation penalization parameters to converge [1]. The requirement of feasible initialization seems quite limiting: it entails solving a centralized inequality-constrained power flow problem, which requires full topology and load information and is thus of approximately the same complexity as the full opf problem.

In previous works [6,7,8] we suggested applying the Augmented Lagrangian Alternating Direction Inexact Newton (aladin) method to stochastic and deterministic opf problems ranging from 5 to 300 buses. If a certain line search is applied [9], aladin provides global convergence guarantees to local minimizers for non-convex problems without the need of feasible initialization. The results in [6] underpin that aladin is able to outperform admm in many cases. This comes at the cost of a higher per-step information exchange compared with admm and a more complicated coordination step, cf. [6].

In this paper we investigate the interplay of feasible initialization with high penalization for consensus violation in admm for distributed ac opf. We illustrate our findings on the ieee 57-bus system. Furthermore, we compare the convergence behavior of admm to that of aladin, which does not suffer from the practical need for feasible initialization [9]. Finally, we provide theoretical results supporting our numerical observations for the convergence behavior of admm.

The paper is organized as follows: In Sect. 2 we briefly recap admm and aladin including their convergence properties. Section 3 shows numerical results for the ieee 57-bus system focusing on the influence of the penalization parameter ρ on the convergence behavior of admm. Section 4 presents an analysis of the interplay between high penalization and a feasible initialization.

2 ALADIN and ADMM

For distributed optimization, opf problems are often formulated in affinely coupled separable form

$$\displaystyle \begin{aligned} \min_{x\in\mathbb{R}^{n_x}} \, \sum_{i\in \mathcal{R}} f_i(x_i) \quad \text{subject to}\quad x_i \in \mathcal{X}_i,\; \forall\, i \in \mathcal{R} \quad \text{and}\quad\sum_{i\in \mathcal{R}}A_i x_i=0, \end{aligned} $$
(1)

where the decision vector is divided into sub-vectors \(x^\top =[x_1^\top ,\dots ,x_{|\mathcal {R}|}^\top ] \in \mathbb {R}^{n_x}\), \(\mathcal {R}\) is the index set of subsystems (usually representing geographical areas of a power system), and the local nonlinear constraint sets are given by \(\mathcal {X}_i:=\{x_i \in \mathbb {R}^{n_{xi}} \; | \; h_{i}(x_i) \leq 0\}\). Throughout this work we assume that \(f_i\) and \(h_i\) are twice continuously differentiable and that all \(\mathcal {X}_i\) are compact. Note that the objective functions \(f_i:\mathbb {R}^{n_{xi}}\rightarrow \mathbb {R}\) and nonlinear inequality constraints \(h_i: \mathbb {R}^{n_{xi}}\rightarrow \mathbb {R}^{n_{hi}}\) depend only on \(x_i\), and that coupling between the subsystems takes place only in the affine consensus constraint \(\sum _{i\in \mathcal {R}}A_i x_i=0\). There are several ways of formulating opf problems in form of (1), differing in the coupling variables and the type of the power flow equations (polar or rectangular), cf. [4, 6, 10].
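To make the separable structure of (1) concrete, the following minimal sketch encodes a toy two-subsystem instance with quadratic local objectives and a single affine consensus row. All data (\(c_i\), \(A_i\)) are purely illustrative and not taken from any power system model:

```python
import numpy as np

# Toy instance of the separable form (1): two subsystems i = 1, 2 with
# quadratic local objectives f_i(x_i) = 0.5*||x_i - c_i||^2 and one affine
# consensus row coupling the first component of both sub-vectors.
# All data (c_i, A_i) are illustrative, not from an actual opf model.

c = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]       # local data
A = [np.array([[1.0, 0.0]]), np.array([[-1.0, 0.0]])]  # couples x_1[0] = x_2[0]

def f(i, xi):
    """Local objective of subsystem i; depends on x_i only."""
    return 0.5 * float(np.sum((xi - c[i]) ** 2))

def consensus_gap(x):
    """Residual of the coupling constraint sum_i A_i x_i."""
    return sum(Ai @ xi for Ai, xi in zip(A, x))

# Minimizing each f_i separately violates the consensus constraint:
x_local = [ci.copy() for ci in c]
print(consensus_gap(x_local))  # [-2.]: the local optima do not agree
```

The point of the distributed methods below is to resolve exactly this tension between the purely local objectives and the affine coupling.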

Algorithm 1 ADMM

Algorithm 2 ALADIN (full-step variant)

Here, we are interested in solving Problem (1) via admm and aladin, summarized in Algorithms 1 and 2 respectively.Footnote 1, Footnote 2 At first glance it is apparent that admm and aladin share several features. For example, in Step (1) of both algorithms, local augmented Lagrangians subject to local nonlinear inequality constraints \(h_i\) are minimized in parallel.Footnote 3 Observe that while admm maintains multiple local Lagrange multipliers \(\lambda_i\), aladin considers one global Lagrange multiplier vector \(\lambda\). In Step (2), aladin computes sensitivities \(B_i\), \(g_i\) and \(C_i\) (which can often be obtained directly from the local numerical solver without additional computation), whereas admm updates the multiplier vectors \(\lambda_i\).

In Step (3), both algorithms communicate certain information to a central entity, which then solves a (usually centralized) coordination quadratic program. However, aladin and admm differ in the amount of exchanged information: whereas admm only communicates the local primal and dual variables \(x_i\) and \(\lambda_i\), aladin additionally communicates sensitivities. This is a considerable amount of extra information compared with admm. However, there exist methods to reduce the amount of exchanged information, and bounds on the information exchange are given in [6]. Another important difference is the computational complexity of the coordination step. In many cases, the coordination step in admm can be reduced to an averaging step based on neighborhood communication only [3], whereas in aladin the coordination step involves the centralized solution of an equality-constrained quadratic program.

In the last step, admm updates the primal variables \(z_i\), while aladin additionally updates the dual variables \(\lambda\). Differences between aladin and admm also show up in convergence speed and theoretical convergence guarantees: whereas aladin guarantees global convergence and quadratic local convergence for non-convex problems if a certain globalization strategy is applied [9], few results exist for admm in the non-convex setting. Recent works [12, 13] investigate the convergence of admm for special classes of non-convex problems; however, to the best of our knowledge, opf problems do not belong to these classes.
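As a minimal illustration of the admm steps discussed above, the following sketch runs consensus admm in the scaled-dual form of [3] (not the exact Algorithm 1 of this paper) on a toy problem where two regions hold copies of a single boundary variable and must agree. All data are illustrative:

```python
import numpy as np

# Minimal consensus admm sketch (scaled-dual form, cf. [3]): two regions
# hold copies b_i of one boundary variable, with local objectives
# f_i(b) = 0.5*(b - c_i)^2. Data c_i and rho are illustrative.

c = np.array([1.0, 3.0])   # local targets; the coupled optimum is their mean
rho = 1.0                  # penalty parameter
b = np.zeros(2)            # local copies (output of the local step)
z = 0.0                    # consensus variable
u = np.zeros(2)            # scaled dual variables

for k in range(100):
    # Step (1): parallel local minimization; closed form for quadratic f_i
    b = (c + rho * (z - u)) / (1.0 + rho)
    # Step (3): the coordination step reduces to averaging for this coupling
    z = float(np.mean(b + u))
    # Dual update on the consensus violation
    u = u + b - z

print(z)  # -> approximately 2.0, the coupled optimum
```

For this convex toy problem admm converges quickly with a moderate \(\rho\); the point of Sects. 3 and 4 is that this behavior degrades for non-convex opf constraints and large \(\rho\).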

3 Numerical Results

Next, we investigate the behavior of admm for large \(\rho\) and a feasible initialization to illustrate performance differences between admm and aladin. We consider power flow equations in polar form with coupling in active/reactive power and voltage angle and magnitude at the boundary between two neighboring regions [6]. As a numerical test case, we consider the ieee 57-bus system with data from matpower and the partitioning from [6].

Figures 1 and 2 show the convergence behavior of admm (with and without feasible initialization (f.)) and aladin for several convergence indicators and two different penalty parameters \(\rho = 10^4\) and \(\rho = 10^6\).Footnote 4 Therein, the left-hand plot depicts the consensus gap \(\|Ax^k\|\), representing the maximum mismatch of coupling variables (active/reactive power and voltage magnitude/angle) at borders between two neighboring regions. The second plot shows the objective function value \(f^k\) and the third plot presents the distance to the minimizer \(\|x^k - x^\star\|\) over the iteration index \(k\). The right-hand plot shows the nonlinear constraint violation \(\|g(z^k)\|\) after the consensus steps (4) and (5) of Algorithms 1 and 2 respectively, representing the maximum violation of the power flow equations.Footnote 5

Fig. 1

Convergence behavior of admm with infeasible initialization, admm with feasible initialization (f.) for \(\rho = 10^4\), and aladin

Fig. 2

Convergence behavior of admm with infeasible initialization, admm with feasible initialization (f.) for \(\rho = 10^6\), and aladin

In case of the smaller \(\rho = 10^4\), both admm variants behave similarly and converge slowly towards the optimal solution, with slow decrease in consensus violation, nonlinear constraint violation and objective function value. If we increase \(\rho\) to \(10^6\), with results shown in Fig. 2, the consensus violation \(\|Ax^k\|\) gets smaller in admm with feasible initialization. The reason is that a large \(\rho\) forces \(x_i^k\) to be close to \(z_i^k\), leading to small \(\|Ax^k\|\), as we have \(Az^k = 0\) from the consensus step. But, at the same time, this also leads to slower progress in optimality \(f^k\) compared to \(\rho = 10^4\), cf. the second plot in Figs. 1 and 2.

Fig. 3

Convergence behavior of admm with infeasible initialization, admm with feasible initialization (f.) for \(\rho = 10^{12}\), and aladin

On the other hand, this statement does not hold for admm with infeasible initialization (blue lines in Figs. 1 and 2), as the constraints in the local step and the consensus step of admm enforce an alternating projection between the consensus constraint and the local nonlinear constraints. The progress in the nonlinear constraint violation \(\|g(z^k)\|\) supports this statement. In its extreme, this behavior can be observed for \(\rho = 10^{12}\), depicted in Fig. 3. There, admm with feasible initialization produces very small consensus violations and nonlinear constraint violations at the cost of almost no progress in terms of optimality.

Here the crucial observation is that, in case of feasible initialization and a large penalization parameter \(\rho\), admm produces almost feasible iterates at the cost of slow progress in the objective function values. From this, one might be tempted to conclude that admm will also likely converge for infeasible initializations, cf. Figs. 1, 2, and 3. This conclusion is supported by our results for the rather small 57-bus test system; however, it deserves to be noted that it is in general not valid, cf. [1].

For aladin, we use \(\rho = 10^6\) and \(\mu = 10^7\). Comparing the results of admm with aladin, aladin shows superior quadratic convergence, also in case of infeasible initialization. This is in line with the known convergence properties of aladin [9]. However, the fast convergence comes at the cost of increased communication overhead per step, cf. [6] for a more detailed discussion. Note that aladin involves a more complex coordination step, which is not straightforward to solve via neighborhood communication. Furthermore, tuning \(\rho\) and \(\mu\) can be difficult.
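The fast local convergence can be illustrated on a convex toy problem: for quadratic objectives, the coordination QP of full-step aladin is exact, so the coupled optimum is reached after a single iteration. The sketch below is a heavily simplified, assumption-laden version of Algorithm 2 (scalar sub-vectors, no inequality constraints, illustrative data \(c_i\), \(\mu\)), not the paper's implementation:

```python
import numpy as np

# Full-step aladin sketch for min sum_i 0.5*(x_i - c_i)^2  s.t.  x_1 - x_2 = 0,
# i.e. A_1 = [1], A_2 = [-1]. With quadratic f_i we have B_i = 1, and the
# coordination QP is solved exactly, so one iteration reaches the optimum.
# All data (c_i, mu) are illustrative.

c = np.array([1.0, 3.0])
Acol = np.array([1.0, -1.0])   # coupling coefficients A_i
mu, lam = 10.0, 0.0            # proximal weight and global dual (start value)
z = np.zeros(2)                # initial consensus iterate

for k in range(2):
    # Step (1): parallel local steps, closed form for quadratic f_i
    x = (c - lam * Acol + mu * z) / (1.0 + mu)
    # Step (2): sensitivities g_i = grad f_i(x_i), B_i = 1
    g = x - c
    # Step (3): coordination QP
    #   min sum_i 0.5*dx_i^2 + g_i*dx_i  s.t.  sum_i A_i*(x_i + dx_i) = 0
    nu = (Acol @ (x - g)) / (Acol @ Acol)   # QP multiplier
    dx = -g - Acol * nu
    # Step (4): full-step primal and dual updates
    z = x + dx
    lam = nu

print(z, lam)  # z = [2. 2.] and lam = -1.0 after the first iteration already
```

Working through the first loop pass by hand confirms that \(z\) jumps to the coupled optimizer and \(\lambda\) to the optimal multiplier in one step; subsequent iterations leave both unchanged.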

In power systems, feasibility is usually preferred over optimality to ensure stable system operation. Hence, admm with feasible initialization and large \(\rho\) can in principle be used for opf, as in this case violations of the power flow equations and generator bounds are expected to be small, and one could at least expect certain progress towards an optimal solution. Following this reasoning, several papers consider \(\|A(x^k - z^k)\| < \epsilon\) as a termination criterion [1, 4]. However, as shown in the example above, if admm is initialized at a feasible point and \(\rho\) is chosen sufficiently large, this termination criterion can be satisfied in just one iteration. Consequently, it is unclear how to ensure a certain degree of optimality. An additional question with respect to admm is how to obtain a feasible initialization. To compute such an initial guess, one has to solve a constrained nonlinear least-squares problem: the power flow equations subject to box constraints. This is itself a problem of almost the same complexity as the full opf problem. Hence one would again require a computationally powerful central entity with full topology and parameter information. Arguably, this jeopardizes the initial motivation for using distributed optimization methods.

4 Analysis of ADMM with Feasible Initialization for ρ → ∞

The results above indicate that large penalization parameters \(\rho\) in combination with feasible initialization might lead to premature convergence to a suboptimal solution. Arguably, this behavior of admm might be straightforward to see from an optimization perspective. However, to the best of our knowledge, a mathematically rigorous analysis, which is very relevant for opf, is not available in the literature.

Proposition 1 (Feasibility and ρ → ∞ imply \(x_i^{k+1}- x_i^k\in \operatorname {null}(A_i)\))

Consider the application of admm (Algorithm 1 ) to Problem (1). Suppose that, for all \(k \in \mathbb {N}\), the local problems (2) have unique regular minimizers \(x_i^k\).Footnote 6 For \(\tilde k\in \mathbb {N}\), let \(\lambda ^{\tilde k}_i\) be bounded and, for all \(i\in \mathcal {R}\) , \(z_i^{\tilde k} \in \mathcal {X}_i\). Then, the admm iterates satisfy

$$\displaystyle \begin{aligned} \lim_{\rho \to \infty}x_i^{k}(\rho) - x_i^{\tilde k} \in \operatorname{null}(A_i),\quad\forall k>\tilde k. \end{aligned}$$

Proof

The proof is divided into four steps. Steps 1–3 establish technical properties used to derive the above assertion in Step 4.

Step 1. At iteration \(\tilde k\) the local steps of admm are

$$\displaystyle \begin{aligned} x_i^{\tilde k}(\rho) = \underset{x_i\in \mathcal{X}_i}{\operatorname{argmin}}& \;f_i(x_i) + \left(\lambda_i^{\tilde k}\right)^\top A_i x_i + \frac{\rho}{2}\left\|A_i\left(x_i-z_i^{\tilde k}\right)\right\|{}^2_2. \end{aligned} $$
(6)

Now, by assumption, all \(f_i\) are twice continuously differentiable (hence bounded on \(\mathcal {X}_i\)), \(\lambda _i^{\tilde k}\) is bounded and all \({z_i^{\tilde k}\in \mathcal {X}_i}\). Thus, for all \(i \in \mathcal {R}\), \(\underset {\rho \rightarrow \infty }{\lim } x_i^{\tilde k}(\rho ) = z_i^{\tilde k} + v_i^{\tilde k}\) with \(v_i^{\tilde k} \in \operatorname {null}(A_i)\).

Step 2. The first-order stationarity condition of (6) can be written as

$$\displaystyle \begin{aligned} -\nabla f_i(x_i^{\tilde k}) - \gamma_i ^{\tilde k\top} \nabla h_i( x_i^{\tilde k}) = A_i^\top \lambda_i^{\tilde k} + \rho A_i^\top A_i\left(x_i^{\tilde k}-z_i^{\tilde k}\right), \end{aligned} $$
(7)

where \(\gamma _i ^{\tilde k}\) is the multiplier associated with \(h_i\). Multiplying the multiplier update formula (3) with \(A_i^\top \) from the left we obtain \(A_i^\top \lambda _i^{k+1} = A_i^\top \lambda _i^{k} + \rho A_i^\top A_i(x_i^k-z_i^k)\). Combined with (7) this yields \( A_i^\top \lambda _i^{\tilde k+1} = - \nabla f_i(x_i^{\tilde k}) - \gamma _i^{\tilde k\top } \nabla h_i(x_i^{\tilde k})\). By differentiability of \(f_i\) and \(h_i\), compactness of \(\mathcal {X}_i\) and regularity of \(x_i^{\tilde k}\), this implies boundedness of \(A_i^\top \lambda _i^{\tilde k+1}\).

Step 3. Next, we show by contradiction that \(\varDelta x_i^{\tilde k} \in \operatorname {null}(A_i)\) for all \(i \in \mathcal {R}\) as \(\rho \rightarrow \infty\). Recall the coordination step (4b) in admm given by

$$\displaystyle \begin{aligned} \underset{\varDelta x}{\text{min}}\;\sum_{i\in \mathcal{R}} \dfrac{\rho}{2}\varDelta x_i^\top A_i^\top A_i\varDelta x_i + \lambda_i^{\tilde k+1\top}A_i \varDelta x_i \;\;\;\text{s.t.} \; \; \sum_{i\in \mathcal{R}}A_i(x^{\tilde k}_i+\varDelta x_i) = 0. \end{aligned} $$
(8)

Observe that any \(\varDelta x_i^{\tilde k} \in \operatorname {null}(A_i)\) is a feasible point of (8), as \(\sum _{i\in \mathcal {R}}A_ix^{\tilde k}_i=0\). Consider a feasible candidate solution \(\varDelta x_i \notin \operatorname {null}(A_i)\) for which \( \sum _{i\in \mathcal {R}}A_i(x^{\tilde k}_i+\varDelta x_i) = 0\). By Step 2, \(\lambda _i^{\tilde k+1\top }A_i \varDelta x_i\) is bounded. Hence, for a sufficiently large value of \(\rho\), the objective of (8) will be positive at this candidate. However, for any \(\varDelta x_i \in \operatorname {null}(A_i)\) the objective of (8) is zero, which contradicts optimality of the candidate solution \(\varDelta x_i \notin \operatorname {null}(A_i)\). Hence, choosing \(\rho\) sufficiently large ensures that any minimizer of (8) lies in \(\operatorname {null}(A_i)\).

Step 4. It remains to show that the local problems (6), and hence their minimizers, do not change from step \(\tilde k\) to \(\tilde k+1\). In the last step of admm we have \(z^{\tilde k+1}=x^{\tilde k} + \varDelta x^{\tilde k}\). By Steps 1–3 this yields \(z^{\tilde k+1}=z^{\tilde k} + v^{\tilde k} + \varDelta x^{\tilde k}\) with \(v_i^{\tilde k} + \varDelta x_i^{\tilde k} \in \operatorname{null}(A_i)\), and hence

$$\displaystyle \begin{aligned} \left\|A_i\left(x_i-z_i^{ \tilde k+1}\right)\right\|{}^2_2=\left\|A_i\left(x_i-z_i^{\tilde k} + v_i^{\tilde k} + \varDelta x_i^{\tilde k}\right)\right\|{}^2_2=\left\|A_i\left(x_i-z_i^{\tilde k}\right)\right\|{}^2_2.\end{aligned}$$

Observe that this implies that, for \(\rho \rightarrow \infty\), problem (6) does not change from step \(\tilde k\) to \(\tilde k+1\); by induction, the same holds for all subsequent steps. This proves the assertion. \(\hfill \blacksquare \)
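Step 3 of the proof can also be checked numerically: solving the coordination QP (8) for a toy two-block instance with a feasible \(x^{\tilde k}\) and bounded multipliers (illustrative data, not the paper's test system) shows the coupled component of the minimizer decaying like \(1/\rho\), i.e. \(\varDelta x_i\) approaching \(\operatorname{null}(A_i)\):

```python
import numpy as np

# Numerical illustration of Step 3: for the coordination QP (8) with a
# feasible x (sum_i A_i x_i = 0) and bounded multipliers, the coupled
# component A_i*dx_i of the minimizer shrinks like 1/rho.
# Toy data (A_i, lambda_i) are illustrative.

A1 = np.array([[1.0, 0.0]])
A2 = np.array([[-1.0, 0.0]])
lam1, lam2 = np.array([0.4]), np.array([-0.6])

def coupled_component(rho):
    """Solve QP (8) via its KKT system; return max_i ||A_i dx_i||_inf."""
    H = rho * np.block([[A1.T @ A1, np.zeros((2, 2))],
                        [np.zeros((2, 2)), A2.T @ A2]])
    g = np.concatenate([A1.T @ lam1, A2.T @ lam2])
    G = np.hstack([A1, A2])               # consensus: A1 dx1 + A2 dx2 = 0
    KKT = np.block([[H, G.T], [G, np.zeros((1, 1))]])
    rhs = np.concatenate([-g, np.zeros(1)])
    sol = np.linalg.lstsq(KKT, rhs, rcond=None)[0]  # min-norm KKT solution
    dx1, dx2 = sol[:2], sol[2:4]
    return max(np.abs(A1 @ dx1).max(), np.abs(A2 @ dx2).max())

for rho in [1e2, 1e4, 1e6]:
    print(rho, coupled_component(rho))  # decays proportionally to 1/rho
```

For this instance the analytic value is \(|\lambda_1 - \lambda_2|/(2\rho)\), which the KKT solve reproduces; the null-space directions of \(A_i\) are fixed to zero by the minimum-norm solution.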

Corollary 1 (Deterministic code, feasibility, and ρ → ∞ imply \(x_i^{k+1}= x_i^k\))

Assume that the local subproblems in admm are solved deterministically, i.e. the same problem data yields the same solution. Then, under the conditions of Proposition 1 and for \(\rho \rightarrow \infty\), once admm generates a feasible point \(x_i^{\tilde k}\) of Problem (1), or whenever it is initialized at a feasible point, it will stay at this point for all subsequent \(k>\tilde k\).

The above corollary explains the behavior of admm for large \(\rho\) in combination with the feasible initialization often used in power systems [1, 4]. Although feasible iterates are desirable from a power systems point of view, the findings above imply that high values of \(\rho\) limit progress in terms of minimizing the objective.
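The stalling described by Corollary 1 can be reproduced on a toy consensus problem: started at a feasible point (both boundary copies equal) with a very large \(\rho\), admm keeps its iterates almost exactly feasible while making essentially no progress towards the coupled optimum. The sketch uses the scaled-dual consensus form of admm with illustrative data, not the paper's Algorithm 1:

```python
import numpy as np

# Illustration of Corollary 1: admm started at a feasible point with a very
# large rho stays (almost) feasible but stalls far from the optimum.
# Toy problem: min 0.5*(b1 - 1)^2 + 0.5*(b2 - 3)^2  s.t.  b1 = b2,
# whose coupled optimum is b1 = b2 = 2. All data are illustrative.

c = np.array([1.0, 3.0])
rho = 1e12                 # huge penalization of consensus violation
b = np.zeros(2)            # feasible initialization: b1 = b2 = 0
z, u = 0.0, np.zeros(2)

for k in range(100):
    b = (c + rho * (z - u)) / (1.0 + rho)   # local steps (closed form)
    z = float(np.mean(b + u))               # coordination (averaging)
    u = u + b - z                           # dual update

print(abs(b[0] - b[1]))  # consensus violation stays tiny
print(z)                 # but z remains near 0, far from the optimum 2
```

Per iteration, \(z\) moves towards the optimum only by a factor of order \(1/\rho\), so a consensus-based termination criterion would fire immediately while the objective is still far from optimal.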

Remark 1 (Behavior of aladin for ρ → ∞)

Note that for \(\rho \rightarrow \infty\), aladin behaves differently from admm. While the local problems in aladin behave similarly to those in admm, the coordination step in aladin is equivalent to a sequential quadratic programming step. This avoids premature convergence and ensures decrease of \(f\) in the coordination step [9].

5 Conclusions

This method-oriented work investigated the interplay of penalization of consensus violation and feasible initialization in admm. We found that, although admm often works reasonably well with a good choice of \(\rho\) and infeasible initialization, in case of feasible initialization combined with large values of \(\rho\) it typically stays feasible yet may stall at a suboptimal solution. We provided analytical results supporting this observation. However, computing a feasible initialization is itself a problem of almost the same complexity as the full opf problem, which partially jeopardizes the advantages of distributed optimization methods. Thus, distributed methods providing rigorous convergence guarantees while allowing for infeasible initialization are of interest. One such alternative is aladin [9], which exhibits favorable convergence properties at the cost of an enlarged communication overhead and a more complex coordination step [6].