Abstract
This paper investigates the role of feasible initial guesses and large consensus-violation penalization in distributed optimization for Optimal Power Flow (OPF) problems. Specifically, we discuss the behavior of the Alternating Direction Method of Multipliers (ADMM). We show that, in case of large consensus-violation penalization, ADMM might exhibit slow progress. We support this observation by an analysis of the algorithmic properties of ADMM. Furthermore, we illustrate our findings on the IEEE 57-bus system and compare ADMM with the Augmented Lagrangian Alternating Direction Inexact Newton (ALADIN) method.
This work is part of a project that receives funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 730936. TF acknowledges further support from the Baden-Württemberg Stiftung under the Elite Programme for Postdocs.
1 Introduction
Distributed optimization algorithms for AC Optimal Power Flow (OPF) have recently gained significant interest as these problems are inherently non-convex and often large-scale, i.e., comprising up to several thousand buses [1]. Distributed optimization is considered helpful as it allows splitting large OPF problems into several smaller subproblems, thus reducing complexity and avoiding the exchange of full grid models between subsystems. We refer to [2] for a recent overview of distributed optimization and control approaches in power systems.
One frequently discussed convex distributed optimization method is the Alternating Direction Method of Multipliers (ADMM) [3], which is also applied in the context of AC OPF [1, 4, 5]. ADMM often yields promising results even for large-scale power systems [4]. However, ADMM sometimes requires a specific partitioning technique and/or a feasible initialization in combination with high consensus-violation penalization parameters to converge [1]. The requirement of feasible initialization seems quite limiting: it amounts to solving a centralized inequality-constrained power flow problem, which needs full topology and load information and has approximately the same complexity as the full OPF problem.
In previous works [6,7,8] we suggested applying the Augmented Lagrangian Alternating Direction Inexact Newton (ALADIN) method to stochastic and deterministic OPF problems ranging from 5 to 300 buses. If a certain line search is applied [9], ALADIN provides global convergence guarantees to local minimizers for non-convex problems without the need for feasible initialization. The results in [6] underpin that ALADIN is able to outperform ADMM in many cases. This comes at the cost of a higher per-step information exchange compared with ADMM and a more complicated coordination step, cf. [6].
In this paper we investigate the interplay of feasible initialization with high penalization for consensus violation in ADMM for distributed AC OPF. We illustrate our findings on the IEEE 57-bus system. Furthermore, we compare the convergence behavior of ADMM to that of ALADIN, which does not suffer from the practical need for feasible initialization [9]. Finally, we provide theoretical results supporting our numerical observations for the convergence behavior of ADMM.
The paper is organized as follows: In Sect. 2 we briefly recap ADMM and ALADIN including their convergence properties. Section 3 shows numerical results for the IEEE 57-bus system focusing on the influence of the penalization parameter ρ on the convergence behavior of ADMM. Section 4 presents an analysis of the interplay between high penalization and a feasible initialization.
2 ALADIN and ADMM
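For distributed optimization, OPF problems are often formulated in affinely coupled separable form
\[ \min_{x} \; \sum_{i\in\mathcal{R}} f_i(x_i) \quad \text{subject to} \quad \sum_{i\in\mathcal{R}} A_i x_i = 0, \quad x_i \in \mathcal{X}_i \;\; \text{for all } i\in\mathcal{R}, \qquad (1) \]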
where the decision vector is divided into sub-vectors \(x^\top =[x_1^\top ,\dots ,x_{|\mathcal {R}|}^\top ] \in \mathbb {R}^{n_x}\), \(\mathcal {R}\) is the index set of subsystems usually representing geographical areas of a power system, and the local nonlinear constraint sets are \(\mathcal {X}_i:=\{x_i \in \mathbb {R}^{n_{xi}} \; | \; h_{i}(x_i) \leq 0\}\). Throughout this work we assume that \(f_i\) and \(h_i\) are twice continuously differentiable and that all \(\mathcal {X}_i\) are compact. Note that the objective functions \(f_i:\mathbb {R}^{n_{xi}}\rightarrow \mathbb {R}\) and nonlinear inequality constraints \(h_i: \mathbb {R}^{n_{xi}}\rightarrow \mathbb {R}^{n_{hi}}\) only depend on \(x_i\) and that coupling between them takes place in the affine consensus constraint \(\sum _{i\in \mathcal {R}}A_i x_i=0\) only. There are several ways of formulating OPF problems in the form of (1), differing in the coupling variables and the type of the power flow equations (polar or rectangular), cf. [4, 6, 10].
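As a minimal sketch of the affine consensus constraint, consider a hypothetical two-region example in which each region keeps its own copy of one shared boundary quantity. The variable layout, the matrices `A1`, `A2` and the helper `consensus_residual` are illustrative assumptions and are not taken from the OPF formulations in [4, 6, 10].

```python
import numpy as np

# Hypothetical layout: x_1 = [local_1, copy_1], x_2 = [copy_2, local_2].
# The coupling matrices select the two copies with opposite signs, so that
# A1 @ x1 + A2 @ x2 = copy_1 - copy_2.
A1 = np.array([[0.0, 1.0]])
A2 = np.array([[-1.0, 0.0]])


def consensus_residual(A_blocks, x_blocks):
    """Return sum_i A_i x_i; it is zero exactly when the copies agree."""
    return sum(A @ x for A, x in zip(A_blocks, x_blocks))


x1 = np.array([0.30, 1.02])
x2_agree = np.array([1.02, -0.70])     # copies coincide
x2_disagree = np.array([1.05, -0.70])  # copies differ by 0.03

gap_agree = np.linalg.norm(consensus_residual([A1, A2], [x1, x2_agree]), ord=np.inf)
gap_disagree = np.linalg.norm(consensus_residual([A1, A2], [x1, x2_disagree]), ord=np.inf)
print(gap_agree, gap_disagree)
```

The infinity norm of this residual is the consensus gap used as a convergence indicator in the numerical results.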
Algorithm 1 ADMM
Algorithm 2 ALADIN (full-step variant)
Here, we are interested in solving Problem (1) via ADMM and ALADIN, summarized in Algorithms 1 and 2, respectively (cf. Notes 1 and 2). At first glance it is apparent that ADMM and ALADIN share several features. For example, in Step (1) of both algorithms, local augmented Lagrangians subject to local nonlinear inequality constraints \(h_i\) are minimized in parallel (cf. Note 3). Observe that while ADMM maintains multiple local Lagrange multipliers \(\lambda_i\), ALADIN considers one global Lagrange multiplier vector λ. In Step (2), ALADIN computes sensitivities \(B_i\), \(g_i\) and \(C_i\) (which can often be obtained directly from the local numerical solver without additional computation), whereas ADMM updates the multiplier vectors \(\lambda_i\).
In Step (3), both algorithms communicate certain information to a central entity which then solves a (usually centralized) coordination quadratic program. However, ALADIN and ADMM differ in the amount of exchanged information: Whereas ADMM only communicates the local primal and dual variables \(x_i\) and \(\lambda_i\), ALADIN additionally communicates sensitivities. This is a considerable amount of extra information compared with ADMM. However, there exist methods to reduce the amount of exchanged information, and bounds on the information exchange are given in [6]. Another important difference is the computational complexity of the coordination step. In many cases, the coordination step in ADMM can be reduced to an averaging step based on neighborhood communication only [3], whereas in ALADIN the coordination step involves the centralized solution of an equality-constrained quadratic program.
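The difference between averaging and solving an equality-constrained quadratic program can be illustrated with a generic sketch. It solves \(\min_d \tfrac12 d^\top H d + g^\top d\) subject to \(A d = b\) through its KKT system. This is not the coordination QP of Algorithm 2, which contains further terms, cf. [9]; the function name `solve_eq_qp` and the data are hypothetical.

```python
import numpy as np


def solve_eq_qp(H, g, A, b):
    """Solve min 0.5*d'Hd + g'd s.t. A d = b via the KKT system
    [[H, A'], [A, 0]] [d; nu] = [-g; b]."""
    n, m = H.shape[0], A.shape[0]
    K = np.block([[H, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-g, b])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]  # step d and multiplier nu of A d = b


# Two scalar copies x = [1, 3] that must agree after the step: A (x + d) = 0.
x = np.array([1.0, 3.0])
A = np.array([[1.0, -1.0]])

# Identity Hessian and zero gradient: the QP step reduces to plain averaging.
d_avg, _ = solve_eq_qp(np.eye(2), np.zeros(2), A, -A @ x)
print(x + d_avg)

# A non-trivial Hessian weights the copies differently, so the result is no
# longer the arithmetic mean and a linear system has to be solved.
d_w, _ = solve_eq_qp(np.diag([1.0, 3.0]), np.zeros(2), A, -A @ x)
print(x + d_w)
```

With identity Hessian and zero gradient the result is the mean of the two copies, which only requires neighbor-to-neighbor exchange; with curvature information the coordinator has to assemble and solve a coupled linear system.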
In the last step, ADMM updates the primal variables \(z_i\), while ALADIN additionally updates the dual variables λ. Differences between ALADIN and ADMM also show up in the convergence speed and their theoretical convergence guarantees: Whereas ALADIN guarantees global convergence and quadratic local convergence for non-convex problems if a certain globalization strategy is applied [9], few results exist for ADMM in the non-convex setting. Recent works [12, 13] investigate the convergence of ADMM for special classes of non-convex problems; however, to the best of our knowledge OPF problems do not belong to these classes.
3 Numerical Results
Next, we investigate the behavior of ADMM for large ρ and a feasible initialization to illustrate performance differences between ADMM and ALADIN. We consider power flow equations in polar form with coupling in active/reactive power and voltage angle and magnitude at the boundary between two neighboring regions [6]. As a numerical test case we consider the IEEE 57-bus system with data from MATPOWER and partitioning as in [6].
Figures 1 and 2 show the convergence behavior of ADMM (with and without feasible initialization (f.)) and ALADIN for several convergence indicators and two different penalty parameters \(\rho = 10^4\) and \(\rho = 10^6\) (cf. Note 4). Therein, the leftmost plot depicts the consensus gap \(\|Ax^k\|_\infty\) representing the maximum mismatch of coupling variables (active/reactive power and voltage magnitude/angle) at borders between two neighboring regions. The second plot shows the objective function value \(f^k\) and the third plot presents the distance to the minimizer \(\|x^k - x^\star\|_\infty\) over the iteration index k. The rightmost plot shows the nonlinear constraint violation \(\|g(z^k)\|_\infty\) after the consensus steps (4) and (5) of Algorithms 1 and 2, respectively, representing the maximum violation of the power flow equations (cf. Note 5).
In case of small \(\rho = 10^4\), both ADMM variants behave similarly and converge slowly towards the optimal solution with a slow decrease in consensus violation, nonlinear constraint violation and objective function value. If we increase ρ to \(\rho = 10^6\), with results shown in Fig. 2, the consensus violation \(\|Ax^k\|_\infty\) gets smaller in ADMM with feasible initialization. The reason is that a large ρ forces \(x_i^k\) to be close to \(z_i^k\), leading to small \(\|Ax^k\|_\infty\) as we have \(Az^k = 0\) from the consensus step. But, at the same time, this also leads to slower progress in optimality \(f^k\) compared to \(\rho = 10^4\), cf. the second plot in Figs. 1 and 2.
On the other hand, this statement does not hold for ADMM with infeasible initialization (blue lines in Figs. 1 and 2) as the constraints in the local step and the consensus step of ADMM enforce an alternating projection between the consensus constraint and the local nonlinear constraints. The progress in the nonlinear constraint violation \(\|g(z^k)\|_\infty\) supports this statement. In its extreme, this behavior can be observed when using \(\rho = 10^{12}\), depicted in Fig. 3. There, ADMM with feasible initialization produces very small consensus violations and nonlinear constraint violations at the cost of almost no progress in terms of optimality.
Here the crucial observation is that in case of feasible initialization and a large penalization parameter ρ, ADMM produces almost feasible iterates at the cost of slow progress in the objective function values. From this, one is tempted to conclude that ADMM will likely converge also for infeasible initializations, cf. Figs. 1, 2, and 3. This conclusion is supported by the rather small 57-bus test system. However, it deserves to be noted that this conclusion is in general not valid, cf. [1].
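The trade-off between feasibility and progress can be reproduced on a toy problem. The following sketch is an assumption-laden illustration and not the 57-bus experiment: it uses two scalar agents with hypothetical costs \((x_i - c_i)^2\), no inequality constraints, and textbook consensus ADMM in the form of [3] (local step, averaging, multiplier update), whose ordering may differ from Algorithm 1.

```python
import numpy as np


def consensus_admm(c, rho, z0, iters):
    """Textbook consensus ADMM for min sum_i (x_i - c_i)^2 s.t. all x_i equal.

    Returns the last local iterates x and the consensus variable z.
    """
    c = np.asarray(c, dtype=float)
    y = np.zeros_like(c)  # local multipliers, initialized at zero
    z = float(z0)         # consensus variable (same value for all agents)
    x = np.full_like(c, z)
    for _ in range(iters):
        # Local step: closed-form minimizer of
        # (x - c_i)^2 + y_i * (x - z) + rho / 2 * (x - z)^2.
        x = (2.0 * c - y + rho * z) / (2.0 + rho)
        # Coordination step: averaging.
        z = float(np.mean(x + y / rho))
        # Multiplier update.
        y = y + rho * (x - z)
    return x, z


c = [1.0, -3.0]  # hypothetical cost data; the consensus optimum is z = -1
z0 = 2.0         # consensus-feasible but suboptimal starting point

x_small, z_small = consensus_admm(c, rho=1.0, z0=z0, iters=30)
x_large, z_large = consensus_admm(c, rho=1e6, z0=z0, iters=30)

print("rho = 1  :", z_small, abs(x_small[0] - x_small[1]))
print("rho = 1e6:", z_large, abs(x_large[0] - x_large[1]))
```

In this convex toy case both runs keep the consensus gap small, but after the same number of iterations the run with the large penalty parameter has barely moved away from its starting point, whereas the run with the small penalty parameter is already close to the optimum.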
For ALADIN, we use \(\rho = 10^6\) and \(\mu = 10^7\). Comparing the results of ADMM with ALADIN, ALADIN shows superior quadratic convergence also in case of infeasible initialization. This is in line with the known convergence properties of ALADIN [9]. However, the fast convergence comes at the cost of increased communication overhead per step, cf. [6] for a more detailed discussion. Note that ALADIN involves a more complex coordination step, which is not straightforward to solve via neighborhood communication. Furthermore, tuning of ρ and μ can be difficult.
In power systems, feasibility is usually preferred over optimality to ensure stable system operation. Hence, ADMM with feasible initialization and large ρ can in principle be used for OPF, as in this case violations of power flow equations and generator bounds are expected to be small and one could at least expect certain progress towards an optimal solution. Following this reasoning, several papers consider \(\|A(x^k - z^k)\|_\infty < \epsilon\) as termination criterion [1, 4]. However, as shown in the example above, if ADMM is initialized at a feasible point and ρ is chosen sufficiently large, this termination criterion can always be satisfied in just one iteration. Consequently, it is unclear how to ensure a certain degree of optimality. An additional question with respect to ADMM is how to obtain a feasible initialization. To compute such an initial guess, one has to solve a constrained nonlinear least squares problem, i.e., solve the power flow equations subject to box constraints. This is itself a problem of almost the same complexity as the full OPF problem. Hence one would again require a computationally powerful central entity with full topology and parameter information. Arguably this jeopardizes the initial motivation for using distributed optimization methods.
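To make the one-iteration termination concrete, consider a hypothetical scalar instance that is not taken from the paper: \(f_i(x_i)=(x_i-c_i)^2\), \(A_i=1\), no active inequality constraints and \(\lambda_i^0=0\). Assuming the local step minimizes the usual augmented Lagrangian \(f_i(x_i)+\lambda_i^\top A_i x_i+\tfrac{\rho}{2}\|A_i(x_i-z_i)\|_2^2\), the first local iterate follows from the stationarity condition:

```latex
\begin{align*}
0 &= 2\,(x_i^0 - c_i) + \rho\,(x_i^0 - z_i^0)
\;\Longrightarrow\;
x_i^0 = \frac{2c_i + \rho z_i^0}{2+\rho}, \\
x_i^0 - z_i^0 &= \frac{2\,(c_i - z_i^0)}{2+\rho}, \\
|A_i(x_i^0 - z_i^0)| < \epsilon
&\iff \rho > \frac{2\,|c_i - z_i^0|}{\epsilon} - 2 .
\end{align*}
```

In this toy case the criterion is met after the first local step for any ρ above the threshold, regardless of whether \(z^0\) is close to the optimum of the coupled problem.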
4 Analysis of ADMM with Feasible Initialization for ρ →∞
The results above indicate that large penalization parameters ρ in combination with feasible initialization might lead to premature convergence to a suboptimal solution. Arguably, this behavior of ADMM might be straightforward to see from an optimization perspective. However, to the best of our knowledge a mathematically rigorous analysis, which is very relevant for OPF, is not available in the literature.
Proposition 1 (Feasibility and ρ →∞ imply \(x_i^{k+1}- x_i^k\in \operatorname {null}(A_i)\))
Consider the application of ADMM (Algorithm 1) to Problem (1). Suppose that, for all \(k \in \mathbb {N}\), the local problems (2) have unique regular minimizers \(x_i^k\) (cf. Note 6). For \(\tilde k\in \mathbb {N}\), let \(\lambda ^{\tilde k}_i\) be bounded and, for all \(i\in \mathcal {R}\), \(z_i^{\tilde k} \in \mathcal {X}_i\). Then, for ρ →∞, the ADMM iterates satisfy \(x_i^{k+1}- x_i^k\in \operatorname {null}(A_i)\) for all \(i\in \mathcal {R}\) and all \(k \geq \tilde k\).
Proof
The proof is divided into four steps. Steps 1–3 establish technical properties used to derive the above assertion in Step 4.
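Step 1. At iteration \(\tilde k\) the local steps of ADMM are
\[ x_i^{\tilde k}(\rho ) = \operatorname {argmin}_{x_i \in \mathcal {X}_i} \; f_i(x_i) + (\lambda _i^{\tilde k})^\top A_i x_i + \frac {\rho }{2}\left \|A_i(x_i - z_i^{\tilde k})\right \|_2^2. \qquad (6) \]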
Now, by assumption all \(f_i\) are twice continuously differentiable (hence bounded on \(\mathcal {X}_i\)), \(\lambda _i^{\tilde k}\) is bounded and all \({z_i^{\tilde k}\in \mathcal {X}_i}\). Thus, for all \(i \in \mathcal {R}\), \(\underset {\rho \rightarrow \infty }{\lim } x_i^{\tilde k}(\rho ) = z_i^{\tilde k} + v_i^{\tilde k}\) with \(v_i^{\tilde k} \in \operatorname {null}(A_i)\).
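Step 2. The first-order stationarity condition of (6) can be written as
\[ \nabla f_i(x_i^{\tilde k}) + A_i^\top \lambda _i^{\tilde k} + \rho A_i^\top A_i (x_i^{\tilde k} - z_i^{\tilde k}) + \gamma _i^{\tilde k\top } \nabla h_i(x_i^{\tilde k}) = 0, \qquad (7) \]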
where \(\gamma _i ^{\tilde k\top }\) is the multiplier associated with \(h_i\). Multiplying the multiplier update formula (3) with \(A_i^\top \) from the left we obtain \(A_i^\top \lambda _i^{k+1} = A_i^\top \lambda _i^{k} + \rho A_i^\top A_i(x_i^k-z_i^k)\). Combined with (7) this yields \( A_i^\top \lambda _i^{\tilde k+1} = - \nabla f_i(x_i^{\tilde k}) - \gamma _i^{\tilde k\top } \nabla h_i(x_i^{\tilde k})\). By differentiability of \(f_i\) and \(h_i\), compactness of \(\mathcal {X}_i\) and regularity of \(x_i^{\tilde k}\), this implies boundedness of \(A_i^\top \lambda _i^{\tilde k+1}\).
Step 3. Next, we show by contradiction that \(\varDelta x_i^{\tilde k} \in \operatorname {null}(A_i)\) for all \(i \in \mathcal {R}\) and ρ →∞. Recall the coordination step (4b) in ADMM given by
Observe that any \(\varDelta x_i^{\tilde k} \in \operatorname {null}(A_i)\) is a feasible point to (8) as \(\sum _{i\in \mathcal {R}}A_ix^{\tilde k}_i=0\). Consider a feasible candidate solution \(\varDelta x_i \notin \operatorname {null}(A_i)\) for which \( \sum _{i\in \mathcal {R}}A_i(x^{\tilde k}_i+\varDelta x_i) = 0\). Clearly, \(\lambda _i^{\tilde k+1\top }A_i \varDelta x_i(\rho )\) will be bounded. Hence for a sufficiently large value of ρ, the objective of (8) will be positive. However, for any \(\varDelta x_i \in \operatorname {null}(A_i)\) the objective of (8) is zero, which contradicts optimality of the candidate solution \(\varDelta x_i \notin \operatorname {null}(A_i)\). Hence, choosing ρ sufficiently large ensures that any minimizer of (8) lies in \(\operatorname {null}(A_i)\).
Step 4. It remains to show \(x_i^{\tilde k + 1} = x_i^{\tilde k}\). In the last step of ADMM we have \(z^{\tilde k+1}=x^{\tilde k} + \varDelta x^{\tilde k}\). Given Steps 1–3 this yields \(z^{\tilde k+1}=z^{\tilde k} + v^{\tilde k} + \varDelta x^{\tilde k}\) and hence
Observe that this implies that, for ρ →∞, problem (6) does not change from step \(\tilde k\) to \(\tilde k+1\). This proves the assertion. \(\hfill \blacksquare \)
Corollary 1 (Deterministic code, feasibility, ρ →∞ implies \(x_i^{k+1}= x_i^k\))
Assume that the local subproblems in ADMM are solved deterministically, i.e., the same problem data yields the same solution. Then, under the conditions of Proposition 1 and for ρ →∞, once ADMM generates a feasible point \(x_i^{\tilde k}\) to Problem (1), or whenever it is initialized at a feasible point, it will stay at this point for all subsequent \(k>\tilde k\).
The above corollary explains the behavior of ADMM for large ρ in combination with feasible initialization, which is often used in power systems [1, 4]. Although feasible iterates are desirable from a power systems point of view, the findings above imply that high values of ρ limit progress in terms of minimizing the objective.
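A worked toy case may help to see how a large ρ slows progress. The setting is an assumption for illustration only: two scalar agents with \(f_i(x_i)=(x_i-c_i)^2\), the consensus constraint \(x_1=x_2\), no inequality constraints, and textbook consensus ADMM in the form of [3] with local multipliers \(y_i\) satisfying \(y_1^0+y_2^0=0\). Ordering and sign conventions may differ from Algorithm 1. Since the multiplier update preserves \(y_1^k+y_2^k=0\), the averaging step reduces to the mean of the local iterates:

```latex
\begin{align*}
x_i^{k} &= \frac{2c_i - y_i^{k} + \rho z^{k}}{2+\rho}, \qquad
z^{k+1} = \tfrac12\,(x_1^{k} + x_2^{k}) = \frac{c_1 + c_2 + \rho z^{k}}{2+\rho}, \qquad
y_i^{k+1} = y_i^{k} + \rho\,(x_i^{k} - z^{k+1}), \\
z^\star &= \tfrac12\,(c_1 + c_2)
\;\Longrightarrow\;
z^{k+1} - z^\star = \frac{\rho}{2+\rho}\,(z^{k} - z^\star).
\end{align*}
```

The contraction factor \(\rho/(2+\rho)\) tends to 1 as ρ →∞, so in this toy case the consensus variable moves by a vanishing amount per iteration, which is consistent with the stalling described in Corollary 1.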
Remark 1 (Behavior of ALADIN for ρ →∞)
Note that for ρ →∞, ALADIN behaves differently from ADMM. While the local problems in ALADIN behave similarly to those in ADMM, the coordination step in ALADIN is equivalent to a sequential quadratic programming step. This helps to avoid premature convergence and ensures a decrease of f in the coordination step [9].
5 Conclusions
This method-oriented work investigated the interplay of penalization of consensus violation and feasible initialization in ADMM. We found that, although ADMM often works reasonably well with a good choice of ρ and infeasible initialization, in case of feasible initialization combined with large values of ρ it typically stays feasible yet may stall at a suboptimal solution. We provided analytical results supporting this observation. However, computing a feasible initialization is itself a problem of almost the same complexity as the full OPF problem, which in some sense partially jeopardizes the advantages of distributed optimization methods. Thus distributed methods providing rigorous convergence guarantees while allowing for infeasible initialization are of interest. One such alternative method is ALADIN [9], which offers such convergence properties at the cost of an enlarged communication overhead and a more complex coordination step [6].
Notes
- 1.
- 2.
Note that, due to space limitations, we describe the full-step variant of ALADIN here. To obtain convergence guarantees from a remote starting point, a globalization strategy is necessary, cf. [9].
- 3.
For notational simplicity, we only consider nonlinear inequality constraints here. Nonlinear equality constraints \(g_i\) can be incorporated via a reformulation in terms of two inequality constraints, i.e., \(0 \leq g_i(x_i) \leq 0\).
- 4.
The scaling matrices \(\Sigma_i\) are diagonal. They are chosen to improve convergence. Hence, entries corresponding to voltages and phase angles are 100, entries corresponding to powers are set to 1.
- 5.
The power flow equations for the IEEE 57-bus system are considered as nonlinear equality constraints \(g_i(x_i) = 0\). Hence, \(g_i(x_i) \neq 0\) represents a violation of the power flow equations.
- 6.
A minimizer is regular if the gradients of the active constraints are linearly independent [14].
References
1. J. Guo, G. Hug, O.K. Tonguz, IEEE Trans. Power Syst. 32(5), 3842 (2017). https://doi.org/10.1109/TPWRS.2016.2636811
2. D.K. Molzahn, F. Dörfler, H. Sandberg, S.H. Low, S. Chakrabarti, R. Baldick, J. Lavaei, IEEE Trans. Smart Grid 8(6), 2941 (2017). https://doi.org/10.1109/TSG.2017.2720471
3. S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Found. Trends Mach. Learn. 3(1), 1 (2011)
4. T. Erseghe, IEEE Trans. Power Syst. 29(5), 2370 (2014). https://doi.org/10.1109/TPWRS.2014.2306495
5. B.H. Kim, R. Baldick, IEEE Trans. Power Syst. 15(2), 599 (2000)
6. A. Engelmann, Y. Jiang, T. Mühlpfordt, B. Houska, T. Faulwasser, IEEE Trans. Power Syst. 34(1), 584 (2019)
7. A. Engelmann, T. Mühlpfordt, Y. Jiang, B. Houska, T. Faulwasser, in Proceedings of the American Control Conference (ACC), 2018, pp. 6188–6193. https://doi.org/10.23919/ACC.2018.8431090
8. A. Murray, A. Engelmann, V. Hagenmeyer, T. Faulwasser, IFAC-PapersOnLine 51(28), 368 (2018). 10th IFAC Symposium on Control of Power and Energy Systems CPES 2018. https://doi.org/10.1016/j.ifacol.2018.11.730
9. B. Houska, J. Frasch, M. Diehl, SIAM J. Optim. 26(2), 1101 (2016)
10. B.H. Kim, R. Baldick, IEEE Trans. Power Syst. 12(2), 932 (1997)
11. D.P. Bertsekas, J.N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, vol. 23 (Prentice Hall, Englewood Cliffs, 1989)
12. Y. Wang, W. Yin, J. Zeng, J. Sci. Comput. 78(1), 29 (2019)
13. M. Hong, Z.Q. Luo, M. Razaviyayn, SIAM J. Optim. 26(1), 337 (2016)
14. J. Nocedal, S. Wright, Numerical Optimization (Springer Science & Business Media, New York, 2006)
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2020 The Author(s)
About this paper
Cite this paper
Engelmann, A., Faulwasser, T. (2020). Feasibility vs. Optimality in Distributed AC OPF: A Case Study Considering ADMM and ALADIN. In: Bertsch, V., Ardone, A., Suriyah, M., Fichtner, W., Leibfried, T., Heuveline, V. (eds) Advances in Energy System Optimization. ISESO 2018. Trends in Mathematics. Birkhäuser, Cham. https://doi.org/10.1007/978-3-030-32157-4_1
DOI: https://doi.org/10.1007/978-3-030-32157-4_1
Publisher Name: Birkhäuser, Cham
Print ISBN: 978-3-030-32156-7
Online ISBN: 978-3-030-32157-4
eBook Packages: Mathematics and Statistics (R0)