
Fast and stable nonconvex constrained distributed optimization: the ELLADA algorithm


Abstract

Distributed optimization using multiple computing agents in a localized and coordinated manner is a promising approach for solving large-scale optimization problems, e.g., those arising in model predictive control (MPC) of large-scale plants. However, a distributed optimization algorithm that is computationally efficient, globally convergent, and amenable to nonconvex constraints remains an open problem. In this paper, we combine three important modifications to the classical alternating direction method of multipliers for distributed optimization. Specifically, (1) an extra-layer architecture is adopted to accommodate nonconvexity and handle inequality constraints, (2) equality-constrained nonlinear programming (NLP) problems are allowed to be solved approximately, and (3) a modified Anderson acceleration is employed to reduce the number of iterations. Theoretical convergence of the proposed algorithm, named ELLADA, is established, and its numerical performance is demonstrated on a large-scale NLP benchmark problem. Its application to distributed nonlinear MPC is also described and illustrated through a benchmark process system.
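As background for the developments summarized above, the following minimal Python sketch illustrates classical consensus ADMM in the spirit of Boyd et al. (2011), which ELLADA builds on. It is not the ELLADA algorithm itself (no extra slack layer, no approximate NLP solutions, no Anderson acceleration); the problem data, function name, and parameter values are illustrative assumptions.

import numpy as np

def consensus_admm(As, bs, rho=1.0, iters=200):
    """Classical consensus ADMM sketch for minimize sum_i ||A_i x - b_i||^2,
    with local copies x_i held by agents and a coordinating variable z."""
    n = As[0].shape[1]
    N = len(As)
    xs = [np.zeros(n) for _ in range(N)]   # local primal copies
    us = [np.zeros(n) for _ in range(N)]   # scaled dual variables
    z = np.zeros(n)                        # consensus (coordination) variable
    for _ in range(iters):
        # local updates: argmin ||A_i x - b_i||^2 + (rho/2)||x - z + u_i||^2
        for i in range(N):
            H = 2 * As[i].T @ As[i] + rho * np.eye(n)
            rhs = 2 * As[i].T @ bs[i] + rho * (z - us[i])
            xs[i] = np.linalg.solve(H, rhs)
        # coordination step: average of local copies plus duals
        z = np.mean([xs[i] + us[i] for i in range(N)], axis=0)
        # dual updates
        for i in range(N):
            us[i] += xs[i] - z
    return z

# Usage on random data split across 3 agents (illustrative only).
rng = np.random.default_rng(0)
As = [rng.standard_normal((10, 4)) for _ in range(3)]
bs = [rng.standard_normal(10) for _ in range(3)]
print(consensus_admm(As, bs))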


Notes

  1. There are two different types of Anderson acceleration. Here we focus on Type I, which was found to have better performance (Fang and Saad 2009) and was improved in Zhang et al. (2018).
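For concreteness, here is a minimal sketch of plain Type-I Anderson acceleration for a generic fixed-point iteration x ← f(x). It omits the stabilization and safeguarding steps introduced in Zhang et al. (2018); the memory size, regularization constant, and test map are illustrative assumptions.

import numpy as np

def anderson_type1(f, x0, m=5, max_iter=100, tol=1e-8, reg=1e-10):
    """Plain Type-I Anderson acceleration for x = f(x) (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    g = f(x) - x                          # fixed-point residual
    xs, gs = [x], [g]
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        mk = min(m, len(xs) - 1)          # number of stored differences
        if mk == 0:
            x_new = x + g                 # plain fixed-point (Picard) step
        else:
            S = np.column_stack([xs[-i] - xs[-i-1] for i in range(1, mk + 1)])
            Y = np.column_stack([gs[-i] - gs[-i-1] for i in range(1, mk + 1)])
            # Type-I multisecant coefficients: gamma solves (S^T Y) gamma = S^T g
            A = S.T @ Y + reg * np.eye(mk)
            gamma = np.linalg.solve(A, S.T @ g)
            x_new = x + g - (S + Y) @ gamma
        x = x_new
        g = f(x) - x
        xs.append(x)
        gs.append(g)
    return x

# Usage: accelerate a contractive map, e.g. x -> cos(x) componentwise.
print(anderson_type1(lambda x: np.cos(x), np.zeros(3)))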

  2. We use the word “oracle” with its typical meaning in mathematics and computer science: an oracle is an ad hoc numerical or computational procedure, regarded as a black-box mechanism, that generates the needed results as outputs from given input information.

References

  • Anderson DG (1965) Iterative procedures for nonlinear integral equations. J ACM 12(4):547–560

  • Bertsekas DP (2016) Nonlinear programming, 3rd edn. Athena Scientific, Nashua

  • Biegler LT, Thierry DM (2018) Large-scale optimization formulations and strategies for nonlinear model predictive control. IFAC-PapersOnLine 51(20):1–15 (6th IFAC Conference on Nonlinear Model Predictive Control, NMPC)

  • Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trend Mach Learn 3(1):1–122

  • Chen X, Heidarinejad M, Liu J, Christofides PD (2012) Distributed economic MPC: application to a nonlinear chemical process network. J Process Control 22(4):689–699

  • Chen C, He B, Ye Y, Yuan X (2016) The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math Prog 155(1–2):57–79

  • Christofides PD, Scattolini R, Muñoz de la Peña D, Liu J (2013) Distributed model predictive control: a tutorial review and future research directions. Comput Chem Eng 51:21–41

  • Daoutidis P, Tang W, Jogwar SS (2018) Decomposing complex plants for distributed control: perspectives from network theory. Comput Chem Eng 114:43–51

  • Daoutidis P, Tang W, Allman A (2019) Decomposition of control and optimization problems by network structure: concepts, methods and inspirations from biology. AIChE J 65(10):e16708

  • Dhingra NK, Khong SZ, Jovanović MR (2019) The proximal augmented Lagrangian method for nonsmooth composite optimization. IEEE Trans Autom Control 64(7):2861–2868

  • Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Prog 91(2):201–213

  • Eckstein J, Bertsekas DP (1992) On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math Prog 55(1–3):293–318

  • Eckstein J, Yao W (2017) Approximate ADMM algorithms derived from Lagrangian splitting. Comput Optim Appl 68(2):363–405

  • Eckstein J, Yao W (2018) Relative-error approximate versions of Douglas-Rachford splitting and special cases of the ADMM. Math Prog 170(2):417–444

  • Fang Hr, Saad Y (2009) Two classes of multisecant methods for nonlinear acceleration. Numer Linear Algebra Appl 16(3):197–221

  • Farokhi F, Shames I, Johansson KH (2014) Distributed MPC via dual decomposition and alternative direction method of multipliers. In: Distributed model predictive control made easy. Springer, Berlin, pp 115–131

  • Fu A, Zhang J, Boyd S (2019) Anderson accelerated Douglas–Rachford splitting. arXiv preprint arXiv:1908.11482

  • Gabay D, Mercier B (1976) A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput Math Appl 2(1):17–40

  • Giselsson P, Doan MD, Keviczky T, De Schutter B, Rantzer A (2013) Accelerated gradient methods and dual decomposition in distributed model predictive control. Automatica 49(3):829–833

  • Glowinski R, Marroco A (1975) Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de dirichlet non linéaires. Rev Fr Autom Inform Rech Opér, Anal Numér 9(R2):41–76

  • Goldstein T, O’Donoghue B, Setzer S, Baraniuk R (2014) Fast alternating direction optimization methods. SIAM J Imaging Sci 7(3):1588–1623

  • Hajinezhad D, Hong M (2019) Perturbed proximal primal–dual algorithm for nonconvex nonsmooth optimization. Math Prog 176(1–2):207–245

  • He B, Yuan X (2012) On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J Numer Anal 50(2):700–709

  • Hong M, Luo ZQ (2017) On the linear convergence of the alternating direction method of multipliers. Math Prog 162(1–2):165–199

  • Hong M, Luo ZQ, Razaviyayn M (2016) Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J Optim 26(1):337–364

  • Hours JH, Jones CN (2015) A parametric nonconvex decomposition algorithm for real-time and distributed NMPC. IEEE Trans Autom Control 61(2):287–302

  • Houska B, Frasch J, Diehl M (2016) An augmented Lagrangian based algorithm for distributed nonconvex optimization. SIAM J Optim 26(2):1101–1127

  • Jalving J, Cao Y, Zavala VM (2019) Graph-based modeling and simulation of complex systems. Comput Chem Eng 125:134–154

  • Jiang B, Lin T, Ma S, Zhang S (2019) Structured nonconvex and nonsmooth optimization: algorithms and iteration complexity analysis. Comput Optim Appl 72(1):115–157

  • Johansson KH (2000) The quadruple-tank process: a multivariable laboratory process with an adjustable zero. IEEE Trans Control Syst Technol 8(3):456–465

  • Li G, Pong TK (2015) Global convergence of splitting methods for nonconvex composite optimization. SIAM J Optim 25(4):2434–2460

  • Liu J, Chen X, Muñoz de la Peña D, Christofides PD (2010) Sequential and iterative architectures for distributed model predictive control of nonlinear process systems. AIChE J 56(8):2137–2149

  • Mota JF, Xavier JM, Aguiar PM, Püschel M (2014) Distributed optimization with local domains: applications in MPC and network flows. IEEE Trans Autom Control 60(7):2004–2009

  • Nesterov YuE (1983) A method of solving a convex programming problem with convergence rate \(O(\frac{1}{k^2})\). Dokl Akad Nauk SSSR 269(3):543–547

  • Nicholson B, Siirola JD, Watson JP, Zavala VM, Biegler LT (2018) pyomo.dae: a modeling and automatic discretization framework for optimization with differential and algebraic equations. Math Prog Comput 10(2):187–223

  • Nishihara R, Lessard L, Recht B, Packard A, Jordan M (2015) A general analysis of the convergence of ADMM. Proc Mach Learn Res 37:343–352

  • Ouyang Y, Chen Y, Lan G, Pasiliao E Jr (2015) An accelerated linearized alternating direction method of multipliers. SIAM J Imaging Sci 8(1):644–681

  • Patterson MA, Rao AV (2014) GPOPS-II: a MATLAB software for solving multiple-phase optimal control problems using \(h_p\)-adaptive Gaussian quadrature collocation methods and sparse nonlinear programming. ACM Trans Math Softw (TOMS) 41(1):1–37

  • Pulay P (1980) Convergence acceleration of iterative sequences. The case of SCF iteration. Chem Phys Lett 73(2):393–398

  • Rawlings JB, Mayne DQ, Diehl MM (2017) Model predictive control: theory, computation, and design, 2nd edn. Nob Hill Publishing, Madison

  • Rockafellar RT, Wets RJB (1998) Variational analysis. Springer, Berlin

  • Scattolini R (2009) Architectures for distributed and hierarchical model predictive control—a review. J Process Control 19(5):723–731

  • Scutari G, Facchinei F, Lampariello L (2016) Parallel and distributed methods for constrained nonconvex optimization—part I: theory. IEEE Trans Signal Process 65(8):1929–1944

  • Stewart BT, Venkat AN, Rawlings JB, Wright SJ, Pannocchia G (2010) Cooperative distributed model predictive control. Syst Control Lett 59(8):460–469

  • Sun K, Sun XA (2019) A two-level distributed algorithm for general constrained non-convex optimization with global convergence. arXiv preprint arXiv:1902.07654

  • Tang W, Allman A, Pourkargar DB, Daoutidis P (2018) Optimal decomposition for distributed optimization in nonlinear model predictive control through community detection. Comput Chem Eng 111:43–54

  • Themelis A, Patrinos P (2020) Douglas-Rachford splitting and ADMM for nonconvex optimization: tight convergence results. SIAM J Optim 30(1):149–181

  • Toth A, Kelley C (2015) Convergence analysis for Anderson acceleration. SIAM J Numer Anal 53(2):805–819

  • Wächter A, Biegler LT (2005) Line search filter methods for nonlinear programming: motivation and global convergence. SIAM J Optim 16(1):1–31

  • Wächter A, Biegler LT (2006) On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math Prog 106(1):25–57

  • Wang Y, Boyd S (2009) Fast model predictive control using online optimization. IEEE Trans Control Syst Technol 18(2):267–278

  • Wang Z, Ong CJ (2017) Distributed model predictive control of linear discrete-time systems with local and global constraints. Automatica 81:184–195

  • Wang Y, Yin W, Zeng J (2019) Global convergence of ADMM in nonconvex nonsmooth optimization. J Sci Comput 78(1):29–63

  • Xie J, Liao A, Yang X (2017) An inexact alternating direction method of multipliers with relative error criteria. Optim Lett 11(3):583–596

  • Yang Y, Hu G, Spanos CJ (2020) A proximal linearization-based decentralized method for nonconvex problems with nonlinear constraints. arXiv preprint arXiv:2001.00767

  • Zhang RY, White JK (2018) GMRES-accelerated ADMM for quadratic objectives. SIAM J Optim 28(4):3025–3056

  • Zhang J, O’Donoghue B, Boyd S (2018) Globally convergent type-I Anderson acceleration for non-smooth fixed-point iterations. arXiv preprint arXiv:1808.03971

  • Zhang J, Peng Y, Ouyang W, Deng B (2019) Accelerating ADMM for efficient simulation and optimization. ACM Trans Graph 38(6):163


Acknowledgements

This work was supported by the National Science Foundation (NSF-CBET). The authors would also like to thank Prof. Qi Zhang for his constructive comments.

Author information

Corresponding author

Correspondence to Prodromos Daoutidis.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Proof of Lemma 1

We first prove that

$$\begin{aligned} \begin{aligned}&L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r+1}, y^{k,r+1}\right) \le L\left( x^{k,r}, {\bar{x}}^{k,r}, z^{k,r}, y^{k,r}\right) \\&\quad - \beta ^k\left\| B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r}\right\| ^2 - \frac{\beta ^k}{2}\left\| z^{k,r+1} - z^{k,r}\right\| ^2 \end{aligned} \end{aligned}$$
(62)

for \(r=0, 1, 2, \dots \). First, since \(x^{k,r+1}\) is chosen as the minimizer of the augmented Lagrangian with respect to x (Line 9, Algorithm 1), the update of x leads to a decrease in L:

$$\begin{aligned} L\left( x^{k,r+1}, {\bar{x}}^{k,r}, z^{k,r}, y^{k,r}\right) \le L\left( x^{k,r}, {\bar{x}}^{k,r}, z^{k,r}, y^{k,r}\right) . \end{aligned}$$
(63)

Next, consider the decrease resulting from the \({\bar{x}}\)-update:

$$\begin{aligned} \begin{aligned}&L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r}, y^{k,r}\right) - L\left( x^{k,r+1}, {\bar{x}}^{k,r}, z^{k,r}, y^{k,r}\right) \\&\quad = g\left( {\bar{x}}^{k,r+1}\right) - g\left( {\bar{x}}^{k,r}\right) + y^{k,r\top }\left( B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r}\right) \\&\qquad + \frac{\rho ^k}{2} \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r}\right\| ^2 - \frac{\rho ^k}{2} \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r} + z^{k,r}\right\| ^2 \\&\quad = g\left( {\bar{x}}^{k,r+1}\right) - g\left( {\bar{x}}^{k,r}\right) - \frac{\rho ^k}{2}\left\| B{\bar{x}}^{k,r+1}-B{\bar{x}}^{k,r}\right\| ^2 \\&\qquad -\rho ^k \left( {\bar{x}}^{k,r} - {\bar{x}}^{k,r+1}\right) ^\top B^\top \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r} + \frac{y^{k,r}}{\rho ^k} \right) . \end{aligned} \end{aligned}$$
(64)

The minimization of \({\bar{x}}\) (Line 10, Algorithm 1) should satisfy the optimality condition

$$\begin{aligned} 0 \in \rho ^k B^\top \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r} + \frac{y^{k,r}}{\rho ^k} \right) + \partial g\left( {\bar{x}}^{k,r+1}\right) + {\mathcal {N}}_{\bar{{\mathcal {X}}}} \left( {\bar{x}}^{k,r+1}\right) , \end{aligned}$$
(65)

i.e., there exist vectors \(v_1\in \partial g\left( {\bar{x}}^{k,r+1}\right) \) and \(v_2 \in {\mathcal {N}}_{\bar{{\mathcal {X}}}}\left( {\bar{x}}^{k,r+1}\right) \) with

$$\begin{aligned} \rho ^k B^\top \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r} + \frac{y^{k,r}}{\rho ^k} \right) = -v_1 - v_2. \end{aligned}$$
(66)

Since \(v_1\in \partial g\left( {\bar{x}}^{k,r+1}\right) \) and g is convex, \(v_1^\top \left( {\bar{x}}^{k,r} - {\bar{x}}^{k,r+1}\right) \le g\left( {\bar{x}}^{k,r}\right) - g\left( {\bar{x}}^{k,r+1}\right) \). And \(v_2 \in {\mathcal {N}}_{\bar{{\mathcal {X}}}} \left( {\bar{x}}^{k,r+1}\right) \) implies \(v_2^\top \left( {\bar{x}}^{k,r} - {\bar{x}}^{k,r+1}\right) \le 0\). Hence

$$\begin{aligned} \begin{aligned}&\rho ^k \left( {\bar{x}}^{k,r} - {\bar{x}}^{k,r+1}\right) ^\top B^\top \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r} + \frac{y^{k,r}}{\rho ^k} \right) \\&\quad = -v_1^\top \left( {\bar{x}}^{k,r} - {\bar{x}}^{k,r+1}\right) - v_2^\top \left( {\bar{x}}^{k,r} - {\bar{x}}^{k,r+1}\right) \\&\quad \ge -\left( g\left( {\bar{x}}^{k,r}\right) - g\left( {\bar{x}}^{k,r+1}\right) \right) . \end{aligned} \end{aligned}$$
(67)

Substituting the above inequality in (64), we obtain

$$\begin{aligned}&L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r}, y^{k,r}\right) \le L\left( x^{k,r+1}, {\bar{x}}^{k,r}, z^{k,r}, y^{k,r}\right) \nonumber \\&\quad -\frac{\rho ^k}{2} \left\| B{\bar{x}}^{k,r+1}-B{\bar{x}}^{k,r}\right\| ^2. \end{aligned}$$
(68)

Third, consider the decrease resulting from the z- and y-updates:

$$\begin{aligned} \begin{aligned}&L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r+1}, y^{k,r+1}\right) - L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r}, y^{k,r}\right) \\&\quad = \lambda ^{k\top } \left( z^{k,r+1}-z^{k,r} \right) + \frac{\beta ^k}{2} \left( \left\| z^{k,r+1}\right\| ^2 - \left\| z^{k,r}\right\| ^2 \right) \\&\qquad + y^{k,r+1\top } \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r+1}\right) \\&\qquad - y^{k,r\top } \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r}\right) \\&\qquad + \frac{\rho ^k}{2} \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r+1}\right\| ^2 \\&\qquad - \frac{\rho ^k}{2} \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r}\right\| ^2. \end{aligned} \end{aligned}$$
(69)

Since \(\upsilon (z; \lambda , \beta )=\lambda ^\top z + \frac{\beta }{2}\Vert z\Vert ^2\) is a convex function with gradient \(\nabla \upsilon (z; \lambda , \beta ) = \lambda + \beta z\), we have

$$\begin{aligned} \upsilon \left( z^{k,r+1}; \lambda ^k, \beta ^k \right) - \upsilon \left( z^{k,r}; \lambda ^k, \beta ^k \right) \le \left( \lambda ^k + \beta ^k z^{k,r+1} \right) ^\top \left( z^{k,r+1}-z^{k,r} \right) , \end{aligned}$$
(70)

while Line 11 of Algorithm 1 gives

$$\begin{aligned} \lambda ^k + \beta ^k z^{k,r+1} = -y^{k,r+1}. \end{aligned}$$
(71)

Substituting (70) and (71) into (69), we obtain

$$\begin{aligned} \begin{aligned}&L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r+1}, y^{k,r+1}\right) - L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r}, y^{k,r}\right) \\&\quad \le \left( y^{k,r+1}- y^{k,r}\right) ^\top \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r}\right) \\&\qquad + \frac{\rho ^k}{2} \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r+1}\right\| ^2 - \frac{\rho ^k}{2} \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r}\right\| ^2 \\&\quad = \rho ^k \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r+1}\right) ^\top \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r}\right) \\&\qquad + \frac{\rho ^k}{2} \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r+1}\right\| ^2 - \frac{\rho ^k}{2} \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r}\right\| ^2 \\&\quad = -\frac{\rho ^k}{2} \left\| z^{k,r+1}-z^{k,r}\right\| ^2 + \rho ^k \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r+1}\right\| ^2 . \end{aligned} \end{aligned}$$
(72)

From (71),

$$\begin{aligned} Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r+1} = \frac{1}{\rho ^k} \left( y^{k,r+1} - y^{k,r}\right) = -\frac{\beta ^k}{\rho ^k} \left( z^{k,r+1}-z^{k,r}\right) . \end{aligned}$$
(73)

Then (72) becomes

$$\begin{aligned} \begin{aligned}&L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r+1}, y^{k,r+1}\right) - L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r}, y^{k,r}\right) \\&\quad \le -\left( \frac{\rho ^k}{2}-\frac{\left( \beta ^k\right) ^2}{\rho ^k}\right) \left\| z^{k,r+1}-z^{k,r}\right\| ^2 = -\frac{\beta ^k}{2} \left\| z^{k,r+1}-z^{k,r}\right\| ^2. \end{aligned} \end{aligned}$$
(74)

Summing inequalities (63), (68), and (74), we obtain inequality (62).
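Explicitly, since \(\rho ^k = 2\beta ^k\) so that \(\rho ^k/2 = \beta ^k\), chaining (74), (68), and (63) gives

$$\begin{aligned} L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r+1}, y^{k,r+1}\right)&\le L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r}, y^{k,r}\right) - \frac{\beta ^k}{2}\left\| z^{k,r+1}-z^{k,r}\right\| ^2 \\&\le L\left( x^{k,r+1}, {\bar{x}}^{k,r}, z^{k,r}, y^{k,r}\right) - \beta ^k\left\| B{\bar{x}}^{k,r+1}-B{\bar{x}}^{k,r}\right\| ^2 - \frac{\beta ^k}{2}\left\| z^{k,r+1}-z^{k,r}\right\| ^2 \\&\le L\left( x^{k,r}, {\bar{x}}^{k,r}, z^{k,r}, y^{k,r}\right) - \beta ^k\left\| B{\bar{x}}^{k,r+1}-B{\bar{x}}^{k,r}\right\| ^2 - \frac{\beta ^k}{2}\left\| z^{k,r+1}-z^{k,r}\right\| ^2 . \end{aligned}$$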

Next, we show that the augmented Lagrangian is bounded below and hence, being nonincreasing by (62), converges to some \({\underline{L}}^k\in {\mathbb {R}}\). Noting that \(\upsilon (z; \lambda , \beta )\) is a quadratic function with Hessian \(\beta I \preceq \rho ^k I\), it can be easily verified that

$$\begin{aligned}&\upsilon \left( z^{k,r}; \lambda ^k, \beta ^k \right) + \left( \lambda ^k + \beta ^k z^{k,r}\right) ^\top \left( z^\prime - z^{k,r}\right) \nonumber \\&\quad + \frac{\rho ^k}{2} \left\| z^\prime - z^{k,r}\right\| ^2 \ge \upsilon \left( z^\prime ; \lambda ^k, \beta ^k\right) \end{aligned}$$
(75)

for any \(z^\prime \), i.e.,

$$\begin{aligned} \upsilon \left( z^{k,r}; \lambda ^k, \beta ^k\right) + y^{k,r\top } \left( z^{k,r} - z^\prime \right) \ge \upsilon \left( z^\prime ; \lambda ^k, \beta ^k\right) - \frac{\rho ^k}{2}\left\| z^\prime - z^{k,r}\right\| ^2. \end{aligned}$$
(76)

Let \(z^\prime = -\left( Ax^{k,r} + B{\bar{x}}^{k,r}\right) \), so that \(z^\prime - z^{k,r} = -\left( Ax^{k,r} + B{\bar{x}}^{k,r} + z^{k,r}\right) \). Then

$$\begin{aligned}&\upsilon \left( z^{k,r}; \lambda ^k, \beta ^k\right) + y^{k,r\top } \left( Ax^{k,r} + B{\bar{x}}^{k,r} + z^{k,r}\right) \nonumber \\&\quad \ge \upsilon \left( - \left( Ax^{k,r} + B{\bar{x}}^{k,r}\right) ; \lambda ^k, \beta ^k\right) - \frac{\rho ^k}{2}\left\| Ax^{k,r} + B{\bar{x}}^{k,r} + z^{k,r}\right\| ^2 . \end{aligned}$$
(77)

The last term above is canceled by the quadratic penalty term of the augmented Lagrangian. Hence

$$\begin{aligned} \begin{aligned}&L\left( x^{k,r}, {\bar{x}}^{k,r}, z^{k,r}, y^{k,r}\right) \\&\quad = f\left( x^{k,r}\right) + g\left( {\bar{x}}^{k,r}\right) + \upsilon \left( z^{k,r}; \lambda ^k, \beta ^k\right) \\&\qquad + y^{k,r\top } \left( Ax^{k,r} + B{\bar{x}}^{k,r} + z^{k,r}\right) + \frac{\rho ^k}{2} \left\| Ax^{k,r} + B{\bar{x}}^{k,r} + z^{k,r}\right\| ^2 \\&\quad \ge f\left( x^{k,r}\right) + g\left( {\bar{x}}^{k,r}\right) + \upsilon \left( -\left( Ax^{k,r} + B{\bar{x}}^{k,r}\right) ; \lambda ^k, \beta ^k\right) . \\ \end{aligned} \end{aligned}$$
(78)

Since \(\upsilon (z) = \lambda ^\top z+\frac{\beta }{2}\Vert z\Vert ^2 \ge -\Vert \lambda \Vert ^2/(2\beta )\), and since \(\lambda \) is bounded in \(\left[ {\underline{\lambda }}, {\overline{\lambda }} \right] \), \(\beta ^k\ge \beta ^1\), and f and g are bounded below, L has a lower bound.
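The bound \(\upsilon (z)\ge -\Vert \lambda \Vert ^2/(2\beta )\) used here follows from completing the square:

$$\begin{aligned} \lambda ^\top z + \frac{\beta }{2}\Vert z\Vert ^2 = \frac{\beta }{2}\left\| z + \frac{\lambda }{\beta }\right\| ^2 - \frac{\Vert \lambda \Vert ^2}{2\beta } \ge -\frac{\Vert \lambda \Vert ^2}{2\beta }. \end{aligned}$$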

Taking the limit \(r\rightarrow \infty \) on both sides of inequality (62), it follows that \(B{\bar{x}}^{k,r+1}-B{\bar{x}}^{k,r}\) and \(z^{k,r+1}-z^{k,r}\) converge to 0. Due to (73), we then have \(Ax^{k,r} + B{\bar{x}}^{k,r} + z^{k,r} \rightarrow 0\). Hence there must exist an r such that (22) is met, at which point the optimality condition for \(x^{k,r+1}\) is written as

$$\begin{aligned} 0 \in \partial f\left( x^{k,r+1}\right) + {\mathcal {N}}_{\mathcal {X}}\left( x^{k,r+1}\right) + A^\top y^{k,r} + \rho ^kA^\top \left( Ax^{k,r+1}+B{\bar{x}}^{k,r}+z^{k,r} \right) . \end{aligned}$$
(79)

According to the update rule of \(y^{k,r}\), the above expression is equivalent to

$$\begin{aligned} 0\in & {} \partial f\left( x^{k,r+1}\right) + {\mathcal {N}}_{\mathcal {X}} \left( x^{k,r+1}\right) + A^\top y^{k,r+1} -\rho ^k A^\top \nonumber \\&\left( B{\bar{x}}^{k,r+1}+z^{k,r+1}-B{\bar{x}}^{k,r}-z^{k,r}\right) , \end{aligned}$$
(80)

i.e.,

$$\begin{aligned}&\rho ^kA^\top \left( B{\bar{x}}^{k,r+1}+z^{k,r+1}-B{\bar{x}}^{k,r}-z^{k,r}\right) \in \partial f\left( x^{k,r+1}\right) \nonumber \\&\quad +\,{\mathcal {N}}_{\mathcal {X}}\left( x^{k,r+1}\right) + A^\top y^{k,r+1}. \end{aligned}$$
(81)

According to the first inequality of (22), the norm of the left-hand side above is not larger than \(\epsilon _1^k\), which directly implies the first condition in (23). The second condition in (23) is established in a similar manner. The third one follows from (71), and the fourth condition is obvious.

Appendix 2: Proof of Lemma 2

We first consider the situation when \(\beta ^k\) is unbounded. From (78), we have

$$\begin{aligned} {\overline{L}} \ge f\left( x^{k+1}\right) + g\left( {\bar{x}}^{k+1}\right) - \lambda ^{k\top }\left( Ax^{k+1} + B{\bar{x}}^{k+1}\right) + \frac{\beta ^k}{2}\left\| Ax^{k+1} + B{\bar{x}}^{k+1}\right\| ^2. \end{aligned}$$
(82)

Since f and g are both bounded below, as \(\beta ^k\rightarrow \infty \) we have \(Ax^{k+1} + B{\bar{x}}^{k+1} \rightarrow 0\). Combining this with the first two conditions of (23) in the limit \(\epsilon _1^k\), \(\epsilon _2^k\), \(\epsilon _3^k \downarrow 0\), we arrive at (25).

Now suppose that \(\beta ^k\) is bounded, i.e., the amplification step \(\beta ^{k+1}=\gamma \beta ^k\) is executed for only a finite number of outer iterations. According to Lines 17–21 of Algorithm 1, except for finitely many k, \(\left\| z^{k+1}\right\| \le \omega \left\| z^k\right\| \) holds. Therefore \(z^{k+1}\rightarrow 0\), and (25) follows from (23) in the limit.

Appendix 3: Proof of Lemma 3

From Lemma 1, within R inner iterations we have

$$\begin{aligned} \frac{{\overline{L}} - {\underline{L}}^k}{\beta ^k} \ge \sum _{r=1}^{R} \left( \left\| B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r} \right\| ^2 + \frac{1}{2} \left\| z^{k,r+1} - z^{k,r} \right\| ^2 \right) . \end{aligned}$$
(83)

Then

$$\begin{aligned} \left\| B{\bar{x}}^{k,R+1} - B{\bar{x}}^{k,R} \right\| , \left\| z^{k,R+1} - z^{k,R} \right\| \sim {\mathcal {O}}\left( 1/\sqrt{\beta ^k R}\right) . \end{aligned}$$
(84)
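To make the step from (83) to (84) explicit: the right-hand side of (83) bounds a sum of R nonnegative terms, so the smallest of them is at most the average (the usual minimum-over-iterations reading of (84)),

$$\begin{aligned} \min _{1\le r\le R}\left( \left\| B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r} \right\| ^2 + \frac{1}{2} \left\| z^{k,r+1} - z^{k,r} \right\| ^2 \right) \le \frac{{\overline{L}} - {\underline{L}}^k}{\beta ^k R}, \end{aligned}$$

and both increments at such an index are therefore of order \({\mathcal {O}}\left( 1/\sqrt{\beta ^k R}\right) \).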

For the kth outer iteration, the inner iterations are terminated when (22) is met, which translates into the following relations:

$$\begin{aligned} \begin{aligned}&{\mathcal {O}}\left( \rho ^k/\sqrt{\beta ^k R^k} \right) \le \epsilon _1^k, \epsilon _2^k \sim {\mathcal {O}}\left( \vartheta ^k \right) , \\&{\mathcal {O}}\left( 1/\sqrt{\beta ^k R^k} \right) \le \epsilon _3^k \sim {\mathcal {O}}\left( \vartheta ^k/\beta ^k \right) . \end{aligned} \end{aligned}$$
(85)

where the last relation uses (73) with \(\rho ^k=2\beta ^k\). Therefore

$$\begin{aligned} R^k \sim {\mathcal {O}}\left( \beta ^k/\vartheta ^{2k} \right) . \end{aligned}$$
(86)
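Indeed, with \(\rho ^k = 2\beta ^k\), the first relation in (85) requires \(\beta ^k/\sqrt{\beta ^k R^k} \lesssim \vartheta ^k\), i.e.,

$$\begin{aligned} \sqrt{\beta ^k R^k} \gtrsim \frac{\beta ^k}{\vartheta ^k} \quad \Longleftrightarrow \quad R^k \gtrsim \frac{\beta ^k}{\vartheta ^{2k}}, \end{aligned}$$

and the third relation in (85) yields the same order, which is the estimate (86).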

At the end of the kth iteration, suppose that Lines 19–20 and Lines 17–18 of Algorithm 1 have been executed \(k_1\) and \(k_2\) times, respectively (\(k_1+k_2=k\)). Then the obtained \(z^{k+1}\) satisfies \(\left\| z^{k+1}\right\| \sim {\mathcal {O}}\left( \omega ^{k_1}\right) \), and \(\left\| Ax^{k+1}+B{\bar{x}}^{k+1}+z^{k+1}\right\| \le \epsilon _3^k \sim {\mathcal {O}}\left( \vartheta ^k/\beta ^k\right) \), which imply

$$\begin{aligned} \left\| Ax^{k+1}+B{\bar{x}}^{k+1}\right\| \le {\mathcal {O}}\left( \vartheta ^k/\beta ^k\right) + {\mathcal {O}}\left( \omega ^{k_1}\right) . \end{aligned}$$
(87)

From (82),

$$\begin{aligned} \beta ^k \left\| Ax^{k+1}+B{\bar{x}}^{k+1}\right\| ^2 \sim \beta ^k \left( {\mathcal {O}} \left( \vartheta ^k/\beta ^k\right) + {\mathcal {O}}\left( \omega ^{k_1}\right) \right) ^2 \sim {\mathcal {O}}(1). \end{aligned}$$
(88)

Substituting (88) into (86), we obtain

$$\begin{aligned} R^k \sim {\mathcal {O}}\left( \frac{1}{\vartheta ^{2k}} \frac{1}{\left( {\mathcal {O}}\left( \vartheta ^k/\beta ^k\right) + {\mathcal {O}}\left( \omega ^{k_1}\right) \right) ^2} \right) . \end{aligned}$$
(89)

When \(\vartheta \le \omega \), we have \(\vartheta ^k \le \omega ^{k} \le \omega ^{k_1}\gamma ^{k_2}\), and hence \(\vartheta ^k/\gamma ^{k_2} \le \omega ^{k_1}\), i.e., \(\omega ^{k_1}\) dominates over \(\vartheta ^k/\beta ^k\), leading to

$$\begin{aligned} R^k \sim {\mathcal {O}}\left( 1/\vartheta ^{2k}\omega ^{2k_1} \right) \sim {\mathcal {O}}\left( 1/\vartheta ^{2k}\omega ^{2k} \right) . \end{aligned}$$
(90)

For K outer iterations, the total number of inner iterations is

$$\begin{aligned} R = \sum _{k=1}^K R^k \sim {\mathcal {O}}\left( \sum _{k=1}^K \frac{1}{\vartheta ^{2k}\omega ^{2k}} \right) \sim {\mathcal {O}}\left( \frac{1}{\vartheta ^{2K}\omega ^{2K}} \right) . \end{aligned}$$
(91)

The number of outer iterations needed to reach an \(\epsilon \)-approximate stationary point is obviously \(K\sim {\mathcal {O}}\left( \log _\vartheta \epsilon \right) \). Then

$$\begin{aligned} R \sim {\mathcal {O}}\left( \epsilon ^{-2(1+\varsigma )} \right) . \end{aligned}$$
(92)
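To spell out the last step (under the reading, not stated explicitly here, that \(\varsigma = \log _\vartheta \omega \), so that \(\omega ^{K} = \epsilon ^{\varsigma }\) when \(\vartheta ^{K} = \epsilon \)):

$$\begin{aligned} R \sim \frac{1}{\vartheta ^{2K}\omega ^{2K}} = \epsilon ^{-2}\, \epsilon ^{-2\varsigma } = \epsilon ^{-2(1+\varsigma )}. \end{aligned}$$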

Appendix 4: Proof of Lemma 6

Throughout the inner iterations, only the Anderson acceleration steps might lead to an increase in the barrier augmented Lagrangian. Combining Assumptions 3 and 5 and the safeguarding criterion (41), we obtain

$$\begin{aligned} L_{b^k}\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r+1}, y^{k,r+1}\right) \le {\overline{L}} + {\tilde{L}}_0\eta _L\sum _{r=1}^{\infty } \frac{1}{r^{1+\sigma }} < +\infty . \end{aligned}$$
(93)

Together with Assumptions 1 and 2, \(L_{b^k}\) is also bounded below. Therefore \(L_{b^k}\) is bounded in a closed interval and must have convergent subsequences, so we can choose a subsequence converging to the lower limit \({\underline{L}}\). For any \(\varepsilon >0\) there exists an inner-iteration index R in this subsequence such that \({\tilde{L}}_0\eta _L \sum _{r=R}^\infty r^{-(1+\sigma )} < \varepsilon /2\) and \(L_{b^k}\left( x^{k,r+1},{\bar{x}}^{k,r+1},z^{k,r+1},y^{k,r+1}\right) < {\underline{L}}+\varepsilon /2\) for any \(r\ge R\) on this subsequence. It then follows that for any \(r\ge R\), whether on the subsequence or not, it holds that

$$\begin{aligned} L_{b^k}\left( x^{k,r+1},{\bar{x}}^{k,r+1},z^{k,r+1},y^{k,r+1}\right) < {\underline{L}}+\varepsilon . \end{aligned}$$
(94)

Hence the upper limit is not larger than \({\underline{L}}+\varepsilon \). Due to the arbitrariness of \(\varepsilon >0\), the lower limit coincides with the upper limit, and hence the sequence of barrier augmented Lagrangian is convergent.

The convergence of the barrier augmented Lagrangian implies that as \(r \rightarrow \infty \),

$$\begin{aligned} L_{b^k}\left( x^{k,r+1},{\bar{x}}^{k,r+1},z^{k,r+1},y^{k,r+1}\right) - L_{b^k}\left( x^{k,r},{\bar{x}}^{k,r},z^{k,r},y^{k,r}\right) \rightarrow 0. \end{aligned}$$
(95)

If r is not an accelerated iteration, then since this quantity does not exceed \(-\beta ^k\left\| B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r}\right\| ^2 - \left( \beta ^k/2\right) \left\| z^{k,r+1}-z^{k,r}\right\| ^2\), we must have \(B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r} \rightarrow 0\) and \(z^{k,r+1}-z^{k,r}\rightarrow 0\). If, on the other hand, inner iteration r is accelerated, the convergence of \(B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r}\) and \(z^{k,r+1}-z^{k,r}\) is automatically guaranteed by the second criterion (42) for accepting the Anderson acceleration. The convergence of these two sequences then allows the argument of Lemma 1 to be followed, establishing convergence to the approximate KKT conditions of the relaxed problem.


About this article

Cite this article

Tang, W., Daoutidis, P. Fast and stable nonconvex constrained distributed optimization: the ELLADA algorithm. Optim Eng 23, 259–301 (2022). https://doi.org/10.1007/s11081-020-09585-w

