
Fast and stable nonconvex constrained distributed optimization: the ELLADA algorithm


Abstract

Distributed optimization using multiple computing agents in a localized and coordinated manner is a promising approach for solving large-scale optimization problems, e.g., those arising in model predictive control (MPC) of large-scale plants. However, a distributed optimization algorithm that is computationally efficient, globally convergent, and amenable to nonconvex constraints remains an open problem. In this paper, we combine three important modifications to the classical alternating direction method of multipliers for distributed optimization. Specifically, (1) an extra-layer architecture is adopted to accommodate nonconvexity and handle inequality constraints, (2) equality-constrained nonlinear programming (NLP) problems are allowed to be solved approximately, and (3) a modified Anderson acceleration is employed to reduce the number of iterations. Theoretical convergence of the proposed algorithm, named ELLADA, is established, and its numerical performance is demonstrated on a large-scale NLP benchmark problem. Its application to distributed nonlinear MPC is also described and illustrated through a benchmark process system.
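As background for the developments summarized above, the following minimal Python sketch illustrates classical consensus ADMM in the spirit of Boyd et al. (2011), which ELLADA builds on. It is not the ELLADA algorithm itself (no extra slack layer, no approximate NLP solutions, no Anderson acceleration); the problem data, function name, and parameter values are illustrative assumptions.

import numpy as np

def consensus_admm(As, bs, rho=1.0, iters=200):
    """Classical consensus ADMM sketch for minimize sum_i ||A_i x - b_i||^2,
    with local copies x_i held by agents and a coordinating variable z."""
    n = As[0].shape[1]
    N = len(As)
    xs = [np.zeros(n) for _ in range(N)]   # local primal copies
    us = [np.zeros(n) for _ in range(N)]   # scaled dual variables
    z = np.zeros(n)                        # consensus (coordination) variable
    for _ in range(iters):
        # local updates: argmin ||A_i x - b_i||^2 + (rho/2)||x - z + u_i||^2
        for i in range(N):
            H = 2 * As[i].T @ As[i] + rho * np.eye(n)
            rhs = 2 * As[i].T @ bs[i] + rho * (z - us[i])
            xs[i] = np.linalg.solve(H, rhs)
        # coordination step: average of local copies plus duals
        z = np.mean([xs[i] + us[i] for i in range(N)], axis=0)
        # dual updates
        for i in range(N):
            us[i] += xs[i] - z
    return z

# Usage on random data split across 3 agents (illustrative only).
rng = np.random.default_rng(0)
As = [rng.standard_normal((10, 4)) for _ in range(3)]
bs = [rng.standard_normal(10) for _ in range(3)]
print(consensus_admm(As, bs))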


Notes

  1. There are two different types of Anderson acceleration. Here we focus on Type I, which was found to have better performance (Fang and Saad 2009) and was improved in Zhang et al. (2018).
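For concreteness, here is a minimal sketch of plain Type-I Anderson acceleration for a generic fixed-point iteration x ← f(x). It omits the stabilization and safeguarding steps introduced in Zhang et al. (2018); the memory size, regularization constant, and test map are illustrative assumptions.

import numpy as np

def anderson_type1(f, x0, m=5, max_iter=100, tol=1e-8, reg=1e-10):
    """Plain Type-I Anderson acceleration for x = f(x) (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    g = f(x) - x                          # fixed-point residual
    xs, gs = [x], [g]
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        mk = min(m, len(xs) - 1)          # number of stored differences
        if mk == 0:
            x_new = x + g                 # plain fixed-point (Picard) step
        else:
            S = np.column_stack([xs[-i] - xs[-i-1] for i in range(1, mk + 1)])
            Y = np.column_stack([gs[-i] - gs[-i-1] for i in range(1, mk + 1)])
            # Type-I multisecant coefficients: gamma solves (S^T Y) gamma = S^T g
            A = S.T @ Y + reg * np.eye(mk)
            gamma = np.linalg.solve(A, S.T @ g)
            x_new = x + g - (S + Y) @ gamma
        x = x_new
        g = f(x) - x
        xs.append(x)
        gs.append(g)
    return x

# Usage: accelerate a contractive map, e.g. x -> cos(x) componentwise.
print(anderson_type1(lambda x: np.cos(x), np.zeros(3)))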

  2. We use the word “oracle” with its typical meaning in mathematics and computer science: an oracle is an ad hoc numerical or computational procedure, regarded as a black-box mechanism, that generates the needed results as outputs from given input information.

References

  • Anderson DG (1965) Iterative procedures for nonlinear integral equations. J ACM 12(4):547–560

  • Bertsekas DP (2016) Nonlinear programming, 3rd edn. Athena Scientific, Nashua

  • Biegler LT, Thierry DM (2018) Large-scale optimization formulations and strategies for nonlinear model predictive control. IFAC-PapersOnLine 51(20):1–15 (6th IFAC Conference on Nonlinear Model Predictive Control, NMPC)

  • Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trend Mach Learn 3(1):1–122

  • Chen X, Heidarinejad M, Liu J, Christofides PD (2012) Distributed economic MPC: application to a nonlinear chemical process network. J Process Control 22(4):689–699

  • Chen C, He B, Ye Y, Yuan X (2016) The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. Math Prog 155(1–2):57–79

  • Christofides PD, Scattolini R, Muñoz de la Peña D, Liu J (2013) Distributed model predictive control: a tutorial review and future research directions. Comput Chem Eng 51:21–41

  • Daoutidis P, Tang W, Jogwar SS (2018) Decomposing complex plants for distributed control: perspectives from network theory. Comput Chem Eng 114:43–51

  • Daoutidis P, Tang W, Allman A (2019) Decomposition of control and optimization problems by network structure: concepts, methods and inspirations from biology. AIChE J 65(10):e16708

  • Dhingra NK, Khong SZ, Jovanović MR (2019) The proximal augmented Lagrangian method for nonsmooth composite optimization. IEEE Trans Autom Control 64(7):2861–2868

  • Dolan ED, Moré JJ (2002) Benchmarking optimization software with performance profiles. Math Prog 91(2):201–213

  • Eckstein J, Bertsekas DP (1992) On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math Prog 55(1–3):293–318

  • Eckstein J, Yao W (2017) Approximate ADMM algorithms derived from Lagrangian splitting. Comput Optim Appl 68(2):363–405

  • Eckstein J, Yao W (2018) Relative-error approximate versions of Douglas-Rachford splitting and special cases of the ADMM. Math Prog 170(2):417–444

  • Fang Hr, Saad Y (2009) Two classes of multisecant methods for nonlinear acceleration. Numer Linear Algebra Appl 16(3):197–221

  • Farokhi F, Shames I, Johansson KH (2014) Distributed MPC via dual decomposition and alternative direction method of multipliers. In: Distributed model predictive control made easy. Springer, Berlin, pp 115–131

  • Fu A, Zhang J, Boyd S (2019) Anderson accelerated Douglas–Rachford splitting. arXiv preprint arXiv:1908.11482

  • Gabay D, Mercier B (1976) A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput Math Appl 2(1):17–40

  • Giselsson P, Doan MD, Keviczky T, De Schutter B, Rantzer A (2013) Accelerated gradient methods and dual decomposition in distributed model predictive control. Automatica 49(3):829–833

  • Glowinski R, Marroco A (1975) Sur l’approximation, par éléments finis d’ordre un, et la résolution, par pénalisation-dualité d’une classe de problèmes de dirichlet non linéaires. Rev Fr Autom Inform Rech Opér, Anal Numér 9(R2):41–76

  • Goldstein T, O’Donoghue B, Setzer S, Baraniuk R (2014) Fast alternating direction optimization methods. SIAM J Imaging Sci 7(3):1588–1623

  • Hajinezhad D, Hong M (2019) Perturbed proximal primal–dual algorithm for nonconvex nonsmooth optimization. Math Prog 176(1–2):207–245

  • He B, Yuan X (2012) On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J Numer Anal 50(2):700–709

  • Hong M, Luo ZQ (2017) On the linear convergence of the alternating direction method of multipliers. Math Prog 162(1–2):165–199

  • Hong M, Luo ZQ, Razaviyayn M (2016) Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. SIAM J Optim 26(1):337–364

  • Hours JH, Jones CN (2015) A parametric nonconvex decomposition algorithm for real-time and distributed NMPC. IEEE Trans Autom Control 61(2):287–302

  • Houska B, Frasch J, Diehl M (2016) An augmented Lagrangian based algorithm for distributed nonconvex optimization. SIAM J Optim 26(2):1101–1127

  • Jalving J, Cao Y, Zavala VM (2019) Graph-based modeling and simulation of complex systems. Comput Chem Eng 125:134–154

  • Jiang B, Lin T, Ma S, Zhang S (2019) Structured nonconvex and nonsmooth optimization: algorithms and iteration complexity analysis. Comput Optim Appl 72(1):115–157

  • Johansson KH (2000) The quadruple-tank process: a multivariable laboratory process with an adjustable zero. IEEE Trans Control Syst Technol 8(3):456–465

  • Li G, Pong TK (2015) Global convergence of splitting methods for nonconvex composite optimization. SIAM J Optim 25(4):2434–2460

  • Liu J, Chen X, Muñoz de la Peña D, Christofides PD (2010) Sequential and iterative architectures for distributed model predictive control of nonlinear process systems. AIChE J 56(8):2137–2149

  • Mota JF, Xavier JM, Aguiar PM, Püschel M (2014) Distributed optimization with local domains: applications in MPC and network flows. IEEE Trans Autom Control 60(7):2004–2009

  • Nesterov YuE (1983) A method of solving a convex programming problem with convergence rate \(O(\frac{1}{k^2})\). Dokl Akad Nauk SSSR 269(3):543–547

  • Nicholson B, Siirola JD, Watson JP, Zavala VM, Biegler LT (2018) pyomo.dae: a modeling and automatic discretization framework for optimization with differential and algebraic equations. Math Prog Comput 10(2):187–223

  • Nishihara R, Lessard L, Recht B, Packard A, Jordan M (2015) A general analysis of the convergence of ADMM. Proc Mach Learn Res 37:343–352

  • Ouyang Y, Chen Y, Lan G, Pasiliao E Jr (2015) An accelerated linearized alternating direction method of multipliers. SIAM J Imaging Sci 8(1):644–681

  • Patterson MA, Rao AV (2014) GPOPS-II: a MATLAB software for solving multiple-phase optimal control problems using \(h_p\)-adaptive Gaussian quadrature collocation methods and sparse nonlinear programming. ACM Trans Math Softw (TOMS) 41(1):1–37

  • Pulay P (1980) Convergence acceleration of iterative sequences. The case of SCF iteration. Chem Phys Lett 73(2):393–398

  • Rawlings JB, Mayne DQ, Diehl MM (2017) Model predictive control: theory, computation, and design, 2nd edn. Nob Hill Publishing, Madison

  • Rockafellar RT, Wets RJB (1998) Variational analysis. Springer, Berlin

  • Scattolini R (2009) Architectures for distributed and hierarchical model predictive control—a review. J Process Control 19(5):723–731

  • Scutari G, Facchinei F, Lampariello L (2016) Parallel and distributed methods for constrained nonconvex optimization—part I: theory. IEEE Trans Signal Process 65(8):1929–1944

  • Stewart BT, Venkat AN, Rawlings JB, Wright SJ, Pannocchia G (2010) Cooperative distributed model predictive control. Syst Control Lett 59(8):460–469

  • Sun K, Sun XA (2019) A two-level distributed algorithm for general constrained non-convex optimization with global convergence. arXiv preprint arXiv:1902.07654

  • Tang W, Allman A, Pourkargar DB, Daoutidis P (2018) Optimal decomposition for distributed optimization in nonlinear model predictive control through community detection. Comput Chem Eng 111:43–54

  • Themelis A, Patrinos P (2020) Douglas-Rachford splitting and ADMM for nonconvex optimization: tight convergence results. SIAM J Optim 30(1):149–181

  • Toth A, Kelley C (2015) Convergence analysis for Anderson acceleration. SIAM J Numer Anal 53(2):805–819

  • Wächter A, Biegler LT (2005) Line search filter methods for nonlinear programming: motivation and global convergence. SIAM J Optim 16(1):1–31

  • Wächter A, Biegler LT (2006) On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math Prog 106(1):25–57

  • Wang Y, Boyd S (2009) Fast model predictive control using online optimization. IEEE Trans Control Syst Technol 18(2):267–278

  • Wang Z, Ong CJ (2017) Distributed model predictive control of linear discrete-time systems with local and global constraints. Automatica 81:184–195

  • Wang Y, Yin W, Zeng J (2019) Global convergence of ADMM in nonconvex nonsmooth optimization. J Sci Comput 78(1):29–63

  • Xie J, Liao A, Yang X (2017) An inexact alternating direction method of multipliers with relative error criteria. Optim Lett 11(3):583–596

  • Yang Y, Hu G, Spanos CJ (2020) A proximal linearization-based decentralized method for nonconvex problems with nonlinear constraints. arXiv preprint arXiv:2001.00767

  • Zhang RY, White JK (2018) GMRES-accelerated ADMM for quadratic objectives. SIAM J Optim 28(4):3025–3056

  • Zhang J, O’Donoghue B, Boyd S (2018) Globally convergent type-I Anderson acceleration for non-smooth fixed-point iterations. arXiv preprint arXiv:1808.03971

  • Zhang J, Peng Y, Ouyang W, Deng B (2019) Accelerating ADMM for efficient simulation and optimization. ACM Trans Graph 38(6):163


Acknowledgements

This work was supported by the National Science Foundation (NSF-CBET). The authors would also like to thank Prof. Qi Zhang for his constructive comments.

Author information

Corresponding author

Correspondence to Prodromos Daoutidis.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Proof of Lemma 1

We first prove that

$$\begin{aligned} \begin{aligned}&L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r+1}, y^{k,r+1}\right) \le L\left( x^{k,r}, {\bar{x}}^{k,r}, z^{k,r}, y^{k,r}\right) \\&\quad - \beta ^k\left\| B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r}\right\| ^2 - \frac{\beta ^k}{2}\left\| z^{k,r+1} - z^{k,r}\right\| ^2 \end{aligned} \end{aligned}$$
(62)

for \(r=0, 1, 2, \dots \). First, since \(x^{k,r+1}\) is chosen as the minimizer of the augmented Lagrangian with respect to x (Line 9, Algorithm 1), the update of x leads to a decrease in L:

$$\begin{aligned} L\left( x^{k,r+1}, {\bar{x}}^{k,r}, z^{k,r}, y^{k,r}\right) \le L\left( x^{k,r}, {\bar{x}}^{k,r}, z^{k,r}, y^{k,r}\right) . \end{aligned}$$
(63)

Next, consider the decrease resulting from the \({\bar{x}}\)-update:

$$\begin{aligned} \begin{aligned}&L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r}, y^{k,r}\right) - L\left( x^{k,r+1}, {\bar{x}}^{k,r}, z^{k,r}, y^{k,r}\right) \\&\quad = g\left( {\bar{x}}^{k,r+1}\right) - g\left( {\bar{x}}^{k,r}\right) + y^{k,r\top }\left( B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r}\right) \\&\qquad + \frac{\rho ^k}{2} \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r}\right\| ^2 - \frac{\rho ^k}{2} \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r} + z^{k,r}\right\| ^2 \\&\quad = g\left( {\bar{x}}^{k,r+1}\right) - g\left( {\bar{x}}^{k,r}\right) - \frac{\rho ^k}{2}\left\| B{\bar{x}}^{k,r+1}-B{\bar{x}}^{k,r}\right\| ^2 \\&\qquad -\rho ^k \left( {\bar{x}}^{k,r} - {\bar{x}}^{k,r+1}\right) ^\top B^\top \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r} + \frac{y^{k,r}}{\rho ^k} \right) . \end{aligned} \end{aligned}$$
(64)

The minimization of \({\bar{x}}\) (Line 10, Algorithm 1) should satisfy the optimality condition

$$\begin{aligned} 0 \in \rho ^k B^\top \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r} + \frac{y^{k,r}}{\rho ^k} \right) + \partial g\left( {\bar{x}}^{k,r+1}\right) + {\mathcal {N}}_{\bar{{\mathcal {X}}}} \left( {\bar{x}}^{k,r+1}\right) , \end{aligned}$$
(65)

i.e., there exist vectors \(v_1\in \partial g\left( {\bar{x}}^{k,r+1}\right) \) and \(v_2 \in {\mathcal {N}}_{\bar{{\mathcal {X}}}}\left( {\bar{x}}^{k,r+1}\right) \) with

$$\begin{aligned} \rho ^k B^\top \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r} + \frac{y^{k,r}}{\rho ^k} \right) = -v_1 - v_2. \end{aligned}$$
(66)

Since \(v_1\in \partial g\left( {\bar{x}}^{k,r+1}\right) \) and g is convex, \(v_1^\top \left( {\bar{x}}^{k,r} - {\bar{x}}^{k,r+1}\right) \le g\left( {\bar{x}}^{k,r}\right) - g\left( {\bar{x}}^{k,r+1}\right) \). And \(v_2 \in {\mathcal {N}}_{\bar{{\mathcal {X}}}} \left( {\bar{x}}^{k,r+1}\right) \) implies \(v_2^\top \left( {\bar{x}}^{k,r} - {\bar{x}}^{k,r+1}\right) \le 0\). Hence

$$\begin{aligned} \begin{aligned}&\rho ^k \left( {\bar{x}}^{k,r} - {\bar{x}}^{k,r+1}\right) ^\top B^\top \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r} + \frac{y^{k,r}}{\rho ^k} \right) \\&\quad = -v_1^\top \left( {\bar{x}}^{k,r} - {\bar{x}}^{k,r+1}\right) - v_2^\top \left( {\bar{x}}^{k,r} - {\bar{x}}^{k,r+1}\right) \\&\quad \ge -\left( g\left( {\bar{x}}^{k,r}\right) - g\left( {\bar{x}}^{k,r+1}\right) \right) . \end{aligned} \end{aligned}$$
(67)

Substituting the above inequality in (64), we obtain

$$\begin{aligned}&L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r}, y^{k,r}\right) \le L\left( x^{k,r+1}, {\bar{x}}^{k,r}, z^{k,r}, y^{k,r}\right) \nonumber \\&\quad -\frac{\rho ^k}{2} \left\| B{\bar{x}}^{k,r+1}-B{\bar{x}}^{k,r}\right\| ^2. \end{aligned}$$
(68)

Third, consider the decrease resulting from the z- and y-updates:

$$\begin{aligned} \begin{aligned}&L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r+1}, y^{k,r+1}\right) - L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r}, y^{k,r}\right) \\&\quad = \lambda ^{k\top } \left( z^{k,r+1}-z^{k,r} \right) + \frac{\beta ^k}{2} \left( \left\| z^{k,r+1}\right\| ^2 - \left\| z^{k,r}\right\| ^2 \right) \\&\qquad + y^{k,r+1\top } \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r+1}\right) \\&\qquad - y^{k,r\top } \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r}\right) \\&\qquad + \frac{\rho ^k}{2} \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r+1}\right\| ^2 \\&\qquad - \frac{\rho ^k}{2} \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r}\right\| ^2. \end{aligned} \end{aligned}$$
(69)

Since \(\upsilon (z; \lambda , \beta )=\lambda ^\top z + \frac{\beta }{2}\Vert z\Vert ^2\) is a convex function with gradient \(\nabla \upsilon (z; \lambda , \beta ) = \lambda + \beta z\), we have

$$\begin{aligned} \upsilon \left( z^{k,r+1}; \lambda ^k, \beta ^k \right) - \upsilon \left( z^{k,r}; \lambda ^k, \beta ^k \right) \le \left( \lambda ^k + \beta ^k z^{k,r+1} \right) ^\top \left( z^{k,r+1}-z^{k,r} \right) , \end{aligned}$$
(70)

while Line 11 of Algorithm 1 gives

$$\begin{aligned} \lambda ^k + \beta ^k z^{k,r+1} = -y^{k,r+1}. \end{aligned}$$
(71)

Substituting (70) and (71) into (69), we obtain

$$\begin{aligned} \begin{aligned}&L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r+1}, y^{k,r+1}\right) - L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r}, y^{k,r}\right) \\&\quad \le \left( y^{k,r+1}- y^{k,r}\right) ^\top \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r}\right) \\&\qquad + \frac{\rho ^k}{2} \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r+1}\right\| ^2 - \frac{\rho ^k}{2} \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r}\right\| ^2 \\&\quad = \rho ^k \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r+1}\right) ^\top \left( Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r}\right) \\&\qquad + \frac{\rho ^k}{2} \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r+1}\right\| ^2 - \frac{\rho ^k}{2} \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r}\right\| ^2 \\&\quad = -\frac{\rho ^k}{2} \left\| z^{k,r+1}-z^{k,r}\right\| ^2 + \rho ^k \left\| Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r+1}\right\| ^2 . \end{aligned} \end{aligned}$$
(72)

From (71),

$$\begin{aligned} Ax^{k,r+1} + B{\bar{x}}^{k,r+1} + z^{k,r+1} = \frac{1}{\rho ^k} \left( y^{k,r+1} - y^{k,r}\right) = -\frac{\beta ^k}{\rho ^k} \left( z^{k,r+1}-z^{k,r}\right) . \end{aligned}$$
(73)

Then (72) becomes

$$\begin{aligned} \begin{aligned}&L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r+1}, y^{k,r+1}\right) - L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r}, y^{k,r}\right) \\&\quad \le -\left( \frac{\rho ^k}{2}-\frac{\left( \beta ^k\right) ^2}{\rho ^k}\right) \left\| z^{k,r+1}-z^{k,r}\right\| ^2 = -\frac{\beta ^k}{2} \left\| z^{k,r+1}-z^{k,r}\right\| ^2. \end{aligned} \end{aligned}$$
(74)

Summing inequalities (63), (68), and (74), we obtain inequality (62).
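Explicitly, since \(\rho ^k = 2\beta ^k\) so that \(\rho ^k/2 = \beta ^k\), chaining (74), (68), and (63) gives

$$\begin{aligned} L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r+1}, y^{k,r+1}\right)&\le L\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r}, y^{k,r}\right) - \frac{\beta ^k}{2}\left\| z^{k,r+1}-z^{k,r}\right\| ^2 \\&\le L\left( x^{k,r+1}, {\bar{x}}^{k,r}, z^{k,r}, y^{k,r}\right) - \beta ^k\left\| B{\bar{x}}^{k,r+1}-B{\bar{x}}^{k,r}\right\| ^2 - \frac{\beta ^k}{2}\left\| z^{k,r+1}-z^{k,r}\right\| ^2 \\&\le L\left( x^{k,r}, {\bar{x}}^{k,r}, z^{k,r}, y^{k,r}\right) - \beta ^k\left\| B{\bar{x}}^{k,r+1}-B{\bar{x}}^{k,r}\right\| ^2 - \frac{\beta ^k}{2}\left\| z^{k,r+1}-z^{k,r}\right\| ^2 . \end{aligned}$$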

Next, we show that the augmented Lagrangian is bounded below and hence, being nonincreasing by (62), converges to some \({\underline{L}}^k\in {\mathbb {R}}\). Noting that \(\upsilon (z; \lambda , \beta )\) is a quadratic function with Hessian \(\beta I \preceq \rho ^k I\), it can be easily verified that

$$\begin{aligned}&\upsilon \left( z^{k,r}; \lambda ^k, \beta ^k \right) + \left( \lambda ^k + \beta ^k z^{k,r}\right) ^\top \left( z^\prime - z^{k,r}\right) \nonumber \\&\quad + \frac{\rho ^k}{2} \left\| z^\prime - z^{k,r}\right\| ^2 \ge \upsilon \left( z^\prime ; \lambda ^k, \beta ^k\right) \end{aligned}$$
(75)

for any \(z^\prime \), i.e.,

$$\begin{aligned} \upsilon \left( z^{k,r}; \lambda ^k, \beta ^k\right) + y^{k,r\top } \left( z^{k,r} - z^\prime \right) \ge \upsilon \left( z^\prime ; \lambda ^k, \beta ^k\right) - \frac{\rho ^k}{2}\left\| z^\prime - z^{k,r}\right\| ^2. \end{aligned}$$
(76)

Let \(z^\prime = -\left( Ax^{k,r} + B{\bar{x}}^{k,r}\right) \), so that \(z^\prime - z^{k,r} = -\left( Ax^{k,r} + B{\bar{x}}^{k,r} + z^{k,r}\right) \). Then

$$\begin{aligned}&\upsilon \left( z^{k,r}; \lambda ^k, \beta ^k\right) + y^{k,r\top } \left( Ax^{k,r} + B{\bar{x}}^{k,r} + z^{k,r}\right) \nonumber \\&\quad \ge \upsilon \left( - \left( Ax^{k,r} + B{\bar{x}}^{k,r}\right) ; \lambda ^k, \beta ^k\right) - \frac{\rho ^k}{2}\left\| Ax^{k,r} + B{\bar{x}}^{k,r} + z^{k,r}\right\| ^2 . \end{aligned}$$
(77)

The last term above is canceled by the quadratic penalty term of the augmented Lagrangian. Hence

$$\begin{aligned} \begin{aligned}&L\left( x^{k,r}, {\bar{x}}^{k,r}, z^{k,r}, y^{k,r}\right) \\&\quad = f\left( x^{k,r}\right) + g\left( {\bar{x}}^{k,r}\right) + \upsilon \left( z^{k,r}; \lambda ^k, \beta ^k\right) \\&\qquad + y^{k,r\top } \left( Ax^{k,r} + B{\bar{x}}^{k,r} + z^{k,r}\right) + \frac{\rho ^k}{2} \left\| Ax^{k,r} + B{\bar{x}}^{k,r} + z^{k,r}\right\| ^2 \\&\quad \ge f\left( x^{k,r}\right) + g\left( {\bar{x}}^{k,r}\right) + \upsilon \left( -\left( Ax^{k,r} + B{\bar{x}}^{k,r}\right) ; \lambda ^k, \beta ^k\right) . \\ \end{aligned} \end{aligned}$$
(78)

Since \(\upsilon (z) = \lambda ^\top z+\frac{\beta }{2}\Vert z\Vert ^2 \ge -\Vert \lambda \Vert ^2/(2\beta )\), and since \(\lambda \) is bounded in \(\left[ {\underline{\lambda }}, {\overline{\lambda }} \right] \), \(\beta ^k\ge \beta ^1\), and f and g are bounded below, L has a lower bound.
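The bound \(\upsilon (z)\ge -\Vert \lambda \Vert ^2/(2\beta )\) used here follows from completing the square:

$$\begin{aligned} \lambda ^\top z + \frac{\beta }{2}\Vert z\Vert ^2 = \frac{\beta }{2}\left\| z + \frac{\lambda }{\beta }\right\| ^2 - \frac{\Vert \lambda \Vert ^2}{2\beta } \ge -\frac{\Vert \lambda \Vert ^2}{2\beta }. \end{aligned}$$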

Taking the limit \(r\rightarrow \infty \) on both sides of inequality (62), it follows that \(B{\bar{x}}^{k,r+1}-B{\bar{x}}^{k,r}\) and \(z^{k,r+1}-z^{k,r}\) converge to 0. Due to (73), we then have \(Ax^{k,r} + B{\bar{x}}^{k,r} + z^{k,r} \rightarrow 0\). Hence there must exist an r such that (22) is met, at which point the optimality condition for \(x^{k,r+1}\) is written as

$$\begin{aligned} 0 \in \partial f\left( x^{k,r+1}\right) + {\mathcal {N}}_{\mathcal {X}}\left( x^{k,r+1}\right) + A^\top y^{k,r} + \rho ^kA^\top \left( Ax^{k,r+1}+B{\bar{x}}^{k,r}+z^{k,r} \right) . \end{aligned}$$
(79)

According to the update rule of \(y^{k,r}\), the above expression is equivalent to

$$\begin{aligned} 0\in & {} \partial f\left( x^{k,r+1}\right) + {\mathcal {N}}_{\mathcal {X}} \left( x^{k,r+1}\right) + A^\top y^{k,r+1} -\rho ^k A^\top \nonumber \\&\left( B{\bar{x}}^{k,r+1}+z^{k,r+1}-B{\bar{x}}^{k,r}-z^{k,r}\right) , \end{aligned}$$
(80)

i.e.,

$$\begin{aligned}&\rho ^kA^\top \left( B{\bar{x}}^{k,r+1}+z^{k,r+1}-B{\bar{x}}^{k,r}-z^{k,r}\right) \in \partial f\left( x^{k,r+1}\right) \nonumber \\&\quad +\,{\mathcal {N}}_{\mathcal {X}}\left( x^{k,r+1}\right) + A^\top y^{k,r+1}. \end{aligned}$$
(81)

According to the first inequality of (22), the norm of the left-hand side above is not larger than \(\epsilon _1^k\), which directly implies the first condition in (23). The second condition in (23) is established in a similar manner. The third one follows from (71), and the fourth condition is obvious.

Appendix 2: Proof of Lemma 2

We first consider the situation when \(\beta ^k\) is unbounded. From (78), we have

$$\begin{aligned} {\overline{L}} \ge f\left( x^{k+1}\right) + g\left( {\bar{x}}^{k+1}\right) - \lambda ^{k\top }\left( Ax^{k+1} + B{\bar{x}}^{k+1}\right) + \frac{\beta ^k}{2}\left\| Ax^{k+1} + B{\bar{x}}^{k+1}\right\| ^2. \end{aligned}$$
(82)

Since f and g are both bounded below, as \(\beta ^k\rightarrow \infty \) we have \(Ax^{k+1} + B{\bar{x}}^{k+1} \rightarrow 0\). Combining this with the first two conditions of (23) in the limit \(\epsilon _1^k\), \(\epsilon _2^k\), \(\epsilon _3^k \downarrow 0\), we arrive at (25).

Now suppose that \(\beta ^k\) is bounded, i.e., the amplification step \(\beta ^{k+1}=\gamma \beta ^k\) is executed for only a finite number of outer iterations. According to Lines 17–21 of Algorithm 1, except for finitely many k, \(\left\| z^{k+1}\right\| \le \omega \left\| z^k\right\| \) holds. Therefore \(z^{k+1}\rightarrow 0\), and (25) follows from (23) in the limit.

Appendix 3: Proof of Lemma 3

From Lemma 1, within R inner iterations we have

$$\begin{aligned} \frac{{\overline{L}} - {\underline{L}}^k}{\beta ^k} \ge \sum _{r=1}^{R} \left( \left\| B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r} \right\| ^2 + \frac{1}{2} \left\| z^{k,r+1} - z^{k,r} \right\| ^2 \right) . \end{aligned}$$
(83)

Then

$$\begin{aligned} \left\| B{\bar{x}}^{k,R+1} - B{\bar{x}}^{k,R} \right\| , \left\| z^{k,R+1} - z^{k,R} \right\| \sim {\mathcal {O}}\left( 1/\sqrt{\beta ^k R}\right) . \end{aligned}$$
(84)
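To make the step from (83) to (84) explicit: the right-hand side of (83) bounds a sum of R nonnegative terms, so the smallest of them is at most the average (the usual minimum-over-iterations reading of (84)),

$$\begin{aligned} \min _{1\le r\le R}\left( \left\| B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r} \right\| ^2 + \frac{1}{2} \left\| z^{k,r+1} - z^{k,r} \right\| ^2 \right) \le \frac{{\overline{L}} - {\underline{L}}^k}{\beta ^k R}, \end{aligned}$$

and both increments at such an index are therefore of order \({\mathcal {O}}\left( 1/\sqrt{\beta ^k R}\right) \).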

For the kth outer iteration, the inner iterations are terminated when (22) is met, which translates into the following relations:

$$\begin{aligned} \begin{aligned}&{\mathcal {O}}\left( \rho ^k/\sqrt{\beta ^k R^k} \right) \le \epsilon _1^k, \epsilon _2^k \sim {\mathcal {O}}\left( \vartheta ^k \right) , \\&{\mathcal {O}}\left( 1/\sqrt{\beta ^k R^k} \right) \le \epsilon _3^k \sim {\mathcal {O}}\left( \vartheta ^k/\beta ^k \right) . \end{aligned} \end{aligned}$$
(85)

where the last relation uses (73) with \(\rho ^k=2\beta ^k\). Therefore

$$\begin{aligned} R^k \sim {\mathcal {O}}\left( \beta ^k/\vartheta ^{2k} \right) . \end{aligned}$$
(86)
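Indeed, with \(\rho ^k = 2\beta ^k\), the first relation in (85) requires \(\beta ^k/\sqrt{\beta ^k R^k} \lesssim \vartheta ^k\), i.e.,

$$\begin{aligned} \sqrt{\beta ^k R^k} \gtrsim \frac{\beta ^k}{\vartheta ^k} \quad \Longleftrightarrow \quad R^k \gtrsim \frac{\beta ^k}{\vartheta ^{2k}}, \end{aligned}$$

and the third relation in (85) yields the same order, which is the estimate (86).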

At the end of the kth iteration, suppose that Lines 19–20 and Lines 17–18 of Algorithm 1 have been executed \(k_1\) and \(k_2\) times, respectively (\(k_1+k_2=k\)). Then the obtained \(z^{k+1}\) satisfies \(\left\| z^{k+1}\right\| \sim {\mathcal {O}}\left( \omega ^{k_1}\right) \), and \(\left\| Ax^{k+1}+B{\bar{x}}^{k+1}+z^{k+1}\right\| \le \epsilon _3^k \sim {\mathcal {O}}\left( \vartheta ^k/\beta ^k\right) \), which imply

$$\begin{aligned} \left\| Ax^{k+1}+B{\bar{x}}^{k+1}\right\| \le {\mathcal {O}}\left( \vartheta ^k/\beta ^k\right) + {\mathcal {O}}\left( \omega ^{k_1}\right) . \end{aligned}$$
(87)

From (82),

$$\begin{aligned} \beta ^k \left\| Ax^{k+1}+B{\bar{x}}^{k+1}\right\| ^2 \sim \beta ^k \left( {\mathcal {O}} \left( \vartheta ^k/\beta ^k\right) + {\mathcal {O}}\left( \omega ^{k_1}\right) \right) ^2 \sim {\mathcal {O}}(1). \end{aligned}$$
(88)

Substituting (88) into (86), we obtain

$$\begin{aligned} R^k \sim {\mathcal {O}}\left( \frac{1}{\vartheta ^{2k}} \frac{1}{\left( {\mathcal {O}}\left( \vartheta ^k/\beta ^k\right) + {\mathcal {O}}\left( \omega ^{k_1}\right) \right) ^2} \right) . \end{aligned}$$
(89)

When \(\vartheta \le \omega \), we have \(\vartheta ^k \le \omega ^{k} \le \omega ^{k_1}\gamma ^{k_2}\), and hence \(\vartheta ^k/\gamma ^{k_2} \le \omega ^{k_1}\), i.e., \(\omega ^{k_1}\) dominates over \(\vartheta ^k/\beta ^k\), leading to

$$\begin{aligned} R^k \sim {\mathcal {O}}\left( 1/\vartheta ^{2k}\omega ^{2k_1} \right) \sim {\mathcal {O}}\left( 1/\vartheta ^{2k}\omega ^{2k} \right) . \end{aligned}$$
(90)

For K outer iterations, the total number of inner iterations is

$$\begin{aligned} R = \sum _{k=1}^K R^k \sim {\mathcal {O}}\left( \sum _{k=1}^K \frac{1}{\vartheta ^{2k}\omega ^{2k}} \right) \sim {\mathcal {O}}\left( \frac{1}{\vartheta ^{2K}\omega ^{2K}} \right) . \end{aligned}$$
(91)

The number of outer iterations needed to reach an \(\epsilon \)-approximate stationary point is obviously \(K\sim {\mathcal {O}}\left( \log _\vartheta \epsilon \right) \). Then

$$\begin{aligned} R \sim {\mathcal {O}}\left( \epsilon ^{-2(1+\varsigma )} \right) . \end{aligned}$$
(92)
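To spell out the last step (under the reading, not stated explicitly here, that \(\varsigma = \log _\vartheta \omega \), so that \(\omega ^{K} = \epsilon ^{\varsigma }\) when \(\vartheta ^{K} = \epsilon \)):

$$\begin{aligned} R \sim \frac{1}{\vartheta ^{2K}\omega ^{2K}} = \epsilon ^{-2}\, \epsilon ^{-2\varsigma } = \epsilon ^{-2(1+\varsigma )}. \end{aligned}$$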

Appendix 4: Proof of Lemma 6

Throughout the inner iterations, only the Anderson acceleration steps might lead to an increase in the barrier augmented Lagrangian. Combining Assumptions 3 and 5 and the safeguarding criterion (41), we obtain

$$\begin{aligned} L_{b^k}\left( x^{k,r+1}, {\bar{x}}^{k,r+1}, z^{k,r+1}, y^{k,r+1}\right) \le {\overline{L}} + {\tilde{L}}_0\eta _L\sum _{r=1}^{\infty } \frac{1}{r^{1+\sigma }} < +\infty . \end{aligned}$$
(93)

Together with Assumptions 1 and 2, \(L_{b^k}\) is also bounded below. Therefore \(L_{b^k}\) is bounded in a closed interval and must have convergent subsequences, so we can choose a subsequence converging to the lower limit \({\underline{L}}\). For any \(\varepsilon >0\) there exists an inner-iteration index R in this subsequence such that \({\tilde{L}}_0\eta _L \sum _{r=R}^\infty r^{-(1+\sigma )} < \varepsilon /2\) and \(L_{b^k}\left( x^{k,r+1},{\bar{x}}^{k,r+1},z^{k,r+1},y^{k,r+1}\right) < {\underline{L}}+\varepsilon /2\) for any \(r\ge R\) on this subsequence. It then follows that for any \(r\ge R\), whether on the subsequence or not, it holds that

$$\begin{aligned} L_{b^k}\left( x^{k,r+1},{\bar{x}}^{k,r+1},z^{k,r+1},y^{k,r+1}\right) < {\underline{L}}+\varepsilon . \end{aligned}$$
(94)

Hence the upper limit is not larger than \({\underline{L}}+\varepsilon \). Due to the arbitrariness of \(\varepsilon >0\), the lower limit coincides with the upper limit, and hence the sequence of barrier augmented Lagrangian is convergent.

The convergence of the barrier augmented Lagrangian implies that as \(r \rightarrow \infty \),

$$\begin{aligned} L_{b^k}\left( x^{k,r+1},{\bar{x}}^{k,r+1},z^{k,r+1},y^{k,r+1}\right) - L_{b^k}\left( x^{k,r},{\bar{x}}^{k,r},z^{k,r},y^{k,r}\right) \rightarrow 0. \end{aligned}$$
(95)

If r is not an accelerated iteration, then since this quantity does not exceed \(-\beta ^k\left\| B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r}\right\| ^2 - \left( \beta ^k/2\right) \left\| z^{k,r+1}-z^{k,r}\right\| ^2\), we must have \(B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r} \rightarrow 0\) and \(z^{k,r+1}-z^{k,r}\rightarrow 0\). If, on the other hand, inner iteration r is accelerated, the convergence of \(B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r}\) and \(z^{k,r+1}-z^{k,r}\) is automatically guaranteed by the second criterion (42) for accepting the Anderson acceleration. The convergence of these two sequences then allows the argument of Lemma 1 to be followed, establishing convergence to the approximate KKT conditions of the relaxed problem.


About this article

Cite this article

Tang, W., Daoutidis, P. Fast and stable nonconvex constrained distributed optimization: the ELLADA algorithm. Optim Eng 23, 259–301 (2022). https://doi.org/10.1007/s11081-020-09585-w

