Abstract
Distributed optimization using multiple computing agents in a localized and coordinated manner is a promising approach for solving large-scale optimization problems, e.g., those arising in model predictive control (MPC) of large-scale plants. However, a distributed optimization algorithm that is simultaneously computationally efficient, globally convergent, and amenable to nonconvex constraints has remained elusive. In this paper, we combine three important modifications to the classical alternating direction method of multipliers (ADMM) for distributed optimization. Specifically, (1) an extra-layer architecture is adopted to accommodate nonconvexity and handle inequality constraints, (2) the equality-constrained nonlinear programming (NLP) subproblems are allowed to be solved approximately, and (3) a modified Anderson acceleration is employed to reduce the number of iterations. Theoretical convergence of the proposed algorithm, named ELLADA, is established, and its numerical performance is demonstrated on a large-scale NLP benchmark problem. Its application to distributed nonlinear MPC is also described and illustrated through a benchmark process system.
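For reference, the classical two-block ADMM that ELLADA builds on can be sketched as follows. This is a minimal illustrative sketch (a standard lasso splitting, not the algorithm of this paper); all function and variable names are ours, not from the source.

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, iters=500):
    """Classical two-block ADMM for min 0.5*||Ax - b||^2 + lam*||z||_1
    subject to x - z = 0 (an illustrative splitting, not ELLADA itself)."""
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    y = np.zeros(n)                      # dual variable for x - z = 0
    AtA = A.T @ A + rho * np.eye(n)      # cached for every x-update
    Atb = A.T @ b
    for _ in range(iters):
        # x-update: minimize the augmented Lagrangian in x (a linear solve)
        x = np.linalg.solve(AtA, Atb + rho * z - y)
        # z-update: proximal step on the l1 term (soft-thresholding)
        v = x + y / rho
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        # dual ascent on the multiplier of the consensus constraint
        y = y + rho * (x - z)
    return x, z
```

The three modifications summarized in the abstract alter exactly this template: an extra layer around the updates for nonconvexity, inexact solves of the x-minimization, and Anderson acceleration of the iteration.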
Notes
We use the word “oracle” with its typical meaning in mathematics and computer science: an ad hoc numerical or computational procedure, regarded as a black-box mechanism, that generates the needed results as outputs from some input information.
Acknowledgements
This work was supported by the National Science Foundation (NSF-CBET). The authors would also like to thank Prof. Qi Zhang for his constructive comments.
Appendices
Appendix 1: Proof of Lemma 1
We first prove that
for \(r=0, 1, 2, \dots \). First, since \(x^{k,r+1}\) is chosen as the minimizer of the augmented Lagrangian with respect to x (Line 9, Algorithm 1), the x-update leads to a decrease in L:
Second, consider the decrease resulting from the \({\bar{x}}\)-update:
The minimization of \({\bar{x}}\) (Line 10, Algorithm 1) should satisfy the optimality condition
i.e., there exist vectors \(v_1\in \partial g\left( {\bar{x}}^{k,r+1}\right) \) and \(v_2 \in {\mathcal {N}}_{\bar{{\mathcal {X}}}}\left( {\bar{x}}^{k,r+1}\right) \) with
Since \(v_1\in \partial g\left( {\bar{x}}^{k,r+1}\right) \) and g is convex, \(v_1^\top \left( {\bar{x}}^{k,r} - {\bar{x}}^{k,r+1}\right) \le g\left( {\bar{x}}^{k,r}\right) - g\left( {\bar{x}}^{k,r+1}\right) \). And \(v_2 \in {\mathcal {N}}_{\bar{{\mathcal {X}}}} \left( {\bar{x}}^{k,r+1}\right) \) implies \(v_2^\top \left( {\bar{x}}^{k,r} - {\bar{x}}^{k,r+1}\right) \le 0\). Hence
Substituting the above inequality in (64), we obtain
Third, we consider the decrease resulting from the z- and y-updates:
Since \(\upsilon (z; \lambda , \beta )=\lambda ^\top z + \frac{\beta }{2}\Vert z\Vert ^2\) is a convex function, whose gradient is \(\nabla \upsilon (z; \lambda , \beta ) = \lambda + \beta z\),
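Since \(\upsilon \) is quadratic, the expansion used in this step holds with equality; for completeness (this is a standard identity, stated here rather than taken from the source):

```latex
\upsilon(z;\lambda,\beta)
  = \upsilon(z';\lambda,\beta)
  + \nabla\upsilon(z';\lambda,\beta)^\top (z - z')
  + \frac{\beta}{2}\left\| z - z' \right\|^2,
\qquad
\nabla\upsilon(z';\lambda,\beta) = \lambda + \beta z'.
```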
From Line 11 of Algorithm 1, we obtain
Substituting into (69), we obtain
From (71),
Then (72) becomes
Summing up the inequalities (63), (68) and (74), we have proved the inequality (62).
Next, we show that the augmented Lagrangian is lower bounded, and hence converges to some \({\underline{L}}^k\in {\mathbb {R}}\). Since \(\upsilon (z; \lambda , \beta )\) is strongly convex with modulus \(\beta \), it can be easily verified that
for any \(z^\prime \), i.e.,
Let \(z^\prime = -\left( Ax^{k,r} + B{\bar{x}}^{k,r}\right) \) and remove the last term on the right-hand side. Then
Hence
Since \(\upsilon (z) = \lambda ^\top z+\frac{\beta }{2}\Vert z\Vert ^2 \ge -\Vert \lambda \Vert ^2/(2\beta )\), \(\lambda \) is bounded in \(\left[ {\underline{\lambda }}, {\overline{\lambda }} \right] \), \(\beta ^k\ge \beta ^1\), and f and g are bounded below, L has a lower bound.
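The lower bound on \(\upsilon \) invoked here follows from completing the square (a standard step, spelled out for completeness):

```latex
\upsilon(z;\lambda,\beta)
  = \lambda^\top z + \frac{\beta}{2}\|z\|^2
  = \frac{\beta}{2}\left\| z + \frac{\lambda}{\beta} \right\|^2
    - \frac{\|\lambda\|^2}{2\beta}
  \;\ge\; -\frac{\|\lambda\|^2}{2\beta}.
```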
Taking the limit \(r\rightarrow \infty \) on both sides of inequality (62), it becomes obvious that \(B{\bar{x}}^{k,r+1}-B{\bar{x}}^{k,r}\) and \(z^{k,r+1}-z^{k,r}\) converge to 0. Due to (73), we have \(Ax^{k,r} + B{\bar{x}}^{k,r} + z^{k,r} \rightarrow 0\). Hence there must exist an r such that (22) is met. At this point, the optimality conditions for \(x^{k,r+1}\) are written as
According to the update rule of \(y^{k,r}\), the above expression is equivalent to
i.e.,
According to the first inequality of (22), the norm of the left-hand side above is not larger than \(\epsilon _1^k\), which directly implies the first condition in (23). In a similar manner, the second condition in (23) can be established. The third follows from (71), and the fourth is immediate.
Appendix 2: Proof of Lemma 2
We first consider the situation when \(\beta ^k\) is unbounded. From (78), we have
Since f and g are both lower bounded, as \(\beta ^k\rightarrow \infty \), we have \(Ax^{k+1} + B{\bar{x}}^{k+1} \rightarrow 0\). Combined with the first two conditions of (23) in the limit \(\epsilon _1^k, \epsilon _2^k, \epsilon _3^k \downarrow 0\), we arrive at (25).
Then we suppose that \(\beta ^k\) is bounded, i.e., the amplification step \(\beta ^{k+1}=\gamma \beta ^k\) is executed for only a finite number of outer iterations. According to Lines 17–21 of Algorithm 1, except for finitely many k, \(\left\| z^{k+1}\right\| \le \omega \left\| z^k\right\| \) always holds. Therefore \(z^{k+1}\rightarrow 0\), and (25) follows from (23) in the limit.
Appendix 3: Proof of Lemma 3
From Lemma 1 one knows that within R inner iterations
Then
For the kth outer iteration, its inner iterations are terminated when (22) is met, which is translated into the following relations:
where the last relation uses (73) with \(\rho ^k=2\beta ^k\). Therefore
At the end of the kth iteration, suppose that Lines 19–20 and Lines 17–18 of Algorithm 1 have been executed for \(k_1\) and \(k_2\) times, respectively (\(k_1+k_2=k\)). Then the obtained \(z^{k+1}\) satisfies \(\left\| z^{k+1}\right\| \sim {\mathcal {O}}\left( \omega ^{k_1}\right) \), and \(\left\| Ax^{k+1}+B{\bar{x}}^{k+1}+z^{k+1}\right\| \le \epsilon _3^k \sim {\mathcal {O}}\left( \vartheta ^k/\beta ^k\right) \), which imply
From (82),
Substituting (88) into (86), we obtain
When \(\vartheta \le \omega \), \(\vartheta ^k \le \omega ^{k} \le \omega ^{k_1}\gamma ^{k_2}\), and hence \(\vartheta ^k/\gamma ^{k_2} \le \omega ^{k_1}\), i.e., \(\omega ^{k_1}\) dominates \(\vartheta ^k/\beta ^k\), leading to
For K outer iterations, the total number of inner iterations is
The number of outer iterations needed to reach an \(\epsilon \)-approximate stationary point is obviously \(K\sim {\mathcal {O}}\left( \log _\vartheta \epsilon \right) \). Then
Appendix 4: Proof of Lemma 6
Through the inner iterations, only Anderson acceleration might lead to an increase in the barrier augmented Lagrangian. Combining Assumptions 3 and 5 with the safeguarding criterion (41), we obtain
Together with Assumptions 1 and 2, \(L_{b^k}\) is also bounded below. Therefore \(L_{b^k}\) is bounded in a closed interval and must have convergent subsequences; choose one converging to the lower limit \({\underline{L}}\). For any \(\varepsilon >0\) there exists an inner-iteration index R in this subsequence such that \({\tilde{L}}_0\eta _L \sum _{r=R}^\infty r^{-(1+\sigma )} < \varepsilon /2\) and \(L_{b^k}\left( x^{k,r+1},{\bar{x}}^{k,r+1},z^{k,r+1},y^{k,r+1}\right) < {\underline{L}}+\varepsilon /2\) for any \(r\ge R\) on this subsequence. It then follows that for any \(r\ge R\), whether on the subsequence or not,
Hence the upper limit is not larger than \({\underline{L}}+\varepsilon \). Due to the arbitrariness of \(\varepsilon >0\), the lower limit coincides with the upper limit, and hence the sequence of barrier augmented Lagrangian is convergent.
The convergence of the barrier augmented Lagrangian implies that as \(r \rightarrow \infty \),
Suppose that r is not an accelerated iteration. Then, since this quantity does not exceed \(-\beta ^k\left\| B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r}\right\| ^2 - \left( \beta ^k/2\right) \left\| z^{k,r+1}-z^{k,r}\right\| ^2\), we must have \(B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r} \rightarrow 0\) and \(z^{k,r+1}-z^{k,r}\rightarrow 0\). Otherwise, if inner iteration r is accelerated, the convergence of \(B{\bar{x}}^{k,r+1} - B{\bar{x}}^{k,r}\) and \(z^{k,r+1}-z^{k,r}\) to zero is automatically guaranteed by the second criterion (42) for accepting Anderson acceleration. The convergence of these two sequences then falls within the paradigm of Lemma 1 for establishing convergence to the approximate KKT conditions of the relaxed problem.
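The Anderson acceleration analyzed above can be sketched for a generic fixed-point iteration as follows. This is a minimal type-II variant with memory m, under our own naming; ELLADA additionally safeguards each accelerated step (criteria (41)–(42)), which is omitted here for brevity.

```python
import numpy as np

def anderson_fixed_point(g, x0, m=5, iters=50, tol=1e-10):
    """Minimal Anderson acceleration (type II) for a fixed point x = g(x).
    An illustrative sketch, not the safeguarded variant used in ELLADA."""
    x = np.asarray(x0, dtype=float)
    G, F = [], []                      # histories of g(x_i) and residuals f_i
    for _ in range(iters):
        gx = g(x)
        f = gx - x                     # fixed-point residual g(x) - x
        if np.linalg.norm(f) < tol:
            break
        G.append(gx)
        F.append(f)
        if len(F) > m + 1:             # keep at most m residual differences
            G.pop(0)
            F.pop(0)
        if len(F) == 1:
            x = gx                     # plain fixed-point step to start
        else:
            # least-squares fit of the current residual by past differences
            dF = np.stack([F[i + 1] - F[i] for i in range(len(F) - 1)], axis=1)
            gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
            dG = np.stack([G[i + 1] - G[i] for i in range(len(G) - 1)], axis=1)
            x = gx - dG @ gamma        # extrapolated (accelerated) iterate
    return x
```

In the safeguarded setting of the paper, an extrapolated iterate is accepted only when it satisfies the descent-type criteria on the barrier augmented Lagrangian; otherwise the plain step is taken.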
Tang, W., Daoutidis, P. Fast and stable nonconvex constrained distributed optimization: the ELLADA algorithm. Optim Eng 23, 259–301 (2022). https://doi.org/10.1007/s11081-020-09585-w