Abstract
This paper studies the minimization of a broad class of nonsmooth nonconvex objective functions subject to nonlinear functional equality constraints, where the gradients of the differentiable parts in the objective and the constraints are only locally Lipschitz continuous. We propose a specific proximal linearized alternating direction method of multipliers in which the proximal parameter is generated dynamically, and we design an explicit and tractable backtracking procedure to generate it. We prove subsequence convergence of the method to a critical point of the problem, and global convergence when the problem’s data are semialgebraic. These results do not depend on the explicit manner in which the proximal parameter is generated. As a byproduct of our analysis, we also obtain global convergence guarantees for the proximal gradient method with a dynamic proximal parameter under local Lipschitz continuity of the gradient of the smooth part of the nonlinear sum composite minimization model.
Notes
Our general model (M) could formally be reformulated through this more general model. Doing so, however, would discard the specific data and structure of model (M), which could then no longer be exploited.
References
Beck, A., Teboulle, M.: Gradient-based algorithms with applications to signal recovery problems. In: Palomar, D., Eldar, Y.C. (eds.) Convex Optimization in Signal Processing and Communications, pp. 139–162. Cambridge University Press, Cambridge (2009)
Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont, MA (1999)
Bertsekas, D.P.: Constrained Optimization and Lagrange Multiplier Methods. Athena Scientific, Belmont, MA (1996)
Bertsekas, D.P.: Convex Optimization Algorithms. Athena Scientific, Belmont, MA (2015)
Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods. Athena Scientific (2003)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Opt. 17(4), 1205–1223 (2007)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)
Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Opt. 28, 2131–2151 (2018)
Bolte, J., Sabach, S., Teboulle, M.: Nonconvex Lagrangian-based optimization: monitoring schemes and global convergence. Math. Op. Res. 43(4), 1210–1232 (2018)
Boţ, R.I., Nguyen, D.K.: The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates. Math. Op. Res. 45(2), 682–712 (2020)
Boyd, S., Parikh, N., Chu, E.: Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Now Publishers Inc (2011)
Clarke, F.H.: Optimization and Nonsmooth Analysis. SIAM (1990)
Cobzaş, Ş, Miculescu, R., Nicolae, A.: Lipschitz Functions, vol. 2241. Springer (2019)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)
Fessler, J.A.: Optimization methods for magnetic resonance image reconstruction: key models and optimization algorithms. IEEE Signal Process. Mag. 37(1), 33–40 (2020)
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. 2(1), 17–40 (1976)
Glowinski, R., Le Tallec, P.: Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics, vol. 9. SIAM (1989)
Hestenes, M.R.: Multiplier and gradient methods. J. Opt. Theory Appl. 4(5), 303–320 (1969)
Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Opt. 25(4), 2434–2460 (2015)
Luke, D.R., Sabach, S., Teboulle, M.: Optimization on spheres: models and proximal algorithms with computational performance comparisons. SIAM J. Math. Data Sci. 1(3), 408–445 (2019)
Mei, S., Bai, Y., Montanari, A.: The landscape of empirical risk for nonconvex losses. Annal. Stat. 46(6A), 2747–2774 (2018)
Mordukhovich, B.S.: Variational Analysis and Generalized Differentiation, I: Basic Theory, II: Applications. Springer, Berlin (2006)
Mordukhovich, B.S.: Variational Analysis and Applications. Springer, Cham, Switzerland (2018)
Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization, pp. 283–298. Academic Press, New York, NY (1969)
Rockafellar, R.T.: Augmented Lagrange multiplier functions and duality in nonconvex programming. SIAM J. Control 12(2), 268–285 (1974)
Rockafellar, R.T., Wets, J.B.R.: Variational Analysis. Springer, Berlin (2004)
Royset, J. O.: Variational Analysis in Modern Statistics. Special Issue Mathematical Programming, Series B, Volume 174 (2019)
Sabach, S., Teboulle, M.: Lagrangian methods for composite optimization. In: Kimmel, R., Tai, X.-C. (eds.) Handbook of Numerical Analysis, vol. 20, pp. 401–436. Elsevier (2019)
Shefi, R., Teboulle, M.: Rate of convergence analysis of decomposition methods based on the proximal method of multipliers for convex minimization. SIAM J. Opt. 24(1), 269–297 (2014)
Teboulle, M.: A simplified view of first order methods for optimization. Math. Program. 170(1), 67–96 (2018)
Von Stackelberg, H.: Market Structure and Equilibrium. Springer Science & Business Media, Berlin (2010)
Acknowledgements
The research of E. Cohen was supported by a Ph.D fellowship under ISF Grants 1844-16 and 2619-20 and DFG grant 800240. The research of M. Teboulle was partially supported by the Israel Science Foundation, under ISF Grants 1844-16 and 2619-20, and by the German Research Foundation under DFG Grant 800240.
Additional information
Communicated by Boris S. Mordukhovich.
Appendices
Appendix A: Subdifferential Calculus and the Lagrangian
Model (M) is nonconvex and nonsmooth in general, and so we turn to subdifferential calculus for nonconvex functions. Specifically, we recall the definition of subdifferentials, critical points, and a few related results. For a more detailed presentation, see [22, 23, 26].
Definition A.1
(Subdifferentials) Let \({\mathbb {E}}\) be a Euclidean vector space, \(\psi : {\mathbb {E}} \rightarrow (-\infty , \infty ]\) be a proper and lsc function, and \(z\in {{\,\mathrm{dom}\,}}\psi \).
(i) The Fréchet (regular) subdifferential of \(\psi \) at z, denoted by \({\hat{\partial }}\psi (z)\), is the set of all vectors \(v\in {\mathbb {E}}\) satisfying
$$\begin{aligned} \psi (x)\ge \psi (z)+\langle v,x-z\rangle + o\left( \left\| {x-z} \right\| \right) \text {.} \end{aligned}$$(A.1)
(ii) The Mordukhovich (limiting) subdifferential of \(\psi \) at z, denoted by \(\partial \psi (z)\), is the set of all vectors \(v\in {\mathbb {E}}\) such that there exist sequences \(\{z^{k}\}_{k\in {\mathbb {N}}}\) and \(\{v^{k}\}_{k\in {\mathbb {N}}}\) where \(z^{k}\rightarrow z\), \(\psi (z^{k})\rightarrow \psi (z)\), \(v^{k}\in {\hat{\partial }}\psi (z^{k})\), and \(v^{k}\rightarrow v\).
(iii) The horizon subdifferential of \(\psi \) at z, denoted by \(\partial ^{\infty }\psi (z)\), is defined as in (ii), except that instead of \(v^{k}\rightarrow v\) we require \(t^{k}v^{k}\rightarrow v\) for some real sequence \(t^{k}\searrow 0\).
For \(z\notin {{\,\mathrm{dom}\,}}\psi \) we set \({\hat{\partial }}\psi (z)=\partial \psi (z)=\partial ^{\infty }\psi (z)=\emptyset \).
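Note that (A.1) is equivalent to
$$\begin{aligned} \liminf _{x\rightarrow z,\; x\ne z}\frac{\psi (x)-\psi (z)-\langle v,x-z\rangle }{\left\| {x-z} \right\| }\ge 0. \end{aligned}$$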
Remark A.1
(Closedness of graph of \(\partial \psi \)) We note, as in [7, Remark 1(ii)], that given a convergent sequence \((x^{k}, w^{k})\xrightarrow [k\rightarrow \infty ]{}(x, w)\), such that \(w^{k}\in \partial \psi (x^{k})\) and \(\lim _{k\rightarrow \infty }\psi (x^{k})=\psi (x)\), it holds that \(w\in \partial \psi (x)\).
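As an informal illustration of Definition A.1(i), the following sketch samples the difference quotient \((\psi (x)-\psi (z)-v(x-z))/|x-z|\) at a few points near z in the one-dimensional case. The function names, the radii, and the tolerance are illustrative assumptions and do not come from the paper, and sampling finitely many points can only suggest, not prove, membership in \({\hat{\partial }}\psi (z)\).

```python
def frechet_quotient_min(psi, z, v, radii=(1e-2, 1e-4, 1e-6)):
    """Smallest sampled value of (psi(x) - psi(z) - v*(x - z)) / |x - z|
    over the points x = z - r and x = z + r for the given small radii."""
    values = []
    for r in radii:
        for x in (z - r, z + r):
            values.append((psi(x) - psi(z) - v * (x - z)) / abs(x - z))
    return min(values)


def looks_like_frechet_subgradient(psi, z, v, tol=1e-9):
    """Heuristic test: the sampled quotient stays (numerically) nonnegative."""
    return frechet_quotient_min(psi, z, v) >= -tol


def neg_abs(x):
    """psi(x) = -|x|, whose Frechet subdifferential at 0 is empty."""
    return -abs(x)


if __name__ == "__main__":
    for v in (-1.5, -1.0, 0.0, 0.5, 1.0, 1.5):
        print(v,
              looks_like_frechet_subgradient(abs, 0.0, v),
              looks_like_frechet_subgradient(neg_abs, 0.0, v))
```

For \(\psi (x)=|x|\) the sampled quotient at \(z=0\) equals \(1-v\,\mathrm{sign}(x)\), so the check accepts exactly the values \(|v|\le 1\), in agreement with \({\hat{\partial }}\psi (0)=[-1,1]\). For \(\psi (x)=-|x|\) the quotient equals \(-1-v\,\mathrm{sign}(x)\), which is negative on one side for every v, so the Fréchet subdifferential at 0 is empty, while the limiting subdifferential is \(\{-1,1\}\).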
Proposition A.1
(see Exercise 8.8(c), p.304 in [26]) Let \(\psi :{\mathbb {R}}^{d}\rightarrow (-\infty , \infty ]\) be an extended valued function and \(\phi :{\mathbb {R}}^{d}\rightarrow {\mathbb {R}}\) be a smooth function. Then,
$$\begin{aligned} \partial (\psi +\phi )(x)=\partial \psi (x)+\nabla \phi (x),\quad \forall x\in {\mathbb {R}}^{d}, \end{aligned}$$
where for \(x\notin {{\,\mathrm{dom}\,}}\psi \), we note that \(\emptyset +\nabla \phi (x)=\emptyset \).
Definition A.2
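(Critical points) Let \({\mathbb {E}}\) be a Euclidean vector space, and let \(\psi : {\mathbb {E}} \rightarrow (-\infty , \infty ]\) be a proper and lsc function. Then, the set of critical points of \(\psi \) is defined by
$$\begin{aligned} {{\,\mathrm{crit}\,}}\psi {:=}\left\{ z\in {\mathbb {E}}: 0\in \partial \psi (z)\right\} . \end{aligned}$$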
Appendix B: Lipschitz and Local Lipschitz Continuity
We review the notion of Lipschitz and local Lipschitz continuity in the context of model (M). Thus, we restrict our discussion to Euclidean vector spaces and in the following \({\mathbb {X}}\) and \({\mathbb {Y}}\) denote such spaces.
We begin with the basic definition.
Definition B.1
(Lipschitz and locally Lipschitz continuity) Let \(S\subseteq {\mathbb {X}}\) be a nonempty set and \(\phi :S\rightarrow {\mathbb {Y}}\) be a continuous mapping over S. Then,
(i) \(\phi \) is L-Lipschitz continuous over S with \(L\ge 0\) if the following holds
$$\begin{aligned} \left\| {\phi (x) - \phi (z)} \right\| \le L\left\| {x-z} \right\| ,\quad \forall x,z\in S. \end{aligned}$$(B.1)
(ii) \(\phi \) is locally Lipschitz continuous over S if for every \(z\in S\) there exist \(\varepsilon (z)>0\), \(L_{\varepsilon }(z)\ge 0\), and a neighborhood \({\mathcal {N}}_{\varepsilon }(z){:=}\{x\in S: \left\| {x-z} \right\| <\varepsilon (z)\}\), such that \(\phi \) is \(L_{\varepsilon }(z)\)-Lipschitz continuous over \({\mathcal {N}}_{\varepsilon }(z)\), i.e.,
$$\begin{aligned} \left\| {\phi (x) - \phi (y)} \right\| \le L_{\varepsilon }(z)\left\| {x-y} \right\| ,\quad \forall x,y\in {\mathcal {N}}_{\varepsilon }(z). \end{aligned}$$(B.2)
When (i) or (ii) holds with \(S\equiv {\mathbb {X}}\), then \(\phi \) is referred to as L-Lipschitz continuous or locally Lipschitz continuous, respectively. In addition, note that when \(\phi (x)-\phi (z)\) is a matrix, \(\left\| {\phi (x)-\phi (z)} \right\| \) denotes the spectral norm.
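To make the gap between the two notions in Definition B.1 concrete, the sketch below estimates the Lipschitz constant of \(\phi (x)=x^{3}\) (the derivative of \(x^{4}/4\)) over intervals \([-r,r]\). The function names and the grid size are illustrative assumptions; the estimate is a lower bound obtained from finitely many pairs, to be compared with the exact constant \(3r^{2}\).

```python
def empirical_lipschitz(phi, radius, grid=50):
    """Largest ratio |phi(x) - phi(z)| / |x - z| over distinct grid points
    of [-radius, radius]; a lower estimate of the best Lipschitz constant."""
    points = [-radius + 2.0 * radius * i / grid for i in range(grid + 1)]
    best = 0.0
    for i, x in enumerate(points):
        for z in points[i + 1:]:
            best = max(best, abs(phi(x) - phi(z)) / abs(x - z))
    return best


def cube(x):
    """phi(x) = x**3, locally but not globally Lipschitz continuous."""
    return x ** 3


if __name__ == "__main__":
    for r in (1.0, 10.0, 100.0):
        # Print the radius, the sampled estimate, and the exact constant 3*r**2.
        print(r, empirical_lipschitz(cube, r), 3.0 * r * r)
```

On each bounded interval the estimate stays below \(3r^{2}\), but it grows without bound as r increases, so no single L satisfies (B.1) on all of \({\mathbb {R}}\).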
We also recall the following characterization of locally Lipschitz continuous mappings.
Proposition B.1
(Local Lipschitz continuity and compact sets (Theorem 2.1.6 in [13])) Let \(S\subseteq {\mathbb {X}}\) be a nonempty set. Then, a mapping \(\phi :{\mathbb {X}}\rightarrow {\mathbb {Y}}\) is locally Lipschitz continuous over S if and only if for every nonempty and compact set \(C\subseteq S\) there exists \(L_{C}\ge 0\) such that \(\phi \) is \(L_{C}\)-Lipschitz continuous over C, i.e.,
$$\begin{aligned} \left\| {\phi (x) - \phi (z)} \right\| \le L_{C}\left\| {x-z} \right\| ,\quad \forall x,z\in C. \end{aligned}$$
We conclude with a proposition which deals with Lipschitz continuity properties of \(C^{1}\) mappings.
Proposition B.2
(Differentiable mappings and Lipschitz continuity) Let \(\phi :{\mathbb {X}}\rightarrow {\mathbb {Y}}\) be a \(C^{1}\) mapping. Then the following claims hold.
(i) \(\phi \) is locally Lipschitz continuous;
(ii) Let \(B\subseteq {\mathbb {X}}\) be a closed ball, i.e., \(B=\{x:\left\| {x-z} \right\| \le r\}\), for some \(z\in {\mathbb {X}}\) and \(r\in (0, \infty ]\), and assume that \(\phi \) is \(L_{B}\)-Lipschitz continuous over B, with \(L_{B}\ge 0\). Then,
$$\begin{aligned} \left\| {\nabla \phi (x)} \right\| \le L_{B},\;\forall x\in B. \end{aligned}$$
Proof
For the first claim, see, e.g., [12, Corollary, p. 32]. The second claim can be easily obtained using the definition of the Gâteaux derivative, see [12, p. 30], and the continuity of \(\nabla \phi \). \(\square \)
Appendix C: Proof of Lemma 4.1
Proof of Lemma 4.1
We prove that \(\nabla \varphi \) is locally Lipschitz continuous by utilizing the local Lipschitz continuity characterization stated in Proposition B.1, i.e., given a nonempty and compact set \(C\subset {\mathbb {R}}^{n}\times {\mathbb {R}}^{m}\times {\mathbb {R}}^{q}\) we prove that there exists \(L_{C}\in {\mathbb {R}}_{+}\) such that for every \(\omega =(u,v,y), {\hat{\omega }}=({\hat{u}}, {\hat{v}}, {\hat{y}})\in C\), we have
$$\begin{aligned} \left\| {\nabla \varphi (\omega ) - \nabla \varphi ({\hat{\omega }})} \right\| \le L_{C}\left\| {\omega - {\hat{\omega }}} \right\| . \end{aligned}$$(C.1)
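With \(\nabla \varphi (\omega )=\left( \nabla _{u}\varphi (\omega ), \nabla _{v}\varphi (\omega ), \nabla _{y}\varphi (\omega )\right) \), we intend to prove that there exist \(L_{C,u}\), \(L_{C,v}\), and \(L_{C,y}\in {\mathbb {R}}_{+}\), such that, for every \(\omega , {\hat{\omega }}\in C\),
$$\begin{aligned} \left\| {\nabla _{u}\varphi (\omega ) - \nabla _{u}\varphi ({\hat{\omega }})} \right\| \le L_{C,u}\left\| {\omega - {\hat{\omega }}} \right\| , \end{aligned}$$(C.2)
$$\begin{aligned} \left\| {\nabla _{v}\varphi (\omega ) - \nabla _{v}\varphi ({\hat{\omega }})} \right\|&\le L_{C,v}\left\| {\omega - {\hat{\omega }}} \right\| ,\\ \left\| {\nabla _{y}\varphi (\omega ) - \nabla _{y}\varphi ({\hat{\omega }})} \right\|&\le L_{C,y}\left\| {\omega - {\hat{\omega }}} \right\| , \end{aligned}$$
and as
$$\begin{aligned} \left\| {\nabla \varphi (\omega ) - \nabla \varphi ({\hat{\omega }})} \right\| \le \left\| {\nabla _{u}\varphi (\omega ) - \nabla _{u}\varphi ({\hat{\omega }})} \right\| + \left\| {\nabla _{v}\varphi (\omega ) - \nabla _{v}\varphi ({\hat{\omega }})} \right\| + \left\| {\nabla _{y}\varphi (\omega ) - \nabla _{y}\varphi ({\hat{\omega }})} \right\| , \end{aligned}$$
we can set \(L_{C} = L_{C,u}+L_{C,v}+L_{C,y}\).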
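We begin with \(\nabla _{u}\varphi \) (cf. (4.1)) and note that for every \((u,v,y),({\hat{u}},{\hat{v}},{\hat{y}})\in {\mathbb {R}}^{n}\times {\mathbb {R}}^{m}\times {\mathbb {R}}^{q}\), we have
$$\begin{aligned} \left\| {\nabla _{u}\varphi (u,v,y) - \nabla _{u}\varphi ({\hat{u}},{\hat{v}},{\hat{y}})} \right\| \le \left\| {\nabla g(u) - \nabla g({\hat{u}})} \right\| + \left\| {\nabla F(u)^{T}\left( y+\rho (F(u)-Gv)\right) - \nabla F({\hat{u}})^{T}\left( {\hat{y}}+\rho (F({\hat{u}})-G{\hat{v}})\right) } \right\| , \end{aligned}$$
where the inequality is due to the triangle inequality.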
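Next, let \(C_{u}\) be the projection of C on \({\mathbb {R}}^{n}\) and \(B_{u}\) be a compact ball such that \(C_{u}\subseteq B_{u}\). Then, as \(\nabla g\) is locally Lipschitz continuous and \(B_{u}\) is compact, we can apply Proposition B.1 and obtain that there exists \(L_{g}\ge 0\) such that
$$\begin{aligned} \left\| {\nabla g(u) - \nabla g({\hat{u}})} \right\| \le L_{g}\left\| {u-{\hat{u}}} \right\| ,\quad \forall u,{\hat{u}}\in B_{u}. \end{aligned}$$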
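Next, recall that F is \(C^{1}\) and hence, by Proposition B.2(i), F is locally Lipschitz continuous. In addition, \(\nabla F\) is also locally Lipschitz continuous. Thus, there exist \(l_{F} \ge 0\) and \(L_{F} \ge 0\), such that for every \(u,{\hat{u}}\in B_{u}\), we have
$$\begin{aligned} \left\| {F(u) - F({\hat{u}})} \right\| \le l_{F}\left\| {u-{\hat{u}}} \right\| ,\quad \left\| {\nabla F(u) - \nabla F({\hat{u}})} \right\| \le L_{F}\left\| {u-{\hat{u}}} \right\| ,\quad \left\| {\nabla F(u)} \right\| \le l_{F}, \end{aligned}$$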
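where the first two inequalities are due to Proposition B.1 and the last is due to Proposition B.2(ii). Applying the above inequalities, we obtain that, for every \(u,{\hat{u}}\in B_{u}\), \(v,{\hat{v}}\in {\mathbb {R}}^{m}\), and \(y,{\hat{y}}\in {\mathbb {R}}^{q}\), we have
$$\begin{aligned}&\left\| {\nabla F(u)^{T}\left( y+\rho (F(u)-Gv)\right) - \nabla F({\hat{u}})^{T}\left( {\hat{y}}+\rho (F({\hat{u}})-G{\hat{v}})\right) } \right\| \\&\quad \le \left\| {\nabla F(u) - \nabla F({\hat{u}})} \right\| \left\| {y+\rho (F(u)-Gv)} \right\| + \left\| {\nabla F({\hat{u}})} \right\| \left( \left\| {y-{\hat{y}}} \right\| + \rho \left\| {F(u)-F({\hat{u}})} \right\| + \rho \left\| {G} \right\| \left\| {v-{\hat{v}}} \right\| \right) \\&\quad \le L_{F}\left\| {y+\rho (F(u)-Gv)} \right\| \left\| {u-{\hat{u}}} \right\| + l_{F}\left( \left\| {y-{\hat{y}}} \right\| + \rho l_{F}\left\| {u-{\hat{u}}} \right\| + \rho \left\| {G} \right\| \left\| {v-{\hat{v}}} \right\| \right) . \end{aligned}$$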
Combining the results above, and noting that \(C\subseteq B_{u}\times {\mathbb {R}}^{m}\times {\mathbb {R}}^{q}\), we obtain that, for every \(\omega =(u,v,y), {\hat{\omega }}=({\hat{u}},{\hat{v}},{\hat{y}})\in C\), we have
$$\begin{aligned} \left\| {\nabla _{u}\varphi (\omega ) - \nabla _{u}\varphi ({\hat{\omega }})} \right\| \le L_{\varphi }(u,v,y)\left\| {{\hat{\omega }} -\omega } \right\| , \end{aligned}$$
with \(L_{\varphi }(u,v,y){:=}L_{g}+L_{F}\left\| {y+\rho (F(u)-Gv)} \right\| + l_{F}(1 + \rho l_{F} + \rho \left\| {G} \right\| )\). We complete the proof for \(\nabla _{u}\varphi \) and obtain inequality (C.2) by setting \(L_{C,u}=\sup _{(u,v,y)\in C}L_{\varphi }(u,v,y)\) and noting that \(L_{C,u}<\infty \) as \(L_{\varphi }\) is continuous and C is compact.
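Next, we examine \(\nabla _{v}\varphi \) (cf. (4.2)). For every \(\omega =(u,v,y), {\hat{\omega }}=({\hat{u}},{\hat{v}},{\hat{y}})\in C\), we have
$$\begin{aligned} \left\| {\nabla _{v}\varphi (\omega ) - \nabla _{v}\varphi ({\hat{\omega }})} \right\|&= \left\| {G^{T}\left( y-{\hat{y}}+\rho (F(u)-F({\hat{u}}))-\rho G(v-{\hat{v}})\right) } \right\| \\&\le \left\| {G} \right\| \left( \left\| {y-{\hat{y}}} \right\| + \rho l_{F}\left\| {u-{\hat{u}}} \right\| + \rho \left\| {G} \right\| \left\| {v-{\hat{v}}} \right\| \right) \le L_{C,v}\left\| {\omega - {\hat{\omega }}} \right\| , \end{aligned}$$
with \(L_{C,v} = \left\| {G} \right\| \left( 1 + \rho (l_{F} + \left\| {G} \right\| )\right) \).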
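Finally, for \(\nabla _{y}\varphi \) (cf. (4.3)), for every \(\omega =(u,v,y), {\hat{\omega }}=({\hat{u}},{\hat{v}},{\hat{y}})\in C\), we have
$$\begin{aligned} \left\| {\nabla _{y}\varphi (\omega ) - \nabla _{y}\varphi ({\hat{\omega }})} \right\| = \left\| {F(u)-F({\hat{u}})-G(v-{\hat{v}})} \right\| \le l_{F}\left\| {u-{\hat{u}}} \right\| + \left\| {G} \right\| \left\| {v-{\hat{v}}} \right\| \le L_{C,y}\left\| {\omega - {\hat{\omega }}} \right\| , \end{aligned}$$
with \(L_{C,y} = l_{F} + \left\| {G} \right\| \). \(\square \)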
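The constants appearing in the proof can be sanity-checked numerically on a toy instance. The sketch below assumes that \(\varphi (u,v,y)=g(u)+\langle y,F(u)-Gv\rangle +\tfrac{\rho }{2}\left\| {F(u)-Gv} \right\| ^{2}\), a form inferred from the constants \(L_{\varphi }\), \(L_{C,v}\), and \(L_{C,y}\) rather than quoted from (4.1)–(4.3), and it uses the hypothetical scalar data \(g(u)=u^{4}/4\), \(F(u)=u^{2}\), \(G=0.5\), \(\rho =2\) on the cube \(C=[-r,r]^{3}\). All names and values are illustrative only.

```python
import math
import random

RHO = 2.0  # penalty parameter (illustrative value)
G = 0.5    # scalar stand-in for the linear map G (illustrative value)


def grad_phi(u, v, y):
    """Gradient of phi(u, v, y) = g(u) + y*(F(u) - G*v) + (RHO/2)*(F(u) - G*v)**2
    for the toy choices g(u) = u**4 / 4 and F(u) = u**2 (assumed form of phi)."""
    a = y + RHO * (u * u - G * v)   # the term y + rho*(F(u) - G v)
    return (u ** 3 + 2.0 * u * a,   # partial derivative in u
            -G * a,                 # partial derivative in v
            u * u - G * v)          # partial derivative in y


def lemma_constant(r):
    """L_C = L_{C,u} + L_{C,v} + L_{C,y} on the cube C = [-r, r]^3."""
    L_g = 3.0 * r * r   # Lipschitz constant of g'(u) = u**3 on [-r, r]
    l_F = 2.0 * r       # Lipschitz constant of F and bound on |F'| on [-r, r]
    L_F = 2.0           # Lipschitz constant of F'(u) = 2u
    sup_a = r + RHO * (r * r + abs(G) * r)  # bound on |y + rho*(F(u) - G v)| over C
    L_u = L_g + L_F * sup_a + l_F * (1.0 + RHO * l_F + RHO * abs(G))
    L_v = abs(G) * (1.0 + RHO * (l_F + abs(G)))
    L_y = l_F + abs(G)
    return L_u + L_v + L_y


def norm(p, q):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def worst_ratio(r, pairs=5000, seed=0):
    """Largest sampled ratio ||grad phi(w) - grad phi(w_hat)|| / ||w - w_hat|| over C."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(pairs):
        w = [rng.uniform(-r, r) for _ in range(3)]
        w_hat = [rng.uniform(-r, r) for _ in range(3)]
        dist = norm(w, w_hat)
        if dist > 0.0:
            worst = max(worst, norm(grad_phi(*w), grad_phi(*w_hat)) / dist)
    return worst


if __name__ == "__main__":
    for r in (0.5, 1.0, 3.0):
        print(r, worst_ratio(r), lemma_constant(r))
```

The sampled ratio never exceeds \(L_{C}\), as the lemma predicts, and \(L_{C}\) itself grows with r, which reflects that \(\nabla \varphi \) is only locally, not globally, Lipschitz continuous. The bound is typically far from tight because each step of the proof uses a worst-case estimate.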
Cite this article
Cohen, E., Hallak, N. & Teboulle, M. A Dynamic Alternating Direction of Multipliers for Nonconvex Minimization with Nonlinear Functional Equality Constraints. J Optim Theory Appl 193, 324–353 (2022). https://doi.org/10.1007/s10957-021-01929-5
Keywords
- Augmented Lagrangian-based methods
- Nonconvex and nonsmooth minimization
- Proximal gradient method
- Kurdyka-Lojasiewicz property
- Global convergence