Splitting Methods with Variable Metric for Kurdyka–Łojasiewicz Functions and General Convergence Rates

Journal of Optimization Theory and Applications

Abstract

We study the convergence of general descent methods applied to a lower semi-continuous and nonconvex function, which satisfies the Kurdyka–Łojasiewicz inequality in a Hilbert space. We prove that any precompact sequence converges to a critical point of the function, and obtain new convergence rates both for the values and the iterates. The analysis covers alternating versions of the forward–backward method with variable metric and relative errors. As an example, a nonsmooth and nonconvex version of the Levenberg–Marquardt algorithm is detailed.
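To illustrate the kind of iteration covered by the analysis, here is a minimal numerical sketch of one (exact, unperturbed) forward–backward step with a diagonal variable metric, applied to a toy \(\ell^1\)-regularized least-squares problem. The objective, the diagonal metric, the step rule, and the helper names `prox_l1_metric` and `fb_step` below are illustrative assumptions, not the paper's general setting:

```python
import numpy as np

# Sketch: variable-metric forward-backward for f = h + g,
# with h(x) = 0.5*||Qx - b||^2 and g = lam*||.||_1 (illustrative choices).

def prox_l1_metric(v, lam, d):
    """Prox of lam*||.||_1 in the metric A = diag(d):
    componentwise soft-thresholding with thresholds lam / d_i."""
    return np.sign(v) * np.maximum(np.abs(v) - lam / d, 0.0)

def fb_step(x, grad_h, lam, d):
    """One step x <- prox^{A}_{g}( x - A^{-1} grad h(x) ), with A = diag(d)."""
    return prox_l1_metric(x - grad_h(x) / d, lam, d)

# Toy problem data.
rng = np.random.default_rng(0)
Q = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
grad_h = lambda x: Q.T @ (Q @ x - b)
L = np.linalg.norm(Q.T @ Q, 2)     # Lipschitz constant of grad h
d = np.full(10, 1.2 * L)           # diagonal metric kept above L
x = np.zeros(10)
for _ in range(500):
    x = fb_step(x, grad_h, lam=0.1, d=d)
```

For convex \(g\) this reduces to a classical iterative soft-thresholding step; the paper's analysis additionally allows nonconvex \(g\), alternating block updates, variable metrics, and relative errors.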


Notes

  1. A simple sufficient, yet not necessary, condition for \(\inf _{k\in \mathbb {N}} a_k b_{k+1}^2 >0\) and \(\inf _{k\in \mathbb {N}} a_k b_{k+1}>0\) is that \(\inf _{k\in \mathbb {N}}b_k>0\).

References

  1. Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. In: Les Équations aux Dérivées Partielles, pp. 87–89. Éditions du Centre National de la Recherche Scientifique, Paris (1963)

  2. Kurdyka, K.: On gradients of functions definable in o-minimal structures. Ann. Inst. Fourier 48, 769–783 (1998)

  3. Łojasiewicz, S.: Sur la géométrie semi- et sous-analytique. Ann. Inst. Fourier 43, 1575–1595 (1993)

  4. Kurdyka, K., Parusiński, A.: \(\mathbf{w}_f\)-stratification of subanalytic functions and the Łojasiewicz inequality. C. R. Acad. Sci. Paris 318, 129–133 (1994)

  5. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17, 1205–1223 (2006)

  6. Simon, L.: Asymptotics for a class of nonlinear evolution equations, with applications to geometric problems. Ann. Math. 118, 525–571 (1983)

  7. Huang, S.-Z., Takáč, P.: Convergence in gradient-like systems which are asymptotically autonomous and analytic. Nonlinear Anal. Ser. A Theory Methods 46, 675–698 (2001)

  8. Chill, R., Jendoubi, M.A.: Convergence to steady states in asymptotically autonomous semilinear evolution equations. Nonlinear Anal. 53, 1017–1039 (2003)

  9. Haraux, A., Jendoubi, M.A.: Convergence of solutions of second-order gradient-like systems with analytic nonlinearities. J. Differ. Equ. 144(2), 313–320 (1998)

  10. Haraux, A.: A hyperbolic variant of Simon’s convergence theorem. Evolution equations and their applications in physical and life sciences (Bad Herrenalb, 1998). Lecture Notes in Pure and Applied Mathematics. Dekker, New York (2001)

  11. Baudouin, L., Salomon, J.: Constructive solution of a bilinear optimal control problem for a Schrödinger equation. Syst. Control Lett. 57(6), 453–464 (2008)

  12. Absil, P.-A., Mahony, R., Andrews, B.: Convergence of the iterates of descent methods for analytic cost functions. SIAM J. Optim. 16, 531–547 (2005)

  13. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. Ser. B 116, 5–16 (2009)

  14. Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362, 3319–3363 (2010)

  15. Merlet, B., Pierre, M.: Convergence to equilibrium for the backward Euler scheme and applications. Commun. Pure Appl. Anal. 9, 685–702 (2010)

  16. Noll, D.: Convergence of non-smooth descent methods using the Kurdyka–Łojasiewicz inequality. J. Optim. Theory Appl. 160, 553–572 (2014)

  17. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)

  18. Chouzenoux, E., Pesquet, J.C., Repetti, A.: Variable metric forward-backward algorithm for minimizing the sum of a differentiable function and a convex function. J. Optim. Theory Appl. (to appear) (2013)

  19. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems. An approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)

  20. Xu, Y., Yin, W.: A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 6(3), 1758–1789 (2013)

  21. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 15, 1–36 (2013)

  22. Chouzenoux, E., Pesquet, J.C., Repetti, A.: A block coordinate variable metric forward-backward algorithm (2014). http://www.optimization-online.org/DB_HTML/2013/12/4178.html

  23. Li, D., Pang, L.P., Chen, S.: A proximal alternating linearization method for nonconvex optimization problems. Optim. Methods Softw. 29(4), 646–657 (2014)

  24. Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)

  25. van den Dries, L.: Tame topology and o-minimal structures. Bull. AMS 37(3), 351–357 (2000)

  26. van den Dries, L., Miller, C.: Geometric categories and o-minimal structures. Duke Math. J. 84, 497–540 (1996)

  27. Haraux, A., Jendoubi, M.A.: The Łojasiewicz gradient inequality in the infinite dimensional Hilbert space framework. J. Funct. Anal. 260(9), 2826–2842 (2010)

  28. Chill, R.: The Łojasiewicz–Simon gradient inequality in Hilbert spaces. In: Jendoubi, M.A. (ed.) Proceedings of the 5th European-Maghrebian Workshop on Semigroup Theory, Evolution Equations, and Applications, pp. 25–36 (2006)

  29. Attouch, H., Buttazzo, G., Michaille, G.: Variational Analysis in Sobolev and \(BV\) Spaces. MPS-SIAM Series on Optimization. Springer, New York (2008)

  30. Bertsekas, D.: Nonlinear Programming. Athena Scientific, Belmont (1999)

  31. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)

  32. Ferris, M.: Finite termination of the proximal point algorithm. Math. Program. 50, 359–366 (1991)

  33. Peypouquet, J.: Asymptotic convergence to the optimal value of diagonal proximal iterations in convex minimization. J. Convex Anal. 16(1), 277–286 (2009)

  34. Chill, R., Fiorenza, A.: Convergence and decay rate to equilibrium of bounded solutions of quasilinear parabolic equations. J. Differ. Equ. 228, 611–632 (2006)

  35. Cauchy, A.-L.: Méthode générale pour la résolution des systèmes d’équations simultanées. C. R. Acad. Sci. Paris 25, 536–538 (1847)

  36. Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Rev. Française Informat. Recherche Opérationnelle 4, Sér. R-3, 154–158 (1970)

  37. Brézis, H., Lions, P.-L.: Produits infinis de résolvantes. Israel J. Math. 29(4), 329–345 (1978)

  38. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16, 964–979 (1979)

  39. Passty, G.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72(2), 383–390 (1979)

  40. Alvarez, F., Bolte, J., Brahic, O.: Hessian Riemannian gradient flows in convex programming. SIAM J. Control Optim. 43(2), 477–501 (2004)

  41. Alvarez, F., López, J., Ramírez, H.: Interior proximal algorithm with variable metric for second-order cone programming: applications to structural optimization and support vector machines. Optim. Methods Softw. 25(4–6), 859–881 (2010)

  42. Attouch, H., Svaiter, B.F.: A continuous dynamical Newton-like approach to solving monotone inclusions. SIAM J. Control Optim. 49(2), 574–598 (2011)

  43. Sra, S., Nowozin, S., Wright, S.J.: Optimization for Machine Learning. MIT Press, Cambridge (2011)

  44. Gafni, E.M., Bertsekas, D.P.: Two-metric projection methods for constrained optimization. SIAM J. Control Optim. 22(6), 936–964 (1984)

  45. Bertsekas, D.P.: Projected Newton methods for optimization problems with simple constraints. SIAM J. Control Optim. 20(2), 221–246 (1982)

  46. Lewis, A.S., Luke, D.R., Malick, J.: Local linear convergence for alternating and averaged nonconvex projections. Found. Comput. Math. 9(4), 485–513 (2009)

  47. Donoho, D.L.: For most large underdetermined systems of linear equations the minimal \(\ell ^1\)-norm solution is also the sparsest solution. Commun. Pure Appl. Math. 59(6), 797–829 (2006)

  48. Donoho, D.L., Tanner, J.: Counting the faces of randomly-projected hypercubes and orthants, with applications. Discrete Comput. Geom. 43, 522–541 (2010)

  49. Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57(11), 1413–1457 (2004)

  50. Chandrasekaran, V., Sanghavi, S., Parrilo, P.A., Willsky, A.S.: Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21(2), 572–596 (2011)

  51. Ganesh, A., Lin, Z., Wright, J., Wu, L., Chen, M., Ma, Y.: Fast algorithms for recovering a corrupted low-rank matrix. In: Proceedings of the 3rd IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), pp. 213–216 (2009)

  52. Yuan, X., Yang, J.: Sparse and low-rank matrix decomposition via alternating direction method. Pac. J. Optim. 9(1), 167–180 (2013)

  53. Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)

  54. Nesterov, Y.E.: A method for solving the convex programming problem with convergence rate O(\(1/k^2\)). Dokl. Akad. Nauk SSSR 269, 543–547 (1983)

  55. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

  56. Beck, A., Teboulle, M.: Gradient-based algorithms with applications in signal recovery problems. In: Palomar, D., Eldar, Y. (eds.) Convex Optimization in Signal Processing and Communications, pp. 33–88. Cambridge University Press, Cambridge (2010)

  57. Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: Inertial Proximal Algorithm for Non-convex Optimization (2014). http://lmb.informatik.uni-freiburg.de/Publications/2014/OB14/

  58. Casas, E., Herzog, R., Wachsmuth, G.: Approximation of sparse controls in semilinear equations by piecewise linear functions. Numer. Math. 122(4), 645–669 (2012)

Acknowledgments

The authors thank H. Attouch for useful remarks. They would also like to thank the anonymous reviewer for a careful reading and constructive comments. The second and third authors are partly supported by Conicyt Anillo Project ACT-1106, ECOS-Conicyt Project C13E03 and Millennium Nucleus ICM/FIC P10-024F. The third author is also partly supported by FONDECYT Grant 1140829 and Basal Project CMM Universidad de Chile.

Author information

Corresponding author

Correspondence to Guillaume Garrigos.

Additional information

Communicated by Hédy Attouch.

Appendix


1.1 Proofs of Theorems 3.1 and 3.2

The argument is a straightforward adaptation of the ideas in the proof of [17, Lemma 2.6]. One first proves the following:

Lemma 7.1

Let \(\mathbf {H}_1\) and \(\mathbf {H}_2\) hold and fix \(k\in \mathbb {N}\). If \(x^k\) and \(x^{k+1}\) belong to \(\underline{\Gamma }_\eta (x^*,\delta )\), then

$$\begin{aligned} 2\Vert x^{k+1}-x^k\Vert \le \Vert x^k-x^{k-1}\Vert +\frac{1}{a_kb_k}\big [\varphi (f(x^k)-f(x^*)) -\varphi (f(x^{k+1})-f(x^*))\big ]+\epsilon _{k}. \end{aligned}$$
(24)

For the next results, we introduce the following auxiliary property (automatically fulfilled under the hypotheses of Theorems 3.1 and 3.2), which expresses a stability of the sequence \((x^k)_{k\in \mathbb {N}}\) around the point \(x^*\), together with a sufficiently close initialization.

\(\underline{\mathbf {S}(x^*,\delta ,\rho )}\): There exist \(\delta >\rho >0\) such that

  (i) For each \(k\in \mathbb {N}\), if \(x^0,\dots ,x^k\in \underline{\Gamma }_\eta (x^*,\rho )\), then \(x^{k+1}\in \underline{\Gamma }_\eta (x^*,\delta )\);

  (ii) The initial point \(x^0\) belongs to \(\Gamma _\eta (x^*,\rho )\) and

    $$\begin{aligned} \Vert x^*-x^{0}\Vert +2\sqrt{\frac{f(x^{0})-f(x^*)}{a_0}}+ M\varphi (f(x^{0})-f(x^*))+\sum _{i=1}^{+\infty }\epsilon _i < \rho . \end{aligned}$$
    (25)
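
To make (25) concrete, consider the common KŁ desingularising function \(\varphi (s)=c\sqrt{s}\) with \(c>0\) (an illustrative choice, not an assumption of the theorems). Condition (25) then reads

$$\begin{aligned} \Vert x^*-x^{0}\Vert +\Big (\frac{2}{\sqrt{a_0}}+Mc\Big )\sqrt{f(x^{0})-f(x^*)}+\sum _{i=1}^{+\infty }\epsilon _i < \rho , \end{aligned}$$

so it suffices to initialize at a point where both \(\Vert x^*-x^0\Vert \) and \(f(x^0)-f(x^*)\) are small, with summably small errors.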

We then have the following estimate:

Lemma 7.2

Let \(\mathbf {H}_1\), \(\mathbf {H}_2\), \(\mathbf {H}_3\) and \(\mathbf {S}(x^*,\delta ,\rho )\) hold, and suppose \(M:=\sup _{k\in \mathbb {N}^*}\frac{1}{a_kb_k}<+\infty \). Then, for all \(K\in \mathbb {N}^*\), we have \(x^K\in \underline{\Gamma }_\eta (x^*,\rho )\) and

$$\begin{aligned}&\sum _{k=1}^{K}\Vert x^{k+1}-x^k\Vert +\Vert x^{K+1}-x^K\Vert \\&\quad \le \Vert x^1-x^0\Vert +M\big [\varphi (f(x^1)-f( x^*))-\varphi (f(x^{K+1})-f(x^*))\big ]+\sum _{k=1}^{K}\epsilon _k. \end{aligned}$$
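
In particular, letting \(K\rightarrow \infty \) (the errors being summable by \(\mathbf {H}_3\), and \(\varphi \ge 0\)) yields the finite-length property

$$\begin{aligned} \sum _{k=1}^{\infty }\Vert x^{k+1}-x^k\Vert <+\infty , \end{aligned}$$

used in Proposition 7.1 below.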

The basic asymptotic properties are given by the following result:

Proposition 7.1

Let \(\mathbf {H}_1\), \(\mathbf {H}_2\), \(\mathbf {H}_3\) and \(\mathbf {S}(x^*,\delta ,\rho )\) hold. Then, \(x^k\in \underline{\Gamma }_\eta (x^*,\rho )\) for all \(k\), the sequence has finite length and converges to some \(\overline{x}\). Moreover, \(\liminf \limits _{k\rightarrow \infty }\Vert \partial f(x^k) \Vert _- =0\), and \(f(\overline{x})\le \lim \limits _{k\rightarrow \infty }f(x^k)=f(x^*)\).

Proof

Capture, convergence and finite length follow from Lemma 7.2 and \(\mathbf {H}_3\). Next, since \(\sum _{k=1}^\infty b_{k+1}\Vert \partial f(x^k) \Vert _- \le \sum _{k=1}^\infty \Vert x^{k+1}-x^k\Vert +\sum _{k=1}^\infty \epsilon _{k+1}<\infty \) while \((b_k)\notin \ell ^1\), we must have \(\liminf _{k\rightarrow \infty }\Vert \partial f(x^k) \Vert _- =0\): otherwise \(\Vert \partial f(x^k) \Vert _-\ge c>0\) for all large \(k\), and the series would dominate \(c\sum _k b_{k+1}=+\infty \). Finally, observe that \(\lim _{k\rightarrow \infty }f(x^k)\) exists because \((f(x^k))\) is decreasing and bounded from below by \(f(x^*)\), and the lower semi-continuity of \(f\) implies \(f(\overline{x})\le \lim _{k\rightarrow \infty }f(x^k)\). If \(\lim _{k\rightarrow \infty }f(x^k)=\beta >f(x^*)\), the KŁ inequality and the fact that \(\varphi '\) is decreasing imply \(\varphi '(\beta -f(x^*))\Vert \partial f(x^k) \Vert _- \ge \varphi '(f(x^k)-f(x^*))\Vert \partial f(x^k) \Vert _- \ge 1\) for all \(k\in \mathbb {N}\), which is impossible because \(\liminf _{k\rightarrow \infty }\Vert \partial f(x^k) \Vert _- =0\). Hence \(\beta = f(x^*)\). \(\square \)

We are now in position to complete the proofs of Theorems 3.1 and 3.2.

Proof of Theorem 3.1

Let \(x^{n_k}\rightarrow x^*\) with \(f(x^{n_k})\rightarrow f(x^*)\) as \(k\rightarrow \infty \). Since \((f(x^k))\) is nonincreasing and has a subsequence converging to \(f(x^*)\), we deduce that \(f(x^k)\downarrow f(x^*)\). In particular, we have \(f(x^*)\le f(x^k)\) for all \(k\in \mathbb {N}\). The function \(f\) satisfies the KŁ inequality on \(\Gamma _\eta (x^*,\delta )\) with desingularising function \(\varphi \). Let \(K_0\in \mathbb {N}\) be sufficiently large so that \(f(x^{K_0})-f(x^*)<\min \{\eta ,\underline{a}\delta ^2\}\), and pick \(\rho >0\) such that \(f(x^{K_0})-f(x^*)<\underline{a}(\delta -\rho )^2\).

Hence, for every \(K\ge K_0\), we have \(f(x^*)\le f(x^{k+1})<f(x^*)+\eta \) for all \(k\ge K\) and

$$\begin{aligned} \Vert x^{k+1}-x^k\Vert \le \sqrt{\frac{f(x^k)-f(x^{k+1})}{a_k}}\le \sqrt{\frac{f(x^K)-f(x^*)}{\underline{a}}}< \delta -\rho , \end{aligned}$$

which implies part (i) of \(\mathbf {S}(x^*,\delta ,\rho )\). Now take \(n_K\ge K_0\) such that

$$\begin{aligned} \Vert x^*-x^{n_K}\Vert +2\sqrt{\frac{f(x^{n_K})-f(x^*)}{a_{n_K}}}+M\varphi (f(x^{n_K})-f(x^*))+\sum _{k=n_K+1}^{+\infty }\epsilon _{k}<\rho , \end{aligned}$$

which is possible because \(x^{n_k}\rightarrow x^*\) and \(f(x^{n_k})\rightarrow f(x^*)\). The sequence \((y^k)_{k\in \mathbb {N}}\), defined by \(y^k:=x^{n_K+k}\) for all \(k\in \mathbb {N}\), satisfies the hypotheses of Proposition 7.1. Finally, since the whole sequence \((y^k)_{k\in \mathbb {N}}\) is \(f\)-convergent toward \(x^*\) and \(\liminf \limits _{k\rightarrow \infty }\Vert \partial f(y^k) \Vert _-=0\), we conclude from Lemma 2.1 that \(x^*\) must be critical. \(\square \)

Proof of Theorem 3.2

Since \(f\) has the KŁ property at \(x^*\), there is a strict local upper level set \(\Gamma _{\eta }(x^*,\delta )\), where the KŁ inequality holds with \(\varphi \) as a desingularising function. Take \(\rho <\frac{3}{4}\delta \) and then \(\gamma <\frac{1}{3}\rho \). If necessary, shrink \(\eta \) so that \(2\sqrt{\dfrac{\eta }{\underline{a}}} + M \varphi (\eta ) < \dfrac{2\rho }{3}\). This is possible since \(\varphi \) is continuous at \(0\) with \(\varphi (0)=0\). Let \(x^0\in \underline{\Gamma }_\eta (x^*,\gamma )\subset \underline{\Gamma }_\eta (x^*,\rho )\). It suffices to verify that \(\mathbf {S}(x^*,\delta ,\rho )\) is fulfilled and then apply Proposition 7.1. For (i), suppose that \(x^0,\dots ,x^k\) lie in \(\underline{\Gamma }_\eta (x^*,\rho )\); let us prove that \(x^{k+1} \in \underline{\Gamma }_\eta (x^*,\delta )\). Since \(x^*\) is a global minimum, from \(\mathbf {H}_1\) and the fact that \(\left( f(x^k)\right) _{k\in \mathbb {N}}\) is decreasing, we have

$$\begin{aligned} f(x^*)+\underline{a} \Vert x^{k+1}-x^k \Vert ^2\le f(x^{k+1})+\underline{a} \Vert x^{k+1}-x^k \Vert ^2\le f(x^k)\le f(x^0)< f(x^*) + \eta . \end{aligned}$$

It follows that \(\Vert x^{k+1}-x^*\Vert \le \Vert x^{k+1}-x^k\Vert +\Vert x^k-x^*\Vert <\sqrt{\frac{\eta }{\underline{a}}}+\rho <\frac{4}{3}\rho <\delta \), since \(\sqrt{\eta /\underline{a}}<\frac{1}{3}\rho \) by the choice of \(\eta \) and \(\rho <\frac{3}{4}\delta \); hence \(x^{k+1}\in \underline{\Gamma }_\eta (x^*,\delta )\). Finally,

$$\begin{aligned} \Vert x^0-x^*\Vert +2\sqrt{\frac{f(x^0)-f(x^*)}{a_0}}+M\varphi (f(x^0)-f(x^*))<\frac{1}{3}\rho +2\sqrt{\dfrac{\eta }{\underline{a}}} + M \varphi (\eta ) <\rho , \end{aligned}$$

which is precisely (ii). \(\square \)
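
For concreteness, one admissible choice of the constants above (an illustrative instance only) is \(\delta =1\), \(\rho =\frac{7}{10}<\frac{3}{4}\delta \), \(\gamma =\frac{1}{5}<\frac{1}{3}\rho \), and then any \(\eta \) small enough that

$$\begin{aligned} 2\sqrt{\frac{\eta }{\underline{a}}} + M \varphi (\eta )< \frac{2\rho }{3}=\frac{7}{15}. \end{aligned}$$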

1.2 Proof of Proposition 4.1

Proof

Since \(X_i^k=Y_i^k +S_i^k\), we can rewrite the algorithm as

$$\begin{aligned} y_i^{k+1} \in \text{prox}^{A_{i,k}}_{g_i} (y_i^k - A_{i,k}^{-1} \nabla _i h(Y_i^k + S_i^k) + r_i^k + s_i^k). \end{aligned}$$
(26)
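
To fix ideas, here is a minimal numerical sketch of one sweep of an update of the form (26), with the error terms \(r_i^k\), \(s_i^k\) set to zero and the simple scalar metric \(A_{i,k}=\alpha \,id\). These simplifications, and the helper names `alternating_fb_sweep` and `proxes`, are illustrative assumptions, not the algorithm in full generality:

```python
import numpy as np

def alternating_fb_sweep(y, grad_blocks, proxes, alpha):
    """One exact sweep of the block forward-backward iteration:
    y_i <- prox_{g_i/alpha}( y_i - (1/alpha) grad_i h(Y) ),
    where block i is updated with blocks 1..i-1 already refreshed.
    `proxes[i](v, t)` computes the prox of g_i with step t."""
    y = [yi.copy() for yi in y]
    for i in range(len(y)):
        v = y[i] - grad_blocks[i](y) / alpha      # forward (gradient) step on block i
        y[i] = proxes[i](v, 1.0 / alpha)          # backward (proximal) step on block i
    return y

# Toy usage: two blocks, h(Y) = 0.5*||y1 - y2||^2 (so L = 2), g_i = 0.
y = [np.ones(3), np.zeros(3)]
grads = [lambda Y: Y[0] - Y[1], lambda Y: Y[1] - Y[0]]
proxes = [lambda v, t: v, lambda v, t: v]         # prox of g = 0 is the identity
for _ in range(50):
    y = alternating_fb_sweep(y, grads, proxes, alpha=4.0)  # alpha kept above L
print(y[0], y[1])  # the two blocks converge to a common value (any y1 = y2 minimizes h)
```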

We start by showing that \(\mathbf {H}_1\) is satisfied. Fix \(i\in \{1,\dots ,p\}\); using the definition of the proximal operator \(\text{prox}_{g_i}^{A_{i,k}}\) in (26) and expanding the squared norms gives

$$\begin{aligned}&g_i (y_{i}^k) - g_i (y_{i}^{k+1}) \ge \frac{1}{2} \Vert y_i^{k+1} - y_i^k\Vert _{A_{i,k}}^2 + \langle y_i^{k+1} - y_i^k, \nabla _i h(Y_i^k + S_i^k)\rangle \nonumber \\&\quad \quad - \langle y_i^{k+1} - y_i^{k}, r_i^k +s_i^k\rangle _{A_{i,k}}. \end{aligned}$$
(27)

Using \(\mathbf {HE}_3\) in (27), we obtain

$$\begin{aligned} g_i (y_{i}^k) - g_i (y_{i}^{k+1}) \ge \frac{1}{2} \Vert y_i^{k+1} - y_i^k\Vert _{\rho A_{i,k}}^2 + \langle y_i^{k+1} - y_i^k, \nabla _i h(Y_i^k + S_i^k)\rangle . \end{aligned}$$
(28)

We introduce \(\tilde{h}_{i,k} : y_i \in H_i \mapsto h(y_1^{k+1},\dots ,y_{i-1}^{k+1},y_i,y_{i+1}^k,\dots ,y_p^k) \in \mathbb {R}\) for fixed \(k\in \mathbb {N}\) and \(i=1,\dots ,p\). It satisfies \(\tilde{h}_{i,k} (y_i^k) = h(Y_i^k)\), \(\tilde{h}_{i,k} (y_i^{k+1}) = h(Y_{i+1}^k)\) and \(\nabla \tilde{h}_{i,k} (y_i^k) = \nabla _i h (Y_i^{k})\). Applying the descent lemma to \(\tilde{h}_{i,k}\), we obtain

$$\begin{aligned} h(Y_{i+1}^k) - h(Y_i^k) - \langle y_i^{k+1} - y_i^k , \nabla _i h(Y_i^k) \rangle \le \frac{L}{2} \Vert y_i^{k+1} - y_i^k \Vert ^2. \end{aligned}$$
(29)
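
For the reader's convenience, (29) is the classical descent lemma, applied to \(\phi =\tilde{h}_{i,k}\) with \(x=y_i^k\) and \(y=y_i^{k+1}\): for any differentiable \(\phi \) whose gradient is \(L\)-Lipschitz,

$$\begin{aligned} \phi (y) \le \phi (x) + \langle \nabla \phi (x), y-x \rangle + \frac{L}{2}\Vert y-x\Vert ^2 \quad \text{ for all } x,y. \end{aligned}$$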

Then, combining (28) and (29), we get

$$\begin{aligned}&\frac{1}{2} \Vert y_i^{k+1} - y_i^k\Vert _{\rho A_{i,k} - Lid_{H_i}}^2 + \langle y_i^{k+1} - y_i^k, \nabla _i h(Y_i^k + S_i^k) - \nabla _i h(Y_i^k) \rangle \nonumber \\&\quad \le g_i (y_{i}^k) - g_i (y_{i}^{k+1}) + h(Y_i^k) - h(Y_{i+1}^k), \end{aligned}$$
(30)

where \(\rho A_{i,k} - Lid_{H_i}\) remains coercive, since \(\rho \alpha _k > L\). Using successively the Cauchy–Schwarz inequality, the Lipschitz property of \(\nabla _i h\) (see Remark 4.2) and \(\mathbf {HE}_1\), one gets

$$\begin{aligned} \langle y_i^{k+1} - y_i^k, \nabla _i h(Y_i^k + S_i^k) - \nabla _i h(Y_i^k) \rangle \ge - \frac{\sigma L}{2\sqrt{p}} \Vert y_i^{k+1} - y_i^k \Vert ^2. \end{aligned}$$

By inserting this estimation in (30), we deduce that

$$\begin{aligned} g_i (y_{i}^k) - g_i (y_{i}^{k+1}) + h(Y_i^k) - h(Y_{i+1}^k) \ge \frac{1}{2} \Vert y_i^{k+1} - y_i^k\Vert _{\rho A_{i,k} - L(\sigma \frac{\sqrt{p}}{p} +1)id_{H_i}}^2. \end{aligned}$$
(31)

Summing (31) over \(i\), we deduce that \(\mathbf {H}_1\) is fulfilled with \(a_k=\frac{\rho \alpha _k - L(\sigma \frac{\sqrt{p}}{p} +1)}{2}\):

$$\begin{aligned} f(Y^k) - f(Y^{k+1})&= \sum \limits _{i=1}^{p} g_i(y_i^k) -g_i(y_i^{k+1}) + h(Y_i^k) - h(Y_{i+1}^k) \\&\ge \frac{1}{2} \sum \limits _{i=1}^{p} \Vert y_i^{k+1} - y_i^k\Vert _{\rho A_{i,k} - L({\sigma }\frac{\sqrt{p}}{p} +1)id_{H_i}}^2. \end{aligned}$$

To prove \(\mathbf {H}_2\), fix \(i\in \{1,\dots ,p\}\) and use Fermat’s first-order condition in (26) to get

$$\begin{aligned} 0 \in \partial g_i(y_i^{k+1}) + \left\{ A_{i,k}(y_i^{k+1} - y_i^k) - A_{i,k}(r_i^k + s_i^k) + \nabla _i h(Y_i^k + S_i^k) \right\} . \end{aligned}$$
(32)

Define \(w_i^{k+1} := \nabla _i h(Y^{k+1}) - \nabla _i h(Y_i^k + S_i^k) - A_{i,k}(y_i^{k+1} - y_i^k) + A_{i,k}(r_i^k + s_i^k)\), which lies in \(\partial g_i(y_i^{k+1}) + \nabla _i h(Y^{k+1})\), by (32). The triangle inequality gives

$$\begin{aligned} \Vert w_i^{k+1} \Vert \le \beta _k \left( \Vert y_i^{k+1} - y_i^k \Vert + \Vert r_i^k \Vert + \Vert s_i^k \Vert \right) + \Vert \nabla _i h(Y_i^k + S_i^k) - \nabla _i h(Y^{k+1}) \Vert . \end{aligned}$$
(33)

Using the error estimates from (HE) and the \(\sqrt{p} L\)-Lipschitz continuity of \(\nabla _i h\) in (33), we get

$$\begin{aligned} \Vert w_i^{k+1} \Vert \le (\beta _k (1 + \sigma ) + \sqrt{p}L\sigma ) \Vert y_i^{k+1} - y_i^k \Vert + \sqrt{p}L \Vert Y^{k+1} - Y^k \Vert + \beta _k \mu _k. \end{aligned}$$
(34)

Define now \(W^{k+1} := (w_1^{k+1},\ldots ,w_p^{k+1}) \in \partial f (Y^{k+1})\) (recall the definition of \(w_i^{k+1}\)). Then, summing (34) over \(i=1,\dots ,p\) and using \(\sqrt{p} \le p \le p^2\), we obtain

$$\begin{aligned} \Vert W^{k+1} \Vert \ \le \ \sum \limits _{i=1}^{p} \Vert w_i^{k+1} \Vert \ \le \ p\beta _k \mu _k + p^2 (\beta _k+L) (1 + \sigma )\Vert Y^{k+1} - Y^k \Vert . \end{aligned}$$

Hence, \(\mathbf {H}_2\) is verified with \(b_{k+1}=\frac{1}{p^2 (1+\sigma )(\beta _k +L)}\) and \(\epsilon _{k+1}= \frac{\beta _k \mu _k}{p(1+\sigma )(\beta _k +L)}\).
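
Indeed, dividing the last display by \(p^2 (1+\sigma )(\beta _k +L)\) gives

$$\begin{aligned} b_{k+1}\Vert W^{k+1} \Vert \le \Vert Y^{k+1} - Y^k \Vert + \frac{\beta _k \mu _k}{p(1+\sigma )(\beta _k +L)} = \Vert Y^{k+1} - Y^k \Vert + \epsilon _{k+1}. \end{aligned}$$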

It remains to check that \(\mathbf {H}_3\) holds under our assumptions on \(\alpha _k\), \(\beta _k\) and \(\mu _k\). Clearly, \(\mathbf {H}_3(i)\) holds, since we have assumed \(\alpha _k \ge \underline{\alpha } > (\sigma \frac{\sqrt{p}}{p} +1) \frac{L}{\rho }\). Next, \(\mathbf {H}_3(ii)\) requires \((b_k) \notin \ell ^1\), which in our context is equivalent to \(\left(\frac{1}{\beta _k +L}\right) \notin \ell ^1\); this holds because we have assumed \(\left(\frac{1}{\beta _k}\right) \notin \ell ^1\). Hypothesis \(\mathbf {H}_3(iii)\) is satisfied because \(\frac{\beta _k}{\alpha _{k+1}}\) is assumed to be bounded. Finally, \(\mathbf {H}_3(iv)\) requires the summability of \(\frac{\beta _k \mu _k}{\beta _k +L}\), which is dominated by \(\mu _k\), and \((\mu _k) \in \ell ^1\). \(\square \)
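
For quick reference, the parameter conditions invoked in this verification can be collected as

$$\begin{aligned} \underline{\alpha } > \Big (\sigma \frac{\sqrt{p}}{p} +1\Big )\frac{L}{\rho },\qquad \Big (\frac{1}{\beta _k}\Big )_{k}\notin \ell ^1,\qquad \sup _{k}\frac{\beta _k}{\alpha _{k+1}}<+\infty ,\qquad (\mu _k)_{k}\in \ell ^1. \end{aligned}$$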


Cite this article

Frankel, P., Garrigos, G. & Peypouquet, J. Splitting Methods with Variable Metric for Kurdyka–Łojasiewicz Functions and General Convergence Rates. J Optim Theory Appl 165, 874–900 (2015). https://doi.org/10.1007/s10957-014-0642-3
