Abstract
We investigate an inertial forward–backward algorithm for minimizing the sum of a non-smooth, possibly non-convex function and a differentiable, possibly non-convex function. The algorithm is formulated in the spirit of the famous FISTA method; however, the setting is non-convex and we allow different inertial terms. Moreover, the inertial parameters in our algorithm may also take negative values. We also treat the case when the non-smooth function is convex, and we show that in this case a better step size can be allowed. Further, we show that our numerical schemes can successfully be used in DC programming. We prove some abstract convergence results which, applied to our numerical schemes, show that the generated sequences converge to a critical point of the objective function, provided a regularization of the objective function satisfies the Kurdyka–Łojasiewicz property. Further, we obtain a general result that, applied to our numerical schemes, ensures convergence rates for the generated sequences and for the objective function values, formulated in terms of the KL exponent of a regularization of the objective function. Finally, we apply our results to image restoration.
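To make the scheme concrete, here is a minimal Python sketch of a forward–backward step with two different inertial terms, applied to a toy one-dimensional problem. This is an illustration only, not the paper's exact algorithm: the objective g(x) + f(x) with g(x) = lam*|x| and f(x) = 0.5*(x-1)^2, the constant step size, and the constant inertial parameters alpha and beta are all assumptions made for the demo.

```python
import math

def soft_threshold(v, t):
    # Proximal operator of t*|.| (the l1 prox): shrink v toward 0 by t.
    return math.copysign(max(abs(v) - t, 0.0), v)

def inertial_fb(x0, grad_f, lam, step, alpha, beta, iters=500):
    """Generic inertial forward-backward iteration with two inertial terms:
       y_n = x_n + alpha*(x_n - x_{n-1})   (inertia in the proximal point)
       z_n = x_n + beta *(x_n - x_{n-1})   (inertia in the gradient point)
       x_{n+1} = prox_{step*lam*|.|}(y_n - step*grad_f(z_n))
    """
    x_prev, x = x0, x0
    for _ in range(iters):
        d = x - x_prev
        y = x + alpha * d
        z = x + beta * d
        x_prev, x = x, soft_threshold(y - step * grad_f(z), step * lam)
    return x

# Toy problem: minimize lam*|x| + 0.5*(x-1)^2; for lam < 1 the unique
# minimizer is x* = 1 - lam.
grad_f = lambda x: x - 1.0
x_star = inertial_fb(x0=5.0, grad_f=grad_f, lam=0.5, step=0.5,
                     alpha=0.3, beta=0.1)
```

With these demo choices the iterates converge to the minimizer 1 - lam = 0.5; negative values of alpha or beta can be experimented with in the same way, mirroring the negative inertial parameters the paper allows.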
References
Alecsa, C.D., László, S.C., Pinţa, T.: An extension of the second order dynamical system that models Nesterov’s convex gradient method. Appl Math Optim 84, 1687–1716 (2021)
Alecsa, C.D., László, S.C., Viorel, A.: A gradient-type algorithm with backward inertial steps associated to a nonconvex minimization problem. Numer Algor 84, 485–512 (2020)
Alvarez, F., Attouch, H.: An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal. 9, 3–11 (2001)
Apidopoulos, V., Aujol, J.F., Dossal, C.: Convergence rate of inertial Forward-Backward algorithm beyond Nesterov’s rule. Math. Program. 180, 137–156 (2020)
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116, 5–16 (2009)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Operat. Res. 35, 438–457 (2010)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137, 91–129 (2013)
Attouch, H., Peypouquet, J., Redont, P.: A dynamical approach to an inertial forward-backward algorithm for convex minimization. SIAM J. Optimiz. 24, 232–256 (2014)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)
Beck, A.: First-Order Methods in Optimization. SIAM, Philadelphia (2017)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2, 183–202 (2009)
Bégout, P., Bolte, J., Jendoubi, M.A.: On damped second-order gradient systems. J. Diff. Eq. 259, 3115–3143 (2015)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optimiz. 17, 1205–1223 (2006)
Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optimiz. 18, 556–572 (2007)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soci. 362, 3319–3363 (2010)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)
Boţ, R.I., Csetnek, E.R.: A forward-backward dynamical approach to the minimization of the sum of a nonsmooth convex with a smooth nonconvex function. ESAIM: Contr. Optimis. Calc. Variat. 24, 463–477 (2018)
Boţ, R.I., Csetnek, E.R., Hendrich, C.: Inertial Douglas–Rachford splitting for monotone inclusion problems. Appl. Math. Comput. 256, 472–487 (2015)
Boţ, R.I., Csetnek, E.R., László, S.C.: An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. EURO J. Comput. Optimiz. 4, 3–25 (2016)
Boţ, R.I., Nguyen, D.K.: The proximal alternating direction method of multipliers in the non-convex setting: convergence analysis and rates. Math. Operat. Res. 45, 682–712 (2020)
Chambolle, A., Dossal, C.: On the convergence of the iterates of the fast iterative shrinkage/thresholding algorithm. J. Optim. Theory Appl. 166, 968–982 (2015)
Chouzenoux, E., Pesquet, J.C., Repetti, A.: Variable metric forward-backward algorithm for minimizing the sum of a differentiable function and a convex function. J. Optim. Theory Appl. 162, 107–132 (2014)
Combettes, P.L.: Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization 53, 475–504 (2004)
Combettes, P.L., Glaudin, L.E.: Quasinonexpansive iterations on the affine hull of orbits: from Mann's mean value algorithm to inertial methods. SIAM J. Optimiz. 27, 2356–2380 (2017)
Cruz Neto, J.X., Oliveira, P.R., Soubeyran, A., Souza, J.C.O.: A generalized proximal linearized algorithm for DC functions with application to the optimal size of the firm problem. Ann. Oper. Res. 289, 313–339 (2020)
Frankel, P., Garrigos, G., Peypouquet, J.: Splitting methods with variable metric for Kurdyka-Łojasiewicz functions and general convergence rates. J. Optim. Theory Appl. 165, 874–900 (2015)
Garrigos, G., Rosasco, L., Villa, S.: Convergence of the forward-backward algorithm: beyond the worst-case with the help of geometry. Math. Program. (2022). https://doi.org/10.1007/s10107-022-01809-4
Ghadimi, E., Feyzmahdavian, H.R., Johansson, M.: Global convergence of the heavy-ball method for convex optimization. 2015 European control conference (ECC), IEEE, pp. 310–315 (2015)
Hu, Y.H., Li, C., Meng, K.W., Qin, J., Yang, X.Q.: Group sparse optimization via \(l_{p, q}\) regularization. J. Mach. Learn. Res. 18(30), 1–52 (2017)
Hu, Y., Li, C., Meng, K., Yang, X.: Linear convergence of inexact descent method and inexact proximal gradient algorithms for lower-order regularization problems. J. Glob. Optim. 79, 853–883 (2021)
Johnstone, P.R., Moulin, P.: Local and global convergence of a general inertial proximal splitting scheme for minimizing composite functions. Comput. Optim. Appl. 67, 259–292 (2017)
Johnstone, P.R., Moulin, P.: Convergence rates of inertial splitting schemes for nonconvex composite optimization. 2017 IEEE International conference on acoustics, speech and signal processing (ICASSP), pp. 4716–4720 (2017)
Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annales de l’institut Fourier (Grenoble) 48, 769–783 (1998)
László, S.C.: Convergence rates for an inertial algorithm of gradient type associated to a smooth non-convex minimization. Math. Program. 190, 285–329 (2021)
Lessard, L., Recht, B., Packard, A.: Analysis and design of optimization algorithms via integral quadratic constraints. SIAM J. Optimiz. 26, 57–95 (2016)
Liang, J., Fadili, J., Peyré, G.: Activity identification and local linear convergence of inertial forward- backward splitting. SIAM J. Optimiz. 27, 408–437 (2017)
Liang, J., Fadili, J., Peyré, G.: A multi-step inertial forward–backward splitting method for non-convex optimization. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29 (2016)
Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka–Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18, 1199–1232 (2018)
Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels, Les Équations aux Dérivées Partielles. Éditions du Centre National de la Recherche Scientifique Paris, 87–89 (1963)
Lorenz, D.A., Pock, T.: An inertial forward-backward algorithm for monotone inclusions. J. Math. Imaging Vis. 51, 311–325 (2015)
Mordukhovich, B.: Variational Analysis and Generalized Differentiation, I: Basic Theory, II: Applications. Springer, Berlin (2006)
Moudafi, A., Oliny, M.: Convergence of a splitting inertial proximal method for monotone operators. J. Comp. Appl. Math. 155, 447–454 (2003)
Nesterov, Y.: A method for solving the convex programming problem with convergence rate \(O(1/k^2)\), (Russian). Dokl. Akad. Nauk SSSR 269(3), 543–547 (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization: a Basic Course. Kluwer Academic Publishers, Dordrecht (2004)
Ochs, P.: Local convergence of the heavy-ball method and iPiano for non-convex optimization. J. Optim. Theory Appl. 177, 153–180 (2018)
Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for non-convex optimization. SIAM J. Imag. Sci. 7, 1388–1419 (2014)
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. U.S.S.R. Comput. Math. Math. Phys. 4, 1–17 (1964)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis, Fundamental Principles of Mathematical Sciences, vol. 317. Springer, Berlin (1998)
Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J. Mach. Learn. Res. 17, 1–43 (2016)
Sun, T., Yin, P., Li, D., Huang, C., Guan, L., Jiang, H.: Non-ergodic convergence analysis of heavy-ball algorithms, The Thirty-Third AAAI conference on artificial intelligence, (2019)
Wu, Z., Li, M.: General inertial proximal gradient method for a class of nonconvex nonsmooth optimization problems. Comput. Optimiz. Appl. 73, 129–158 (2019)
Wu, Z., Li, C., Li, M., Lim, A.: Inertial proximal gradient methods with Bregman regularization for a class of nonconvex optimization problems. J. Glob. Optim. 79, 617–644 (2021)
Zavriev, S.K., Kostyuk, F.V.: Heavy-ball method in non-convex optimization problems. Comput. Math. Model. 4, 336–341 (1993)
Acknowledgements
This work was supported by a grant of the Ministry of Research, Innovation and Digitization, CNCS - UEFISCDI, Project No. PN-III-P1-1.1-TE-2021-0138, within PNCDI III. The author is thankful to two anonymous referees for their valuable remarks and suggestions which improved the quality of the paper.
Communicated by Heinz Bauschke.
Appendices
Proofs of Abstract Convergence Results
In what follows we give full proofs for Lemma 3.1, Corollary 3.1, Theorem 3.1 and Theorem 3.2.
Proof of Lemma 3.1
We divide the proof into the following steps.
Step I. We show that \(u_1\in B(u^*,\rho )\) and \(F(u_1)< F(u^*)+\eta .\)
Indeed, \(u_0\in B(u^*,\rho )\) and (5) assures that \(F(u_1)\ge F(u^*).\) Further, (H1) assures that
Since \(\Vert x_1-x^*\Vert =\Vert (x_1-x_0)+(x_0-x^*)\Vert \le \Vert x_1-x_0\Vert +\Vert x_0-x^*\Vert \) and \(F(u_1)\le F(u_0)\) the condition (6) leads to
Now, from (H3) we have \(\Vert u_1-u^*\Vert \le c_1\Vert x_1-x^*\Vert +c_2\Vert x_{0}-x^*\Vert \) hence
Thus, \(u_1\in B(u^*,\rho );\) moreover, (5) and (H1) provide that \(F(u^*)\le F(u_2)\le F(u_1)\le F(u_0)< F(u^*)+\eta .\)
Step II. Next we show that whenever for a \(k\ge 1\) one has \(u_k\in B(u^*,\rho ),\,F(u_k)<F(u^*)+\eta \) then it holds that
Hence, let \(k\ge 1\) and assume that \(u_k\in B(u^*,\rho ),\,F(u_k)<F(u^*)+\eta \). Note that from (H1) and (5) one has \(F(u^*)\le F(u_{k+1})\le F(u_k)<F(u^*)+\eta ;\) hence,
thus, (28) is well defined. Now, if \(x_k=x_{k+1},\) then (28) trivially holds.
Otherwise, from (H1) and (5) one has
Consequently, \(u_k\in B(u^*,\rho )\cap \{u\in \mathbb {R}^m: F(u^*)<F(u)<F(u^*)+\eta \}\) and \(B(u^*,\rho )\subseteq B(u^*,\sigma )\subseteq U\); hence, by using the KL inequality we get
Since \(\varphi \) is concave, and (29) assures that \(F(u_{k+1})-F(u^*)\in [0,\eta ),\) one has
consequently,
Now, by using (H1) and (H2) we get that
Consequently,
and by arithmetical-geometrical mean inequality we have
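As a reminder, the weighted arithmetic–geometric mean inequality typically invoked at this step reads (with \(s,t\ge 0\) and a weight \(\varepsilon >0\) as generic placeholders):

$$\begin{aligned} \sqrt{st}\le \frac{\varepsilon s}{2}+\frac{t}{2\varepsilon }, \end{aligned}$$

which follows from \(\left( \sqrt{\varepsilon s}-\sqrt{t/\varepsilon }\right) ^2\ge 0.\)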
which leads to (28).
Step III. Now we show by induction that (28) holds for every \(k\ge 1.\) Indeed, Step II. can be applied for \(k=1\) since according to Step I. \(u_1\in B(u^*,\rho )\) and \(F(u_1)< F(u^*)+\eta .\) Consequently, for \(k=1\) the inequality (28) holds.
Assume that (28) holds for every \(k\in \{1,2,\ldots ,n\}\); we show that it also holds for \(k=n+1.\) Arguing as in Step II, condition (H1) and (5) assure that \(F(u^*)\le F(u_{n+1})\le F(u_n)<F(u^*)+\eta ,\) hence it remains to show that \(u_{n+1}\in B(u^*,\rho ).\) By using the triangle inequality and (H3), one has
By summing up (28) from \(k=1\) to \(k=n\) and using \(x_{-1}=x_0\) we obtain
Combining (30) and (31) and neglecting the negative terms, we get
But \(\varphi \) is strictly increasing and \(F(u_1)-F(u^*)\le F(u_0)-F(u^*)\), hence
According to (H1), one has
hence, from (6) we get
Hence, by induction, \(u_n\in B(u^*,\rho )\) for all \(n\in \mathbb {N}.\)
Step IV. According to Step III, the relation (28) holds for every \(k\ge 1.\) But this implies that (31) holds for every \(n\ge 1.\) By using (32) and neglecting the non-positive terms, (31) becomes
Now letting \(n\longrightarrow +\infty \) in (33), we obtain that \(\sum _{k=1}^{\infty } \Vert x_k-x_{k-1}\Vert <+\infty .\)
Obviously, the sequence \(S_n=\sum _{k=1}^n\Vert x_k-x_{k-1}\Vert \) is Cauchy; hence, for all \(\varepsilon >0\) there exists \(N_{\varepsilon }\in \mathbb {N}\) such that for all \(n\ge N_\varepsilon \) and for all \(p\in \mathbb {N}\) one has \(S_{n+p}-S_n\le \varepsilon .\)
But \(\Vert x_{n+p}-x_n\Vert \le \sum _{k=n+1}^{n+p}\Vert x_k-x_{k-1}\Vert =S_{n+p}-S_n\le \varepsilon \) for all \(n\ge N_\varepsilon \) and all \(p\in \mathbb {N}\);
hence, the sequence \((x_n)_{n\in \mathbb {N}}\) is Cauchy and consequently convergent. Let \(\lim _{n\longrightarrow +\infty }x_n=\overline{x}\) and let \(\overline{u}=(\overline{x},\overline{x}).\) Now, from (H3) we have
consequently \((u_n)_{n\in \mathbb {N}}\) converges to \(\overline{u}.\) Further, \((u_n)_{n\in \mathbb {N}}\subseteq B(u^*,\rho )\) and \(\rho <\sigma \), hence \(\overline{u}\in B(u^*,\sigma ).\)
Since \(F(u^*)\le F(u_n)<F(u^*)+\eta \) for all \(n\ge 1\) and the sequence \((F(u_n))_{n\ge 1}\) is decreasing, obviously \(F(u^*)\le \lim _{n\longrightarrow +\infty } F(u_n)<F(u^*)+\eta .\) Assume that \(F(u^*)< \lim _{n\longrightarrow +\infty } F(u_n).\) Then, one has \( u_n\in B(u^*,\sigma )\cap \{z\in \mathbb {R}^m: F(u^*)<F(z)<F(u^*)+\eta \}\) for all \(n\ge 1\), and by using the KL inequality together with the fact that \(\varphi \) is concave, hence \(\varphi '\) is decreasing, we get
for all \(n\ge 1,\) which is impossible, since according to (H2) and the convergence of \((x_n)_{n\in \mathbb {N}}\) one has
Consequently, one has \(\lim _{n\longrightarrow +\infty }F(u_n)=F(u^*).\) Since \(u_n\longrightarrow \overline{u},\, n\longrightarrow +\infty \) and F is lower semi-continuous, it is obvious that \(\lim _{n\longrightarrow +\infty } F(u_n)\ge F(\overline{u}).\) Hence,
Assume now that (H4) also holds. Obviously, in this case
Consequently, one has \(F(\overline{u})=F(u^*).\)
From (H2) we have that there exists \(W_{n_j}\in {\partial }F(u_{n_j})\) such that
consequently, \(\lim _{j\longrightarrow +\infty }\Vert W_{n_j}\Vert =0.\)
Now, one has \((u_{n_j},W_{n_j})\longrightarrow (\overline{u}, 0) \text{ and } F(u_{n_j})\longrightarrow F(\overline{u}),\,j\longrightarrow +\infty \); hence, by the closedness criterion of the graph of the limiting subdifferential we get \(0\in {\partial }F(\overline{u}),\) which shows that \(\overline{u}\in {{\,\textrm{crit}\,}}(F)\). \(\square \)
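The closedness criterion used above is the standard closedness of the graph of the limiting subdifferential (see, e.g., [45, 52]); stated here as a reminder, for sequences \((v_k)\) and \((W_k)\):

$$\begin{aligned} W_k\in {\partial }F(v_k),\quad (v_k,W_k)\longrightarrow (v,W),\quad F(v_k)\longrightarrow F(v)\ \Longrightarrow \ W\in {\partial }F(v). \end{aligned}$$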
Next we prove Corollary 3.1.
Proof of Corollary 3.1
That (H3) holds with \(c_1=2+c\) and \(c_2=c\) follows by direct verification. We have to show that (5) holds, that is, \(u_n\in B(u^*,\rho )\) implies \(u_{n+1}\in B(u^*,\sigma )\) for all \(n\in \mathbb {N}.\)
According to (H1), the assumption that \(F(u_n)\ge F(u^*)\) for all \(n\ge 1\) and the hypotheses of Lemma 3.1, we have
and
for all \(n\ge 1\).
Assume now that \(n\ge 1\) and \(u_n\in B(u^*,\rho ).\) Then, by using the triangle inequality we get
Further,
where \(c=\sup _{n\in \mathbb {N}}(|\alpha _{n}|+|\beta _{n}|).\)
Consequently, we have
which is exactly \(u_{n+1}\in B(u^*,\sigma ).\) Further, arguing analogously as at Step I in the proof of Lemma 3.1, we obtain that \(u_{1}\in B(u^*,\rho )\subseteq B(u^*,\sigma )\) and this concludes the proof. \(\square \)
Now we are ready to prove Theorem 3.1.
Proof of Theorem 3.1
We will apply Corollary 3.1. Since \(u^*=(x^*,x^*)\in \omega ((u_n)_{n\in \mathbb {N}})\) there exists a subsequence \((u_{n_k})_{k\in \mathbb {N}}\) such that \(u_{n_k}\longrightarrow u^*,\, k\longrightarrow +\infty .\)
From (H1) we get that the sequence \((F(u_n))_{n\in \mathbb {N}}\) is decreasing and from (H4), which according to the hypotheses holds for \(u^*\), one has \(F(u_{n_k})\longrightarrow F(u^*),\, k\longrightarrow +\infty ,\) that implies
We show next that \(x_{n_k}\longrightarrow x^*,\, k\longrightarrow +\infty .\) Indeed, from (H1) one has \(a\Vert x_{n_k}-x_{{n_k}-1}\Vert ^2\le F(u_{n_k-1})-F(u_{{n_k}})\), and the right-hand side of this inequality goes to 0 as \(k\longrightarrow +\infty .\) Hence, \(\lim _{k\longrightarrow +\infty }(x_{n_k}-x_{n_k-1})=0.\) Further, since the sequences \((\alpha _n)_{n\in \mathbb {N}}\) and \((\beta _n)_{n\in \mathbb {N}}\) are bounded, we get \(\lim _{k\longrightarrow +\infty }\alpha _{n_k}(x_{n_k}-x_{n_k-1})=0\) and \(\lim _{k\longrightarrow +\infty }\beta _{n_k}(x_{n_k}-x_{n_k-1})=0.\) Finally, \(u_{n_k}\longrightarrow u^*,\, k\longrightarrow +\infty \) is equivalent to \(x_{n_k}-x^*+\alpha _{n_k}(x_{n_k}-x_{{n_k}-1})\longrightarrow 0\) and \(x_{n_k}-x^*+\beta _{n_k}(x_{n_k}-x_{{n_k}-1})\longrightarrow 0\) as \(k\longrightarrow +\infty ,\) which leads to the desired conclusion, that is
The KL property around \(u^*\) states the existence of quantities \(\varphi \), U, and \(\eta \) as in Definition 2.1. Let \(\sigma > 0 \) be such that \( B(u^*, \sigma )\subseteq U\) and \(\rho \in (0,\sigma ).\) If necessary we shrink \(\eta \) such that \(\eta < \frac{a(\sigma -\rho )^2}{4(1+c)^2},\) where \(c=\sup _{n\in \mathbb {N}}(|\alpha _n|+|\beta _n|).\)
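For the reader's convenience, recall the KL inequality in its standard form (the precise normalization in Definition 2.1 may differ slightly): for every \(u\in U\) with \(F(u^*)<F(u)<F(u^*)+\eta \) one has

$$\begin{aligned} \varphi '\big (F(u)-F(u^*)\big )\,{{\,\textrm{dist}\,}}\big (0,{\partial }F(u)\big )\ge 1. \end{aligned}$$

In particular, for a KL exponent \(\theta \in (0,1)\) with the usual desingularizing function \(\varphi (t)=\frac{K}{1-\theta }t^{1-\theta }\), \(K>0\), this reads \(\big (F(u)-F(u^*)\big )^{\theta }\le K\,{{\,\textrm{dist}\,}}\big (0,{\partial }F(u)\big ).\)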
Now, since \(\varphi \) is continuous, \((F(u_n))\) is non-increasing, \(F(u_n)\longrightarrow F(u^*)\) as \(n\longrightarrow +\infty \), \(\varphi (0)=0\), and \(u_{n_k}\longrightarrow u^*,\,x_{n_k}\longrightarrow x^*\) as \(k\longrightarrow +\infty \), we conclude that there exists \(n_0\in \mathbb {N},\,n_0\ge 1\) such that \(u_{n_0}\in B(u^*,\rho )\) and \(F(u^*)\le F(u_{n_0})<F(u^*)+\eta ;\) moreover,
Hence, Corollary 3.1 and consequently Lemma 3.1 can be applied to the sequence \(({\mathcal {U}}_n)_{n\in \mathbb {N}},\) \({\mathcal {U}}_n=u_{n_0+n}.\)
Thus, according to Lemma 3.1, \(({\mathcal {U}}_n)_{n\in \mathbb {N}}\) converges to a point \((\overline{x},\overline{x})\in {{\,\textrm{crit}\,}}(F),\) consequently \((u_n)_{n\in \mathbb {N}}\) converges to \((\overline{x},\overline{x}).\) Then, since \(\omega ((u_n)_{n\in \mathbb {N}})=\{(\overline{x},\overline{x})\}\) one has \(x^*=\overline{x}.\) Hence, \((x_n)_{n\in \mathbb {N}}\) converges to \(x^*\), \((u_n)_{n\in \mathbb {N}}\) converges to \(u^*\) and \(u^*\in {{\,\textrm{crit}\,}}(F).\) \(\square \)
Abstract convergence rates in terms of the KL exponent
The following lemma was established in [20] and will be crucial in obtaining our convergence rates (see also [5] for different techniques).
Lemma B.1
( [20] Lemma 15) Let \((e_n)_{n\in \mathbb {N}}\) be a monotonically decreasing positive sequence converging to 0. Assume further that there exist the natural numbers \(l_0\ge 1\) and \(n_0\ge l_0\) such that for every \(n\ge n_0\) one has
where \(C_0>0\) is some constant and \(\theta \in [0,1).\) Then the following statements are true:

(i) if \(\theta =0,\) then \((e_n)_{n\ge n_0}\) converges in finite time;

(ii) if \(\theta \in \left( 0,\frac{1}{2}\right] \), then there exist \(C_1>0\) and \(Q\in [0,1)\) such that for every \(n\ge n_0\)

$$\begin{aligned} e_n\le C_1 Q^n; \end{aligned}$$

(iii) if \(\theta \in \left[ \frac{1}{2},1\right) \), then there exists \(C_2>0\) such that for every \(n\ge n_0+l_0\)

$$\begin{aligned} e_n\le C_2(n-l_0+1)^{-\frac{1}{2\theta -1}}. \end{aligned}$$
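To see the three regimes numerically, the following Python snippet iterates the model recursion e_{k+1} = e_k - c*e_k^(2*theta), a natural model for sequences satisfying a hypothesis of the kind assumed in Lemma B.1 with l_0 = 1 (the constant c = 0.1, the starting value e_0 = 1, and the horizon of 50 steps are arbitrary demo choices, not values from the paper):

```python
def model_sequence(theta, c=0.1, e0=1.0, n=50):
    """Iterate e_{k+1} = max(e_k - c * e_k^(2*theta), 0), a model recursion
    whose decay illustrates the three regimes of Lemma B.1."""
    e = e0
    for _ in range(n):
        e = max(e - c * e ** (2 * theta), 0.0)
    return e

finite = model_sequence(theta=0.0)    # theta = 0: reaches 0 in finitely many steps
linear = model_sequence(theta=0.5)    # theta = 1/2: e_{k+1} = (1-c) e_k, geometric decay
sublin = model_sequence(theta=0.75)   # theta = 3/4: only polynomial decay
```

For theta = 0 the sequence hits 0 after finitely many steps, for theta = 1/2 it decays geometrically like (1-c)^k, and for theta = 3/4 it decays markedly more slowly, in line with statements (i)-(iii).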
Now we are ready to prove Theorem 3.2.
Proof of Theorem 3.2
The fact that the sequence \((x_n)_{n\in \mathbb {N}}\) converges to \(x^*\), \((u_n)_{n\in \mathbb {N}}\) converges to \(u^*\) and \(u^*\in {{\,\textrm{crit}\,}}(F)\) follows from Theorem 3.1. We divide the proof of the statements (a)-(c) into two cases.
Case I. Assume that there exists \(\overline{n}\in \mathbb {N}\) such that \(F(u_{\overline{n}})=F(u^*).\) According to (H4) there exists \((u_{n_j})\subseteq (u_n)\) such that \(u_{n_j}\longrightarrow u^*,\,F(u_{n_j})\longrightarrow F(u^*),\,j\longrightarrow +\infty .\) Now, since the sequence \((F(u_{n_j}))_{j\in \mathbb {N}}\) is decreasing, one has, for all \(n_j\ge \overline{n},\) \(F(u^*)=F(u_{\overline{n}})\ge F(u_{n_j})\ge \lim _{j\longrightarrow +\infty }F(u_{n_j})=F(u^*);\) hence \(F(u_{n_j})=F(u^*)\) for all \(n_j\ge \overline{n}.\) Further, for every \(n\ge \overline{n}\) there exists \(j_0\in \mathbb {N}\) such that \(n\le n_{j_0}\); consequently, \(F(u^*)=F(u_{\overline{n}})\ge F(u_n)\ge F(u_{n_{j_0}})=F(u^*).\) In other words, \(F(u_n)=F(u^*)\) for all \(n\ge \overline{n}.\) From (H1) we get that for all \(n\ge \overline{n}\)
hence, \(x_{n+1}=x_n\) for all \(n\ge \overline{n}.\) But \(x_n\longrightarrow x^*,\,n\longrightarrow +\infty \), hence \(x_n=x^*\) for all \(n\ge \overline{n}\).
Then, \(u_n=(x_n,x_n)=(x^*,x^*)\) for all \(n\ge \overline{n}\). Consequently, \((F(u_n))_{n\in \mathbb {N}},(x_n)_{n\in \mathbb {N}}\) and \((u_n)_{n\in \mathbb {N}}\) converge in a finite number of steps and this concludes \((a)-(c)\).
Case II. We assume that \(F(u_n)>F(u^*)\) for all \(n\in \mathbb {N}.\) Now, by using (H2) and (H1) we get
for all \(n\ge 2.\)
Now, according to (H4) there exists \((u_{n_j})\subseteq (u_n)\) such that
Combining the above fact with the facts that \((F(u_n))\) is non-increasing and \(u_n\longrightarrow u^*,\,n\longrightarrow +\infty \), we conclude that there exists \(\overline{n}\in \mathbb {N},\,\overline{n}\ge 2\) such that \(F(u^*)<F(u_n)<F(u^*)+\eta \) and \(u_n\in B(u^*,\varepsilon )\) for all \(n\ge \overline{n}.\) So, since the function F has the Kurdyka–Łojasiewicz property with an exponent \(\theta \in [0,1)\) at \(u^*\), we can apply the KL-inequality and we get
Further, using (H4) again, we have \(F(u_{n_j})\longrightarrow F(u^*),\,j\longrightarrow +\infty \) and \((F(u_n))\) is non-increasing which leads to
Let us denote \(e_n=F(u_n)-F(u^*).\) Then \((e_n)_{n\in \mathbb {N}}\) is a monotonically decreasing positive sequence converging to 0. Further from (39) we have that there exist the natural numbers \(l_0=2\) and \(\overline{n}\ge l_0\) such that for every \(n\ge \overline{n}\) one has
where \(C_0=\frac{a}{2b^2K^2}>0.\) Consequently, Lemma B.1 can be applied. If \(\theta =0,\) then the sequence \((F(u_n)-F(u^*))\) converges in a finite number of steps, that is, \(F(u_n)=F(u^*)\) after an index \(n_1\in \mathbb {N}\). Then, according to Case I, \((x_n)\) and \((u_n)\) converge in a finite number of steps, and this concludes (a).
Further, according to (28) we have
for all \(k\ge \overline{n}.\) Summing up the latter relation from \(k=n\ge \overline{n}\) to \(k=P>n\), we get \(\sum _{k=n}^P\Vert x_{k+1}-x_{k}\Vert \le 2\Vert x_{n}-x_{n-1}\Vert +\Vert x_{n-1}-x_{n-2}\Vert -2\Vert x_{P+1}-x_{P}\Vert -\Vert x_{P}-x_{P-1}\Vert + \frac{9bK}{4a(1-\theta )}(e_n^{1-\theta }-e_{P+1}^{1-\theta }).\) Now, from the triangle inequality we have \(\Vert x_n-x_{P+1}\Vert \le \sum _{k=n}^P\Vert x_{k+1}-x_{k}\Vert ;\) hence, \(\Vert x_n-x_{P+1}\Vert \le 2\Vert x_{n}-x_{n-1}\Vert +\Vert x_{n-1}-x_{n-2}\Vert -2\Vert x_{P+1}-x_{P}\Vert -\Vert x_{P}-x_{P-1}\Vert + \frac{9bK}{4a(1-\theta )}(e_n^{1-\theta }-e_{P+1}^{1-\theta }).\) By neglecting the non-positive terms and letting \(P\longrightarrow +\infty \), we get
Now, by using (H1) we get
Finally, combining (40) and (41) we obtain
Let \(\theta \in \left( 0,\frac{1}{2}\right] .\) Then, there exist \(C_1>0\) and \(Q\in [0,1)\) such that for every \(n\ge \overline{n}\) one has \(e_n\le C_1 Q^n.\) Consequently, (42) yields that for all \(n\ge \overline{n}+2\) one has
Now, since \(\theta \le \frac{1}{2}\) and \(Q\in [0,1)\) it is obvious that \(Q^{(1-\theta )n}\le Q^{\frac{n}{2}},\) and hence, (43) yields
for some \(\overline{C}>0\) and for all \(n\ge \overline{n}+2.\)
Further, according to (H3)
for all \(n\ge \overline{n}+3\), where \(c=\sup _{n\in \mathbb {N}}(|\alpha _n|+|\beta _n|).\) Hence, the proof of (b) is complete if one takes \(A_1=\max (C_1,\overline{C},\overline{C}_1)\) and \(\overline{k}=\overline{n}+3\).
Let \(\theta \in \left[ \frac{1}{2},1\right) \). Then, there exists \(C_2>0\) such that for every \(n\ge \overline{n}+2\) one has \(e_n\le C_2(n-1)^{-\frac{1}{2\theta -1}}.\) But \((n-1)^{-\frac{1}{2\theta -1}}\le 2^{\frac{1}{2\theta -1}}n^{-\frac{1}{2\theta -1}},\) hence \(e_n\le C_2 2^{\frac{1}{2\theta -1}}n^{-\frac{1}{2\theta -1}}=\overline{C}_2 n^{-\frac{1}{2\theta -1}}\) for all \(n\ge \overline{n}+2.\)
Now, by using (42) we deduce that there exist \(\overline{C}_3,\overline{C}_4,\overline{C}_5>0\) such that
for all \(n\ge \overline{n}+4.\) Now, since \(\theta >\frac{1}{2},\) one has \(n^{-\frac{1-\theta }{2\theta -1}}\ge n^{-\frac{1}{2(2\theta -1)}},\) and we conclude that there exists \(\overline{A}_2>0\) such that
for all \(n\ge \overline{n}+4.\)
By using the form of \(u_n\), we argue as in the case \(\theta \in \left( 0,\frac{1}{2}\right] \) in order to obtain that there exists \(\overline{A}_3>0\) such that
for all \(n\ge \overline{n}+5.\)
Consequently, (c) holds with \(A_2=\max (\overline{C}_2,\overline{A}_2,\overline{A}_3)\) and \(\overline{k}=\overline{n}+5.\) \(\square \)
Cite this article
László, S.C. A Forward–Backward Algorithm With Different Inertial Terms for Structured Non-Convex Minimization Problems. J Optim Theory Appl 198, 387–427 (2023). https://doi.org/10.1007/s10957-023-02204-5
Keywords
- Global optimization
- Inertial proximal-gradient algorithm
- Non-convex optimization
- Abstract convergence theorem
- Kurdyka–Łojasiewicz inequality
- KL exponent
- Convergence rate