
Convergence Results of a New Monotone Inertial Forward–Backward Splitting Algorithm Under the Local Hölder Error Bound Condition


Abstract

In this paper, we introduce a new monotone inertial forward–backward splitting algorithm (newMIFBS) for the convex minimization of the sum of a non-smooth function and a smooth differentiable function. The newMIFBS overcomes two negative effects of the inertial forward–backward splitting algorithm (IFBS), namely the undesirable oscillations that eventually arise and the extreme nonmonotonicity of the objective values, which may cause the algorithm to diverge for some special problems. We establish improved convergence rates for the objective function values and the convergence of the iterates under a local Hölder error bound (Local HEB) condition. Our study also extends the previous results for IFBS under the Local HEB condition. Finally, we present numerical experiments for the simplest instance of newMIFBS (hybrid_MIFBS) to illustrate our results.
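To make the kind of scheme described above concrete, the following is a minimal Python sketch of a generic monotone inertial forward–backward iteration applied to a LASSO toy problem. It is an illustration only: the FISTA-type extrapolation rule, the monotone safeguard, the function names, and all parameter choices are assumptions made here and are not the exact newMIFBS/hybrid_MIFBS update analyzed in the paper.

```python
# Illustrative sketch only: a generic *monotone* inertial forward-backward
# scheme for min_x f(x) + g(x) with f smooth and g nonsmooth, applied to
# LASSO: f(x) = 0.5*||Ax - b||^2, g(x) = rho*||x||_1.
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def monotone_inertial_fbs(A, b, rho, num_iter=500):
    m, n = A.shape
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f
    mu = 0.9                               # step-size factor mu in (0, 1), assumed
    lam = mu / L                           # lambda = mu / L_f, as in the paper
    F = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + rho * np.sum(np.abs(x))
    grad_f = lambda x: A.T @ (A @ x - b)

    x_prev = x = z = np.zeros(n)
    t = 1.0
    history = []
    for _ in range(num_iter):
        # FISTA-type extrapolation (assumed form of the inertial step)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t ** 2))
        alpha = (t - 1.0) / t_next
        y = x + alpha * (z - x_prev)
        # forward (gradient) step followed by backward (proximal) step
        z = soft_threshold(y - lam * grad_f(y), lam * rho)
        # monotone safeguard: keep the best objective value seen so far,
        # which rules out the nonmonotone oscillations of plain inertial FBS
        x_prev = x
        x = z if F(z) <= F(x) else x
        t = t_next
        history.append(F(x))
    return x, history

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 100))
    x_true = np.zeros(100)
    x_true[:5] = 1.0
    b = A @ x_true + 0.01 * rng.standard_normal(40)
    x_hat, hist = monotone_inertial_fbs(A, b, rho=0.1)
    print("final objective:", hist[-1])   # F(x_k) is nonincreasing by construction
```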


References

  1. Agro, G.: Maximum likelihood and \(l_p\) norm estimators. Stat. Appl. 4, 7 (1992)

  2. Apidopoulos, V., Aujol, J., Dossal, C.: Convergence rate of inertial forward–backward algorithm beyond Nesterov’s rule. Math. Program. 180, 137–156 (2020)

  3. Apidopoulos, V., Aujol, J., Dossal, C., et al.: Convergence rates of an inertial gradient descent algorithm under growth and flatness conditions. Math. Program. (2020). https://doi.org/10.1007/s10107-020-01476-3

  4. Attouch, H., Cabot, A.: Convergence rates of inertial forward–backward algorithms. SIAM J. Optim. 28, 849–874 (2018)

  5. Attouch, H., Peypouquet, J.: The rate of convergence of Nesterov’s accelerated forward–backward method is actually faster than \( {\frac{1}{{{k^2}}}} \). SIAM J. Optim. 26, 1824–1834 (2016)

  6. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)

  7. Bauschke, H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics, Springer, New York (2011)

  8. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

  9. Beck, A., Teboulle, M.: Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans. Image Process. 18, 2419–2434 (2009)

  10. Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362, 3319–3363 (2010)

  11. Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165, 1–37 (2015)

  12. Bonettini, S., Rebegoldi, S., Ruggiero, V.: Inertial variable metric techniques for the inexact forward–backward algorithm. SIAM J. Sci. Comput. 40, A3180–A3210 (2018)

  13. Bonettini, S., Prato, M., Rebegoldi, S.: Convergence of inexact forward–backward algorithms using the forward–backward envelope. SIAM J. Optim. 30, 3069–3097 (2020)

  14. Burke, J.V., Deng, S.: Weak sharp minima revisited Part III: error bounds for differentiable convex inclusions. Math. Program. 116, 37–56 (2009)

  15. Chambolle, A., Dossal, C.: On the convergence of the iterates of the fast iterative shrinkage-thresholding algorithm. J. Optim. Theory Appl. 166, 968–982 (2015)

  16. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4, 1168–1200 (2005)

  17. Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43, 919–948 (2018)

  18. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, Berlin (2009)

  19. Johnstone, P.R., Moulin, P.: Faster subgradient methods for functions with Hölderian growth. Math. Program. 180, 417–450 (2020)

  20. László, S.C.: Convergence rates for an inertial algorithm of gradient type associated to a smooth non-convex minimization. Math. Program. (2020). https://doi.org/10.1007/s10107-020-01534-w

  21. Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In: Proceedings of NeurIPS, pp. 379–387 (2015)

  22. Liu, H.W., Wang, T., Liu, Z.X.: Convergence rate of inertial forward–backward algorithms based on the local error bound condition. http://arxiv.org/pdf/2007.07432

  23. Liu, H.W., Wang, T., Liu, Z.X.: Some modified fast iteration shrinkage thresholding algorithms with a new adaptive non-monotone stepsize strategy for nonsmooth and convex minimization problems. Optimization. http://www.optimization-online.org/DB_HTML/2020/12/8169.html

  24. Liu, H.W., Wang, T.: A nonmonotone accelerated proximal gradient method with variable stepsize strategy for nonsmooth and nonconvex minimization problems. Optimization. http://www.optimization-online.org/DB_HTML/2021/04/8365.html

  25. Luo, Z.Q., Tseng, P.: On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72, 7–35 (1992)

  26. Luo, Z.Q., Tseng, P.: On the linear convergence of descent methods for convex essentially smooth minimization. SIAM J. Control Optim. 30, 408–425 (1992)

  27. Luo, Z.Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46, 157–178 (1993)

  28. Necoara, I., Clipici, D.: Parallel random coordinate descent method for composite minimization: convergence analysis and error bounds. SIAM J. Optim. 26, 197–226 (2016)

  29. Necoara, I., Nesterov, Y., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. Math. Program. 175, 69–107 (2019)

  30. Nesterov, Y.: A method for solving the convex programming problem with convergence rate \(O\left( {\frac{1}{{{k^2}}}} \right)\). Dokl. Akad. Nauk SSSR. 269, 543–547 (1983)

  31. Ochs, P., Chen, Y., Brox, T., Pock, T.: Inertial proximal algorithm for nonconvex optimization. SIAM J. Imaging Sci. 7(2), 1388–1419 (2014)

  32. O’Donoghue, B., Candès, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15, 715–732 (2015)

  33. Opial, Z.: Weak convergence of the sequence of successive approximations for nonexpansive mappings. Bull. Am. Math. Soc. 73, 591–597 (1967)

  34. Rebegoldi, S., Calatroni, L.: Inexact and adaptive generalized FISTA for strongly convex optimization. https://arxiv.org/pdf/2101.03915.pdf (2021)

  35. Roulet, V., d’Aspremont, A.: Sharpness, restart, and acceleration. SIAM J. Optim. 30, 262–289 (2020)

  36. Schmidt, M., Roux, N.L., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems 24 (NIPS) (2011)

  37. Su, W., Boyd, S., Candès, E.J.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J. Mach. Learn. Res. 17, 1–43 (2016)

  38. Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward–backward algorithms. SIAM J. Optim. 23, 1607–1633 (2013)

  39. Wang, T., Liu, H.W.: On the convergence results of a class of nonmonotone accelerated proximal gradient methods for nonsmooth and nonconvex minimization problems. Optimization. http://www.optimization-online.org/DB_HTML/2021/05/8423.html

  40. Wang, P.W., Lin, C.J.: Iteration complexity of feasible descent methods for convex optimization. J. Mach. Learn. Res. 15, 1523–1548 (2014)

  41. Wen, B., Chen, X.J., Pong, T.K.: Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim. 27, 124–145 (2017)

  42. Yang, T.B.: Adaptive accelerated gradient converging methods under Hölderian error bound condition. http://arxiv.org/pdf/1611.07609v2

  43. Yang, T., Lin, Q.: A stochastic gradient method with linear convergence rate for a class of non-smooth non-strongly convex optimization. Tech. rep. (2015)

  44. Zhang, H.: The restricted strong convexity revisited: analysis of equivalence to error bound and quadratic growth. Optim. Lett. 11, 817–833 (2017)

  45. Zhang, H., Cheng, L.: Restricted strong convexity and its applications to convergence analysis of gradient type methods in convex optimization. Optim. Lett. 9, 961–979 (2015)

Author information

Corresponding author

Correspondence to Hongwei Liu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Proof of Lemma 3.2

Proof

From (3.1) with \(y: = {y_{k + 1}},x: = {x_k}\), we obtain that

$$\begin{aligned}&F\left( {{z_{k + 1}}} \right) - {F\left( {{x^*}} \right) } + \frac{1}{{2\lambda }}{\left\| {{z_{k + 1}} - {x_k}} \right\| ^2} + \frac{{1 - \mu }}{{2\lambda }}{\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| ^2} \nonumber \\&\quad \le F\left( {{x_k}} \right) - {F\left( {{x^*}} \right) } + \frac{1}{{2\lambda }}{ {\alpha _k^2} }{\left\| {{z_k} - {x_{k - 1}}} \right\| ^2} \end{aligned}$$
(A.1)
$$\begin{aligned}&\le F\left( {{z_k}} \right) - {F\left( {{x^*}} \right) } + \frac{1}{{2\lambda }}{\left\| {{z_k} - {x_{k - 1}}} \right\| ^2}, \end{aligned}$$
(A.2)

which means that \(\sum \nolimits _{k = 1}^{ + \infty } {{{\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| }^2}} < + \infty ,\) i.e., \(\mathop {\lim }\limits _{k \rightarrow \infty } {\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| ^2} = 0\) and \(F\left( {{z_{k+1}}} \right) - {F\left( {{x^*}} \right) } \le \xi \) for \(\xi = F\left( {{x_1}} \right) - {F\left( {{x^*}} \right) } + \frac{1}{{2\lambda }}{\left\| {{z_1} - {x_0}} \right\| ^2},\) i.e., \({x_{k + 1}} \in {S_\xi }.\) By the nonexpansiveness of the proximal operator [16], the Lipschitz continuity of \(\nabla f\), and the choice \(\lambda = \frac{\mu }{{{L_f}}}\) with \(\mu \in \left( {0,1} \right) ,\) we can deduce that

$$\begin{aligned}&\left\| {\mathrm{{pro}}{\mathrm{{x}}_{\lambda g}}\left( {{z_{k + 1}} - \lambda \nabla f\left( {{z_{k + 1}}} \right) } \right) - {z_{k + 1}}} \right\| \\&= \left\| {\mathrm{{pro}}{\mathrm{{x}}_{\lambda g}}\left( {{z_{k + 1}} - \lambda \nabla f\left( {{z_{k + 1}}} \right) } \right) - \mathrm{{pro}}{\mathrm{{x}}_{\lambda g}}\left( {{y_{k + 1}} - \lambda \nabla f\left( {{y_{k + 1}}} \right) } \right) } \right\| \\&\le \left( {1 + \lambda {L_f}} \right) \left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| \le 2\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| . \end{aligned}$$
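For clarity, the first inequality in the display above uses the nonexpansiveness of \(\mathrm{{pro}}{\mathrm{{x}}_{\lambda g}}\), which reduces the bound to the distance between the two prox arguments, and this distance is controlled by the Lipschitz continuity of \(\nabla f\):

$$\begin{aligned}&\left\| {\left( {{z_{k + 1}} - \lambda \nabla f\left( {{z_{k + 1}}} \right) } \right) - \left( {{y_{k + 1}} - \lambda \nabla f\left( {{y_{k + 1}}} \right) } \right) } \right\| \\&\quad \le \left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| + \lambda \left\| {\nabla f\left( {{z_{k + 1}}} \right) - \nabla f\left( {{y_{k + 1}}} \right) } \right\| \le \left( {1 + \lambda {L_f}} \right) \left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| , \end{aligned}$$

and \(1 + \lambda {L_f} = 1 + \mu \le 2\) by the choice \(\lambda = \frac{\mu }{{{L_f}}}\) with \(\mu \in \left( {0,1} \right) .\)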

Using Lemma 2.3, we can conclude that

$$\begin{aligned} dist\left( {{z_{k + 1}},X^*} \right) \le {2^{\frac{\theta }{{1 - \theta }}}} \cdot \bar{\tau }{\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| ^{\frac{\theta }{{1 - \theta }}}}. \end{aligned}$$
(A.3)
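For readers without Lemma 2.3 at hand: the form of (A.3) is consistent with Lemma 2.3 being a prox-residual reformulation of the Local HEB on \({S_\xi }\), i.e., \(dist\left( {x,{X^*}} \right) \le \bar{\tau }{\left\| {\mathrm{{pro}}{\mathrm{{x}}_{\lambda g}}\left( {x - \lambda \nabla f\left( x \right) } \right) - x} \right\| ^{\frac{\theta }{{1 - \theta }}}}\). That exact statement is an assumption made here only to show how (A.3) follows from the previous display:

$$\begin{aligned} dist\left( {{z_{k + 1}},{X^*}} \right) \le \bar{\tau }{\left\| {\mathrm{{pro}}{\mathrm{{x}}_{\lambda g}}\left( {{z_{k + 1}} - \lambda \nabla f\left( {{z_{k + 1}}} \right) } \right) - {z_{k + 1}}} \right\| ^{\frac{\theta }{{1 - \theta }}}} \le \bar{\tau }{\left( {2\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| } \right) ^{\frac{\theta }{{1 - \theta }}}} = {2^{\frac{\theta }{{1 - \theta }}}}\bar{\tau }{\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| ^{\frac{\theta }{{1 - \theta }}}}. \end{aligned}$$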

Applying (3.1) with \(y: = {y_{k + 1}},x: = {x^*}\) and using (A.3), we have

$$\begin{aligned}\begin{array}{*{20}{l}} {F\left( {{z_{k + 1}}} \right) - {F\left( {{x^*}} \right) } \le \frac{1}{{2\lambda }}{{\left\| {{y_{k + 1}} - {x^*}} \right\| }^2} - \frac{1}{{2\lambda }}{{\left\| {{z_{k + 1}} - {x^*}} \right\| }^2}}\\ { = \frac{1}{{2\lambda }}{{\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| }^2} + \frac{1}{\lambda }\left\langle {{y_{k + 1}} - {z_{k + 1}},{z_{k + 1}} - {x^*}} \right\rangle }\\ { \le \frac{1}{{2\lambda }}{{\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| }^2} + \frac{1}{\lambda }\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| \cdot dist\left( {{z_{k + 1}},{X^*}} \right) }\\ { \le \frac{1}{{2\lambda }}{{\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| }^2} + {2^{\frac{\theta }{{1 - \theta }}}}\frac{{\bar{\tau }}}{\lambda }{{\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| }^{\frac{1}{{1 - \theta }}}} \le \tilde{\tau }{{\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| }^{\frac{1}{{1 - \theta }}}},} \end{array}\end{aligned}$$

where \(\tilde{\tau }= \frac{{1 + {2^{\frac{1}{{1 - \theta }}}}\bar{\tau }}}{{2\lambda }};\) the last inequality uses \({\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| ^2} \le {\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| ^{\frac{1}{{1 - \theta }}}},\) which holds once \(\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| \le 1,\) and this is the case for all sufficiently large \(k\) since \(\left\| {{z_{k + 1}} - {y_{k + 1}}} \right\| \rightarrow 0.\) \(\square \)

Appendix B: Proof of Theorem 4.1

Proof

From (3.1) with \(y: = {x_k},x: = {x^*}\), we obtain that

$$\begin{aligned} F\left( {{x_{k + 1}}} \right) \le F\left( {{x^ * }} \right) + \frac{1}{{2\lambda }}{\left\| {{x_k} - {x^ * }} \right\| ^2} - \frac{1}{{2\lambda }}{\left\| {{x_{k + 1}} - {x^ * }} \right\| ^2} - \frac{{1 - \mu }}{{2\lambda }}{\left\| {{x_{k + 1}} - {x_k}} \right\| ^2} \end{aligned}$$
(B.1)

and with \(y: = {x_k},x: = {x_k},\) we obtain that

$$\begin{aligned} F\left( {{x_{k + 1}}} \right) \le F\left( {{x_k}} \right) - \frac{1}{{2\lambda }}{\left\| {{x_{k + 1}} - {x_k}} \right\| ^2} - \frac{{1 - \mu }}{{2\lambda }}{\left\| {{x_{k + 1}} - {x_k}} \right\| ^2} \end{aligned}$$
(B.2)
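In particular, since \(\mu \in \left( {0,1} \right) \), inequality (B.2) already yields the monotone decrease of the objective values that is used at the end of this proof:

$$\begin{aligned} F\left( {{x_{k + 1}}} \right) \le F\left( {{x_k}} \right) - \frac{{2 - \mu }}{{2\lambda }}{\left\| {{x_{k + 1}} - {x_k}} \right\| ^2} \le F\left( {{x_k}} \right) . \end{aligned}$$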

Similarly to the proof of Lemma 3.2, we have

$$\begin{aligned} F\left( {{x_{k + 1}}} \right) - F\left( {{x^ * }} \right) \le \tau {\left\| {{x_{k + 1}} - {x_k}} \right\| ^{\frac{1}{{1 - \theta }}}} \end{aligned}$$
(B.3)

Consider the Local HEB condition with \(\theta \in \left[ {0,\frac{1}{2}} \right) .\) Multiplying (B.2) by \({k^\alpha },\) where \(\alpha \ge 1\) is a constant, and adding \(\left( {{{\left( {k + 1} \right) }^\alpha } - {k^\alpha }} \right) \left( {F\left( {{x_{k + 1}}} \right) - F\left( {{x^ * }} \right) } \right) \) to both sides, we obtain

$$\begin{aligned}&{\left( {k + 1} \right) ^\alpha }\left( {F\left( {{x_{k + 1}}} \right) - F\left( {{x^ * }} \right) } \right) + \frac{1}{{2\lambda }}{k^\alpha }{\left\| {{x_{k + 1}} - {x_k}} \right\| ^2} + \frac{{1 - \mu }}{{2\lambda }}{k^\alpha }{\left\| {{x_{k + 1}} - {x_k}} \right\| ^2} \nonumber \\&\quad \le {k^\alpha }\left( {F\left( {{x_k}} \right) - F\left( {{x^ * }} \right) } \right) + \left( {{{\left( {k + 1} \right) }^\alpha } - {k^\alpha }} \right) \left( {F\left( {{x_{k + 1}}} \right) - F\left( {{x^ * }} \right) } \right) \end{aligned}$$
(B.4)

where

$$\begin{aligned}\begin{array}{l} \left( {{{\left( {k + 1} \right) }^\alpha } - {k^\alpha }} \right) \left( {F\left( {{x_{k + 1}}} \right) - F\left( {{x^*}} \right) } \right) \le \tau \left( {{{\left( {1 + \frac{1}{k}} \right) }^\alpha } - 1} \right) {k^\alpha }{\left\| {{x_{k + 1}} - {x_k}} \right\| ^{\frac{1}{{1 - \theta }}}}\\ \le \frac{{1 - 2\theta }}{{2\left( {1 - \theta } \right) }}{\left( {\frac{\tau }{\varepsilon }} \right) ^{\frac{{2\left( {1 - \theta } \right) }}{{1 - 2\theta }}}} \cdot {\varPsi _k} + \frac{{{\varepsilon ^{2\left( {1 - \theta } \right) }}}}{{2\left( {1 - \theta } \right) }}{k^\alpha }\left\| {{x_{k + 1}} - {x_k}} \right\| ^2 \end{array}\end{aligned}$$

and \({\varPsi _k} = {\left( {{{\left( {1 + \frac{1}{k}} \right) }^\alpha } - 1} \right) ^{\frac{{2\left( {1 - \theta } \right) }}{{1 - 2\theta }}}} \cdot {k^\alpha } = O\left( {{k^{\alpha - \frac{{2\left( {1 - \theta } \right) }}{{1 - 2\theta }}}}} \right) ,\) since \({\left( {1 + \frac{1}{k}} \right) ^\alpha } - 1 \le \frac{{\alpha {2^{\alpha - 1}}}}{k}\) for \(k \ge 1\) and \(\alpha \ge 1.\) Hence, choosing \(\varepsilon \) such that \(\frac{{{\varepsilon ^{2\left( {1 - \theta } \right) }}}}{{2\left( {1 - \theta } \right) }} = \frac{1}{{2\lambda }}\) and setting \(\alpha = \frac{1}{{1 - 2\theta }},\) we have \({\varPsi _k} = O\left( {\frac{1}{k}} \right) \) and

$$\begin{aligned}&{k^\alpha }\frac{{1 - \mu }}{{2\lambda }}{\left\| {{x_{k + 1}} - {x_k}} \right\| ^2}\nonumber \\&\quad \le {k^\alpha }\left( {F\left( {{x_k}} \right) - F\left( {{x^ * }} \right) } \right) - {\left( {k + 1} \right) ^\alpha }\left( {F\left( {{x_{k + 1}}} \right) - F\left( {{x^ * }} \right) } \right) \nonumber \\&\quad \quad + \frac{{1 - 2\theta }}{{2\left( {1 - \theta } \right) }}{\left( {\frac{\tau }{\varepsilon }} \right) ^{\frac{{2\left( {1 - \theta } \right) }}{{1 - 2\theta }}}}{\varPsi _k}, \end{aligned}$$
(B.5)

which means that

$$\begin{aligned}\sum \nolimits _{k = 1}^{ N } {{k^\alpha }{{\left\| {{x_{k + 1}} - {x_k}} \right\| }^2}} \le C\ln \left( {N + 1} \right) .\end{aligned}$$

Combining this with (B.3), we then obtain

$$\begin{aligned}\sum \nolimits _{k = 1}^{ N } {{k^\alpha }{{\left( {F\left( {{x_{k + 1}}} \right) - F\left( {{x^*}} \right) } \right) }^{2\left( {1 - \theta } \right) }}} \le C\ln \left( {N + 1} \right) .\end{aligned}$$

Further, using the monotonicity of \(\left\{ {F\left( {{x_k}} \right) - F\left( {{x^ * }} \right) } \right\} ,\) we obtain that

$$\begin{aligned}F\left( {{x_{N + 1}}} \right) - F\left( {{x^*}} \right) = O\left( {{{\left( {\frac{{\ln \left( {N + 1} \right) }}{{\sum \nolimits _{k = 1}^N {{k^\alpha }} }}} \right) }^{\frac{1}{{2\left( {1 - \theta } \right) }}}}} \right) = O\left( {\frac{{{{\left( {\ln \left( {N + 1} \right) } \right) }^{\frac{1}{{2\left( {1 - \theta } \right) }}}}}}{{{N^{\frac{1}{{1 - 2\theta }}}}}}} \right) .\end{aligned}$$
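The last equality above follows from the elementary lower bound on the partial sum together with the choice \(\alpha = \frac{1}{{1 - 2\theta }}\):

$$\begin{aligned} \sum \nolimits _{k = 1}^N {{k^\alpha }} \ge \int _0^N {{t^\alpha }\,dt} = \frac{{{N^{\alpha + 1}}}}{{\alpha + 1}}, \qquad \frac{{\alpha + 1}}{{2\left( {1 - \theta } \right) }} = \frac{1}{{1 - 2\theta }}, \end{aligned}$$

so that \({\left( {\sum \nolimits _{k = 1}^N {{k^\alpha }} } \right) ^{ - \frac{1}{{2\left( {1 - \theta } \right) }}}} = O\left( {{N^{ - \frac{1}{{1 - 2\theta }}}}} \right) .\)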

The proof for the case \(\theta = \frac{1}{2}\) is simpler. Inequality (B.3) now becomes

$$\begin{aligned} F\left( {{x_{k + 1}}} \right) - F\left( {{x^ * }} \right) \le \tau {\left\| {{x_{k + 1}} - {x_k}} \right\| ^2}. \end{aligned}$$
(B.6)

Applying the above inequality in (B.2), we have

$$\begin{aligned}&\left( {1 + \frac{1}{\tau }} \right) \left( {F\left( {{x_{k + 1}}} \right) - F\left( {{x^ * }} \right) } \right) \le F\left( {{x_{k + 1}}} \right) - F\left( {{x^ * }} \right) \\&\quad + \frac{1}{{2\lambda }}{\left\| {{x_{k + 1}} - {x_k}} \right\| ^2} \le F\left( {{x_k}} \right) - F\left( {{x^ * }} \right) ,\end{aligned}$$

which means that the sequence \(\left\{ {F\left( {{x_k}} \right) - F\left( {{x^ * }} \right) } \right\} \) generated by FBS converges Q-linearly.
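Unrolling this contraction from \(k = 1\) to \(N\) yields the explicit rate

$$\begin{aligned} F\left( {{x_{N + 1}}} \right) - F\left( {{x^ * }} \right) \le {\left( {\frac{\tau }{{1 + \tau }}} \right) ^N}\left( {F\left( {{x_1}} \right) - F\left( {{x^ * }} \right) } \right) , \quad N \ge 1. \end{aligned}$$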

Point (iii) is trivial. \(\square \)

About this article

Cite this article

Wang, T., Liu, H. Convergence Results of a New Monotone Inertial Forward–Backward Splitting Algorithm Under the Local Hölder Error Bound Condition. Appl Math Optim 85, 7 (2022). https://doi.org/10.1007/s00245-022-09859-y
