A Telescopic Bregmanian Proximal Gradient Method Without the Global Lipschitz Continuity Assumption

Journal of Optimization Theory and Applications

Abstract

The problem of minimizing the sum of two convex functions has various theoretical and real-world applications. One of the popular methods for solving this problem is the proximal gradient method (proximal forward–backward algorithm). A very common assumption in the use of this method is that the gradient of the smooth term is globally Lipschitz continuous. However, this assumption is not always satisfied in practice, which limits the applicability of the method. In this paper, we discuss, in a wide class of finite- and infinite-dimensional spaces, a new variant of the proximal gradient method which does not impose the above-mentioned global Lipschitz continuity assumption. A key feature of the method is that the iterative steps depend on a certain telescopic decomposition of the constraint set into subsets. Moreover, we use a Bregman divergence in the proximal forward–backward operation. Under certain practical conditions, a non-asymptotic rate of convergence (that is, in the function values) is established, as well as the weak convergence of the whole sequence to a minimizer. We also obtain a few auxiliary results of independent interest.
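To make the operation underlying the method concrete, the following is a minimal numerical sketch (Python with NumPy) of a classical Bregman proximal forward step; it is not the authors' telescopic scheme. Taking the nonsmooth term to be zero and the Bregman function b to be the Boltzmann–Shannon entropy turns the forward–backward update over the unit simplex into a closed-form multiplicative update (entropic mirror descent). The toy least-squares instance and all names below are illustrative assumptions.

import numpy as np

def bregman_forward_step(x, grad, L):
    """One Bregman proximal gradient step over the unit simplex with g = 0 and
    b(x) = sum_i x_i log x_i (Boltzmann-Shannon entropy): the minimizer of
    <grad, u> + L * B(u, x) over the simplex, where B is the Kullback-Leibler
    divergence induced by b, has the closed form below."""
    w = x * np.exp(-grad / L)
    return w / w.sum()

# Toy instance (illustrative): minimize f(x) = 0.5 * ||A x - y||^2 over the simplex.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
y = A @ np.array([0.1, 0.2, 0.3, 0.2, 0.2])
L = np.linalg.norm(A, 2) ** 2  # spectral norm squared: a Lipschitz constant of f'
x = np.full(5, 0.2)            # start at the barycenter of the simplex
for _ in range(200):
    x = bregman_forward_step(x, A.T @ (A @ x - y), L)
print(x)                       # approximately a minimizer over the simplex

The method of the paper replaces the fixed constraint set by a telescopic family of subsets and allows more general Bregman functions; the sketch only illustrates the kind of step being iterated.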


References

  1. Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with sparsity-inducing penalties. Found. Trends Mach. Learn. 4, 1–106 (2012)

  2. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

  3. Bertero, M., Boccacci, P., Desiderà, G., Vicidomini, G.: Image deblurring with Poisson data: from cells to galaxies. Inverse Prob. 25, 123006 (2009)

  4. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4, 1168–1200 (2005)

  5. De Mol, C., De Vito, E., Rosasco, L.: Elastic-net regularization in learning theory. J. Complex. 25, 201–230 (2009)

  6. Figueiredo, M.A.T., Bioucas-Dias, J.M., Nowak, R.D.: Majorization-minimization algorithms for wavelet-based image restoration. IEEE Trans. Image Process. 16, 2980–2991 (2007)

  7. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1, 127–239 (2014)

  8. Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program., Ser. B 125, 263–295 (2010)

  9. Martinet, B.: Régularisation d'inéquations variationnelles par approximations successives. Rev. Française Inf. Rech. Oper. 4, 154–158 (1970)

  10. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14, 877–898 (1976)

  11. Bruck, R.E., Reich, S.: Nonexpansive projections and resolvents of accretive operators in Banach spaces. Houston J. Math. 3, 459–470 (1977)

  12. Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72, 383–390 (1979)

  13. Brézis, H., Lions, P.L.: Produits infinis de résolvantes. Israel J. Math. 29, 329–345 (1978)

  14. Nevanlinna, O., Reich, S.: Strong convergence of contraction semigroups and of iterative methods for accretive operators in Banach spaces. Israel J. Math. 32, 44–58 (1979)

  15. Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42, 330–348 (2017)

  16. Markham, J., Conchello, J.A.: Fast maximum-likelihood image-restoration algorithms for three-dimensional fluorescence microscopy. J. Opt. Soc. Am. A 18, 1062–1071 (2001)

  17. Dey, N., Blanc-Feraud, L., Zimmer, C., Roux, P., Kam, Z., Olivo-Marin, J.C., Zerubia, J.: Richardson–Lucy algorithm with total variation regularization for 3D confocal microscope deconvolution. Microsc. Res. Tech. 69, 260–266 (2006)

  18. Cruz, J.Y.B., Nghia, T.T.A.: On the convergence of the forward–backward splitting method with linesearches. Optim. Methods Softw. 31, 1209–1238 (2016)

  19. Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28, 2131–2151 (2018)

  20. Cohen, G.: Auxiliary problem principle and decomposition of optimization problems. J. Optim. Theory Appl. 32, 277–305 (1980)

  21. Nguyen, Q.V.: Forward-backward splitting with Bregman distances. Vietnam J. Math. 45, 519–539 (2017)

  22. Reem, D., Reich, S., De Pierro, A.: Re-examination of Bregman functions and new properties of their divergences. Optimization 68, 279–348 (2019)

  23. Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization (2008). Preprint. https://www.mit.edu/~dimitrib/PTseng/papers/apgm.pdf. Accessed 15 Oct 2018

  24. Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15, 229–251 (2004)

  25. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics, 2nd edn. Springer, Cham (2017)

  26. Rockafellar, R.T.: Convex Analysis. Princeton Mathematical Series, No. 28. Princeton University Press, Princeton (1970)

  27. van Tiel, J.: Convex Analysis: An Introductory Text. Wiley, Belfast (1984)

  28. Zălinescu, C.: Convex Analysis in General Vector Spaces. World Scientific Publishing, River Edge (2002)

  29. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, Applied Optimization, vol. 87. Kluwer Academic Publishers, Boston (2004)

  30. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. Comput. Math. Math. Phys. 7, 200–217 (1967)

  31. Censor, Y., Lent, A.: An iterative row-action method for interval convex programming. J. Optim. Theory Appl. 34, 321–353 (1981)

  32. Censor, Y., Reich, S.: Iterations of paracontractions and firmly nonexpansive operators with applications to feasibility and optimization. Optimization 37, 323–339 (1996)

  33. De Pierro, A.R., Iusem, A.N.: A relaxed version of Bregman’s method for convex programming. J. Optim. Theory Appl. 51, 421–440 (1986)

  34. Censor, Y., Zenios, S.A.: Proximal minimization algorithm with \(D\)-functions. J. Optim. Theory Appl. 73, 451–464 (1992)

  35. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31, 167–175 (2003)

  36. Butnariu, D., Iusem, A.N., Zălinescu, C.: On uniform convexity, total convexity and convergence of the proximal point and outer Bregman projection algorithms in Banach spaces. J. Convex Anal. 10, 35–61 (2003)

  37. Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3, 538–543 (1993)

  38. Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation-based image restoration. Multiscale Model. Simul. 4, 460–489 (2005)

  39. Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for \(\ell _1\)-minimization with applications to compressed sensing. SIAM J. Imaging Sci. 1, 143–168 (2008)

  40. Zaslavski, A.J.: Convergence of a proximal point method in the presence of computational errors in Hilbert spaces. SIAM J. Optim. 20, 2413–2421 (2010)

  41. Brezis, H.: Functional Analysis. Sobolev Spaces and Partial Differential Equations. Springer, New York (2011)

  42. Ambrosetti, A., Prodi, G.: A Primer of Nonlinear Analysis. Cambridge University Press, New York, USA (1993)

  43. Reem, D., Reich, S.: Solutions to inexact resolvent inclusion problems with applications to nonlinear analysis and optimization. Rend. Circ. Mat. Palermo (2) 67, 337–371 (2018)

  44. Reich, S.: Nonlinear semigroups, holomorphic mappings, and integral equations. In: Proceedings of Symposia Pure Mathematics Part 2. Nonlinear functional analysis and its applications, Berkeley, California, 1983, vol. 45, pp. 307–324. American Mathematical Society, Providence (1986)

  45. Reem, D., Reich, S., De Pierro, A.: A telescopic Bregmanian proximal gradient method without the global Lipschitz continuity assumption (2019). arXiv:1804.10273 [math.OC] ([v4], 19 Mar 2019)

  46. Reem, D.: The Bregman distance without the Bregman function II. In: Reich, S., Zaslavski, A.J. (eds.) Optimization Theory and Related Topics, Contemporary Mathematics, vol. 568, pp. 213–223. American Mathematical Society, Providence (2012)

  47. Reem, D., De Pierro, A.: A new convergence analysis and perturbation resilience of some accelerated proximal forward–backward algorithms with errors. Inverse Prob. 33, 044001 (2017)

  48. Phelps, R.R.: Convex Functions, Monotone Operators and Differentiability. Lecture Notes in Mathematics, vol. 1364, 2nd edn. Springer, Berlin (1993). Closely related material can be found in "Lectures on maximal monotone operators"

  49. Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces. Commun. Contemp. Math. 3, 615–647 (2001)

  50. Reem, D., Reich, S., De Pierro, A.: Stability of the optimal values under small perturbations of the constraint set. arXiv:1902.02363 [math.OC] ([v1], 6 Feb 2019)

Acknowledgements

Part of the work of Daniel Reem was done when he was at the Institute of Mathematical and Computer Sciences (ICMC), University of São Paulo, São Carlos, Brazil (2014–2016), and was supported by FAPESP 2013/19504-9. It is a pleasure for him to thank Alfredo Iusem and Jose Yunier Bello Cruz for helpful discussions regarding some of the references. Simeon Reich was partially supported by the Israel Science Foundation (Grants 389/12 and 820/17), by the Fund for the Promotion of Research at the Technion and by the Technion General Research Fund. Alvaro De Pierro thanks CNPq Grant 306030/2014-4 and FAPESP 2013/19504-9. All the authors wish to express their thanks to three referees for their feedback which helped to improve the presentation of the paper.

Corresponding author

Correspondence to Daniel Reem.

Communicated by Hedy Attouch.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Some Proofs

Here, we provide the proofs of some claims mentioned earlier.

Proof of some claims mentioned in Remark 5.2

We first show that in the backtracking step size rule, if \(S_j\ne C\) for some j, then \(L_k\le \eta L(f',S_k\cap U)\) for each \(k\in \mathbb {N}\). The case \(k=1\) holds by our assumption on \(L_1\), since \(S_j\ne C\) for some \(j\in \mathbb {N}\). Let \(k\ge 2\) and suppose, by induction, that the claim holds for all natural numbers between 1 and \(k-1\). If, to the contrary, we have \(L_k>\eta L(f',S_k\cap U)\), then \(\eta ^{i_k-1}L_{k-1}>L(f',S_k\cap U)\) because \(L_k/\eta =\eta ^{i_k-1}L_{k-1}\). Hence, by using (16) with \(L:=\eta ^{i_k-1}L_{k-1}\), we conclude that (14) holds with L instead of \(L_k\), which contradicts the minimality of \(i_k\) unless \(i_k=0\). But when \(i_k=0\), we have \(L_k=L_{k-1}\); hence, using the induction hypothesis and the fact that \(L(f',S_{k-1}\cap U)\le L(f',S_{k}\cap U)\) (this latter fact follows immediately from the equality \(L(f',S_k\cap U):=\sup \{\Vert f'(x)-f'(y)\Vert /\Vert x-y\Vert : x,y\in S_k\cap U, x\ne y\}\) and the inclusion \(S_{k-1}\subseteq S_k\)), we have \(L_k=L_{k-1}\le \eta L(f',S_{k-1}\cap U)\le \eta L(f',S_k\cap U)\), which contradicts our assumption on \(L_k\).
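For readers implementing the rule, the induction above corresponds to a short loop. The following is a hedged Python sketch in which the descent test of condition (14), not reproduced in this excerpt, is abstracted as a caller-supplied predicate; its exact form, which depends on f, b, and the current iterate, is an assumption left to the caller.

def next_backtracking_constant(L_prev, eta, descent_ok):
    """Return L_k = eta**i_k * L_{k-1}, where i_k is the smallest nonnegative
    integer for which the descent test holds. descent_ok(L) stands in for
    condition (14); since eta > 1, repeated calls produce a nondecreasing
    sequence (L_k), in line with Remark 5.2."""
    L = L_prev
    while not descent_ok(L):
        L *= eta  # increase i_k by one; the loop exit certifies minimality
    return L

The two claims proved in this remark describe two regimes of this loop: when the sets \(S_k\) grow, the returned constants remain within a factor \(\eta \) of the local Lipschitz constant; and once \(L_1\) already exceeds \(\eta L(f',C\cap U)\), the test passes immediately at every subsequent iteration, so the loop body never runs again.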

Next, we show that if the backtracking step size rule is used and, in addition, \(S_k=C\) for all \(k\in \mathbb {N}\) and \(L_1>\eta L(f',S_1\cap U)\), then \(L_{k+1}=L_k\) for each \(k\in \mathbb {N}\). Indeed, since \(S_1=C\) and \(L_{k+1}\ge L_k\) for each \(k\in \mathbb {N}\) (as shown in Remark 5.2), we have \(L_k>\eta L(f',C\cap U)\) for all \(k\in \mathbb {N}\). If \(L_{k+1}\ne L_k\) for some \(k\in \mathbb {N}\), then \(i_{k+1}>0\) for this k, and hence \(\eta ^{i_{k+1}-1}L_{k}=L_{k+1}/\eta >L(f',C\cap U)\). Thus, (16) with \(L:=\eta ^{i_{k+1}-1}L_{k}\) implies that (14) holds with L instead of \(L_k\), a contradiction to the minimality of \(i_{k+1}\). \(\square \)

Proof of Lemma 8.1

Since \((x_k)_{k=1}^{\infty }\) is a bounded sequence in a reflexive Banach space, a well-known classical result implies that \((x_k)_{k=1}^{\infty }\) has at least one weak cluster point \(q\in X\), which, by our assumption, is in U. Suppose to the contrary that there are at least two different weak cluster points \(q_1:=w\)-\(\lim _{k\rightarrow \infty , k\in N_1}x_k\) and \(q_2:=w\)-\(\lim _{k\rightarrow \infty , k\in N_2}x_k\) in X, where \(N_1\) and \(N_2\) are two infinite subsets of \(\mathbb {N}\). By our assumption, \(q_1,q_2\in U\), and hence, since b satisfies the limiting difference property, we have

$$\begin{aligned} B(q_2,q_1)=\lim _{k\rightarrow \infty , k\in N_1}(B(q_2,x_k)-B(q_1,x_k)) \end{aligned}$$
(40a)

and

$$\begin{aligned} B(q_1,q_2)=\lim _{k\rightarrow \infty , k\in N_2}(B(q_1,x_k)-B(q_2,x_k)). \end{aligned}$$
(40b)

Since we assume that \(L_1:=\lim _{k\rightarrow \infty }B(q_1,x_k)\) and \(L_2:=\lim _{k\rightarrow \infty }B(q_2,x_k)\) exist and are finite, we conclude from (40) that \(B(q_2,q_1)=L_2-L_1=-(L_1-L_2)=-B(q_1,q_2)\). The assumptions on b imply that B is nonnegative (see, for example, [22, Proposition 4.13(III)]), and hence \(0\le B(q_2,q_1)=-B(q_1,q_2)\le 0\). Thus, \(B(q_1,q_2)=B(q_2,q_1)=0\). Since b is strictly convex on U, for all \(z_1\in \text {dom}(b)\) and \(z_2\in U\), we have \(B(z_1,z_2)=0\) if and only if \(z_1=z_2\) (see, for example, [22, Proposition 4.13(III)]). Hence \(q_1=q_2\), a contradiction to the initial assumption. Thus, all the weak cluster points of \((x_k)_{k=1}^{\infty }\) coincide.
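To see what the limiting difference property used in (40) amounts to in the simplest setting, consider, as an illustration only and not as a hypothesis of the lemma, a Hilbert space with \(b(x):=\frac{1}{2}\Vert x\Vert ^2\), so that \(B(p,q)=\frac{1}{2}\Vert p-q\Vert ^2\). Then

$$\begin{aligned} B(q_2,x_k)-B(q_1,x_k)=\tfrac{1}{2}\Vert q_2\Vert ^2-\tfrac{1}{2}\Vert q_1\Vert ^2-\langle q_2-q_1,x_k\rangle \xrightarrow [k\rightarrow \infty ,\, k\in N_1]{}\tfrac{1}{2}\Vert q_2\Vert ^2-\tfrac{1}{2}\Vert q_1\Vert ^2-\langle q_2-q_1,q_1\rangle =B(q_2,q_1), \end{aligned}$$

because weak convergence of \((x_k)_{k\in N_1}\) to \(q_1\) means precisely that \(\langle q_2-q_1,x_k\rangle \rightarrow \langle q_2-q_1,q_1\rangle \), and the last equality holds since \(\frac{1}{2}\Vert q_2\Vert ^2-\frac{1}{2}\Vert q_1\Vert ^2-\langle q_2-q_1,q_1\rangle =\frac{1}{2}\Vert q_2-q_1\Vert ^2\). This recovers (40a), and (40b) is obtained symmetrically.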

We claim that \((x_k)_{k=1}^{\infty }\) converges weakly to the unique cluster point q. Indeed, otherwise there exist a weak neighborhood V of q and a subsequence \((x_{k_j})_{j=1}^\infty \) of \((x_k)_{k=1}^\infty \) that lies outside V. But this subsequence is a bounded sequence in a reflexive Banach space (since \((x_k)_{k=1}^{\infty }\) is bounded), and hence it has a subsequence which converges weakly, as proved above, to q. Thus, infinitely many elements of \((x_{k_j})_{j=1}^{\infty }\) are in V, a contradiction. \(\square \)

About this article

Cite this article

Reem, D., Reich, S. & De Pierro, A. A Telescopic Bregmanian Proximal Gradient Method Without the Global Lipschitz Continuity Assumption. J Optim Theory Appl 182, 851–884 (2019). https://doi.org/10.1007/s10957-019-01509-8
