Abstract
The problem of minimizing the sum of two convex functions has various theoretical and real-world applications. One of the popular methods for solving this problem is the proximal gradient method (proximal forward–backward algorithm). A very common assumption in the use of this method is that the gradient of the smooth term is globally Lipschitz continuous. However, this assumption is not always satisfied in practice, thus limiting the method's applicability. In this paper, we discuss, in a wide class of finite- and infinite-dimensional spaces, a new variant of the proximal gradient method, which does not impose the above-mentioned global Lipschitz continuity assumption. A key feature of the method is the dependence of the iterative steps on a certain telescopic decomposition of the constraint set into subsets. Moreover, we use a Bregman divergence in the proximal forward–backward operation. Under certain practical conditions, a non-asymptotic rate of convergence (that is, in the function values) is established, as well as the weak convergence of the whole sequence to a minimizer. We also obtain a few auxiliary results of independent interest.
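To illustrate the kind of iteration the abstract describes, here is a minimal numerical sketch of one Bregman proximal gradient step. All names are hypothetical and this is not the paper's TEPROG method: with the Boltzmann–Shannon kernel \(h(x)=\sum _i x_i\log x_i\) on the positive orthant and no nonsmooth term, the step \(x^+ = \text {argmin}_u\{\langle \nabla f(x),u\rangle + L\,B_h(u,x)\}\) has the closed form \(x^+ = x\exp (-\nabla f(x)/L)\), a multiplicative update that automatically stays in the positive orthant.

```python
import numpy as np

# Hypothetical sketch of one Bregman proximal gradient step with the
# Boltzmann-Shannon (entropy) kernel; not the paper's TEPROG algorithm.
# Setting the gradient of  u -> <g, u> + L * B_h(u, x)  to zero gives
# log(u) = log(x) - g / L, i.e. the multiplicative update below.
def bregman_grad_step(grad_fx, x, L):
    return x * np.exp(-grad_fx / L)

# Illustration: minimize f(x) = ||x - c||^2 over the positive orthant.
c = np.array([1.0, 2.0])
x = np.array([0.5, 0.5])
L = 10.0
for _ in range(500):
    x = bregman_grad_step(2.0 * (x - c), x, L)
# x is now numerically equal to the minimizer c
```

Because the kernel's gradient blows up at the boundary, no projection onto the positive orthant is needed, which is one motivation for replacing the Euclidean distance by a Bregman divergence.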
References
Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with sparsity-inducing penalties. Found. Trends Mach. Learn. 4, 1–106 (2012)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2, 183–202 (2009)
Bertero, M., Boccacci, P., Desiderà, G., Vicidomini, G.: Image deblurring with Poisson data: from cells to galaxies. Inverse Prob. 25, 123006 (2009)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4, 1168–1200 (2005)
De Mol, C., De Vito, E., Rosasco, L.: Elastic-net regularization in learning theory. J. Complex. 25, 201–230 (2009)
Figueiredo, M.A.T., Bioucas-Dias, J.M., Nowak, R.D.: Majorization-minimization algorithms for wavelet-based image restoration. IEEE Trans. Image Process. 16, 2980–2991 (2007)
Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1, 127–239 (2014)
Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. Ser. B. 125, 263–295 (2010)
Martinet, B.: Régularisation d'inéquations variationnelles par approximations successives. Rev. Française Inf. Rech. Oper. 4, 154–158 (1970)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14, 877–898 (1976)
Bruck, R.E., Reich, S.: Nonexpansive projections and resolvents of accretive operators in Banach spaces. Houston J. Math. 3, 459–470 (1977)
Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72, 383–390 (1979)
Brézis, H., Lions, P.L.: Produits infinis de résolvantes. Israel J. Math. 29, 329–345 (1978)
Nevanlinna, O., Reich, S.: Strong convergence of contraction semigroups and of iterative methods for accretive operators in Banach spaces. Israel J. Math. 32, 44–58 (1979)
Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42, 330–348 (2017)
Markham, J., Conchello, J.A.: Fast maximum-likelihood image-restoration algorithms for three-dimensional fluorescence microscopy. J. Opt. Soc. Am. A 18, 1062–1071 (2001)
Dey, N., Blanc-Feraud, L., Zimmer, C., Roux, P., Kam, Z., Olivo-Marin, J.C., Zerubia, J.: Richardson–Lucy algorithm with total variation regularization for 3D confocal microscope deconvolution. Microsc. Res. Tech. 69, 260–266 (2006)
Cruz, J.Y.B., Nghia, T.T.A.: On the convergence of the forward–backward splitting method with linesearches. Optim. Methods Softw. 31, 1209–1238 (2016)
Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28, 2131–2151 (2018)
Cohen, G.: Auxiliary problem principle and decomposition of optimization problems. J. Optim. Theory Appl. 32, 277–305 (1980)
Nguyen, Q.V.: Forward-backward splitting with Bregman distances. Vietnam J. Math. 45, 519–539 (2017)
Reem, D., Reich, S., De Pierro, A.: Re-examination of Bregman functions and new properties of their divergences. Optimization 68, 279–348 (2019)
Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization (2008). Preprint. https://www.mit.edu/~dimitrib/PTseng/papers/apgm.pdf. Accessed 15 Oct 2018
Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15, 229–251 (2004)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics, 2nd edn. Springer, Cham (2017)
Rockafellar, R.T.: Convex Analysis. Princeton Mathematical Series, No. 28. Princeton University Press, Princeton (1970)
van Tiel, J.: Convex Analysis: An Introductory Text. Wiley, Belfast (1984)
Zălinescu, C.: Convex Analysis in General Vector Spaces. World Scientific Publishing, River Edge (2002)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, Applied Optimization, vol. 87. Kluwer Academic Publishers, Boston (2004)
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. Comput. Math. Math. Phys. 7, 200–217 (1967)
Censor, Y., Lent, A.: An iterative row-action method for interval convex programming. J. Optim. Theory Appl. 34, 321–353 (1981)
Censor, Y., Reich, S.: Iterations of paracontractions and firmly nonexpansive operators with applications to feasibility and optimization. Optimization 37, 323–339 (1996)
De Pierro, A.R., Iusem, A.N.: A relaxed version of Bregman’s method for convex programming. J. Optim. Theory Appl. 51, 421–440 (1986)
Censor, Y., Zenios, S.A.: Proximal minimization algorithm with \(D\)-functions. J. Optim. Theory Appl. 73, 451–464 (1992)
Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31, 167–175 (2003)
Butnariu, D., Iusem, A.N., Zălinescu, C.: On uniform convexity, total convexity and convergence of the proximal point and outer Bregman projection algorithms in Banach spaces. J. Convex. Anal. 10, 35–61 (2003)
Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3, 538–543 (1993)
Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation-based image restoration. Multiscale Model. Simul. 4, 460–489 (2005)
Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for \(\ell _1\)-minimization with applications to compressed sensing. SIAM J. Imaging Sci. 1, 143–168 (2008)
Zaslavski, A.J.: Convergence of a proximal point method in the presence of computational errors in Hilbert spaces. SIAM J. Optim. 20, 2413–2421 (2010)
Brezis, H.: Functional Analysis. Sobolev Spaces and Partial Differential Equations. Springer, New York (2011)
Ambrosetti, A., Prodi, G.: A Primer of Nonlinear Analysis. Cambridge University Press, New York, USA (1993)
Reem, D., Reich, S.: Solutions to inexact resolvent inclusion problems with applications to nonlinear analysis and optimization. Rend. Circ. Mat. Palermo 2(67), 337–371 (2018)
Reich, S.: Nonlinear semigroups, holomorphic mappings, and integral equations. In: Proceedings of Symposia Pure Mathematics Part 2. Nonlinear functional analysis and its applications, Berkeley, California, 1983, vol. 45, pp. 307–324. American Mathematical Society, Providence (1986)
Reem, D., Reich, S., De Pierro, A.: A telescopic Bregmanian proximal gradient method without the global Lipschitz continuity assumption (2019). arXiv:1804.10273 [math.OC] ([v4], 19 Mar 2019)
Reem, D.: The Bregman distance without the Bregman function II. In: Reich, S., Zaslavski, A.J. (eds.) Optimization Theory and Related Topics, Contemporary Mathematics, vol. 568, pp. 213–223. American Mathematical Society, Providence (2012)
Reem, D., Pierro, A.D.: A new convergence analysis and perturbation resilience of some accelerated proximal forward–backward algorithms with errors. Inverse Prob. 33, 044001 (2017)
Phelps, R.R.: Convex Functions, Monotone Operators and Differentiability, vol. 1364, 2nd edn. Springer, Berlin (1993). Closely related material can be found in "Lectures on maximal monotone operators"
Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces. Commun. Contemp. Math. 3, 615–647 (2001)
Reem, D., Reich, S., De Pierro, A.: Stability of the optimal values under small perturbations of the constraint set. arXiv:1902.02363 [math.OC] ([v1], 6 Feb 2019)
Acknowledgements
Part of the work of Daniel Reem was done when he was at the Institute of Mathematical and Computer Sciences (ICMC), University of São Paulo, São Carlos, Brazil (2014–2016), and was supported by FAPESP 2013/19504-9. It is a pleasure for him to thank Alfredo Iusem and Jose Yunier Bello Cruz for helpful discussions regarding some of the references. Simeon Reich was partially supported by the Israel Science Foundation (Grants 389/12 and 820/17), by the Fund for the Promotion of Research at the Technion and by the Technion General Research Fund. Alvaro De Pierro thanks CNPq Grant 306030/2014-4 and FAPESP 2013/19504-9. All the authors wish to express their thanks to three referees for their feedback, which helped to improve the presentation of the paper.
Communicated by Hedy Attouch.
Appendix: Some Proofs
Here, we provide the proofs of some claims mentioned earlier.
Proof of some claims mentioned in Remark 5.2
We first show that in the backtracking step size rule, if \(S_j\ne C\) for some j, then \(L_k\le \eta L(f',S_k\cap U)\) for each \(k\in \mathbb {N}\). The case \(k=1\) holds by our assumption on \(L_1\), since \(S_j\ne C\) for some \(j\in \mathbb {N}\). Let \(k\ge 2\) and suppose, by induction, that the claim holds for all natural numbers between 1 and \(k-1\). If, to the contrary, we have \(L_k>\eta L(f',S_k\cap U)\), then \(\eta ^{i_k-1}L_{k-1}>L(f',S_k\cap U)\) because \(L_k/\eta =\eta ^{i_k-1}L_{k-1}\). Hence, by using (16) with \(L:=\eta ^{i_k-1}L_{k-1}\), we conclude that (14) holds with L instead of \(L_k\), a contradiction to the minimality of \(i_k\), unless \(i_k=0\). But when \(i_k=0\), we have \(L_k=L_{k-1}\). Hence, using the induction hypothesis and the fact that \(L(f',S_{k-1}\cap U)\le L(f',S_{k}\cap U)\) (which follows immediately from the equality \(L(f',S_k\cap U):=\sup \{\Vert f'(x)-f'(y)\Vert /\Vert x-y\Vert : x,y\in S_k\cap U, x\ne y\}\) and the assumption \(S_{k-1}\subseteq S_k\)), we obtain \(L_k=L_{k-1}\le \eta L(f',S_{k-1}\cap U)\le \eta L(f',S_k\cap U)\), a contradiction to the assumption on \(L_k\).
Now, we show that if we are in the backtracking step size rule, and we also have \(S_k=C\) for all \(k\in \mathbb {N}\) and \(L_1>\eta L(f',S_1\cap U)\), then \(L_{k+1}=L_k\) for each \(k\in \mathbb {N}\). Indeed, since \(S_1=C\) and \(L_{k+1}\ge L_k\) for each \(k\in \mathbb {N}\) (as shown in Remark 5.2), we have \(L_k>\eta L(f',C\cap U)\) for all \(k\in \mathbb {N}\). If we do not have \(L_{k+1}=L_k\) for each \(k\in \mathbb {N}\), then \(i_{k+1}>0\) for some \(k\in \mathbb {N}\) and for this k we have \(\eta ^{i_{k+1}-1}L_{k}=L_{k+1}/\eta >L(f',C\cap U)\). Thus, (16) with \(L:=\eta ^{i_{k+1}-1}L_{k}\) implies that (14) holds with L instead of \(L_k\), a contradiction to the minimality of \(i_{k+1}\). \(\square \)
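The backtracking rule just analyzed can be illustrated numerically. The sketch below uses hypothetical, simplified names, and a plain one-dimensional gradient step with the standard descent inequality stands in for the paper's Bregman proximal step and its condition (14): starting from the previous constant, L is multiplied by \(\eta \) until the sufficient-decrease inequality holds, i.e., \(L_k=\eta ^{i_k}L_{k-1}\) with \(i_k\) minimal.

```python
# Hypothetical sketch of the backtracking step size rule: increase L by
# factors of eta until the sufficient-decrease inequality
#   f(y) <= f(x) + f'(x)(y - x) + (L/2)(y - x)^2
# holds for the gradient step y = x - f'(x)/L.  (This Euclidean
# inequality stands in for the paper's condition (14).)
def next_L(f, fprime, x, L_prev, eta=2.0):
    L = L_prev
    while True:
        y = x - fprime(x) / L
        if f(y) <= f(x) + fprime(x) * (y - x) + 0.5 * L * (y - x) ** 2:
            return L
        L *= eta

# For f(x) = x^2/2 the Lipschitz constant of f' is 1, so the returned
# constant can overshoot it by at most a factor of eta, matching the
# bound L_k <= eta * L(f', S_k ∩ U) proved above.
f = lambda x: 0.5 * x ** 2
fprime = lambda x: x
L = next_L(f, fprime, 1.0, 0.1)
```

Starting from \(L_{1}=0.1\) the rule doubles through 0.2, 0.4, 0.8 and stops at 1.6, which indeed does not exceed \(\eta \cdot 1=2\).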
Proof of Lemma 8.1
Since \((x_k)_{k=1}^{\infty }\) is a bounded sequence in a reflexive Banach space, a well-known classical result implies that \((x_k)_{k=1}^{\infty }\) has at least one weak cluster point \(q\in X\), which, by our assumption, is in U. Suppose to the contrary that there are at least two different weak cluster points \(q_1:=w\)-\(\lim _{k\rightarrow \infty , k\in N_1}x_k\) and \(q_2:=w\)-\(\lim _{k\rightarrow \infty , k\in N_2}x_k\) in X, where \(N_1\) and \(N_2\) are two infinite subsets of \(\mathbb {N}\). By our assumption, \(q_1,q_2\in U\), and hence, since b satisfies the limiting difference property, we have
\(\lim _{k\rightarrow \infty , k\in N_1}\left( B(q_2,x_k)-B(q_1,x_k)\right) =B(q_2,q_1)\) (40)
and
\(\lim _{k\rightarrow \infty , k\in N_2}\left( B(q_1,x_k)-B(q_2,x_k)\right) =B(q_1,q_2)\). (41)
Since we assume that \(L_1:=\lim _{k\rightarrow \infty }B(q_1,x_k)\) and \(L_2:=\lim _{k\rightarrow \infty }B(q_2,x_k)\) exist and are finite, we conclude from (40) that \(B(q_2,q_1)=L_2-L_1=-(L_1-L_2)=-B(q_1,q_2)\). The assumptions on b imply that B is nonnegative (see, for example, [22, Proposition 4.13(III)]), and hence \(0\le B(q_2,q_1)=-B(q_1,q_2)\le 0\). Thus, \(B(q_1,q_2)=B(q_2,q_1)=0\). Since b is strictly convex on U, for all \(z_1\in \text {dom}(b)\) and \(z_2\in U\), we have \(B(z_1,z_2)=0\) if and only if \(z_1=z_2\) (see, for example, [22, Proposition 4.13(III)]). Hence \(q_1=q_2\), a contradiction to the initial assumption. Thus, all the weak cluster points of \((x_k)_{k=1}^{\infty }\) coincide.
We claim that \((x_k)_{k=1}^{\infty }\) converges weakly to the unique weak cluster point q. Indeed, otherwise there exist a weak neighborhood V of q and a subsequence \((x_{k_j})_{j=1}^\infty \) of \((x_k)_{k=1}^\infty \) that lies outside V. But this subsequence is a bounded sequence in a reflexive Banach space (since \((x_k)_{k=1}^{\infty }\) is bounded), and hence it has a further subsequence which converges weakly and whose weak limit, as proved above, must be q. Thus, infinitely many elements of \((x_{k_j})_{j=1}^{\infty }\) are in V, a contradiction. \(\square \)
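The two elementary facts about B used in the proof above (nonnegativity, and vanishing exactly on the diagonal for a strictly convex kernel) can be checked numerically. The snippet below is only an illustration with a particular strictly convex kernel; the actual source used in the proof is [22, Proposition 4.13(III)].

```python
import numpy as np

# Numerical check of two facts about the Bregman divergence
#   B(x, y) = b(x) - b(y) - <b'(y), x - y>
# for a strictly convex kernel b: B is nonnegative, and B(x, x) = 0.
def bregman(b, gradb, x, y):
    return b(x) - b(y) - gradb(y) @ (x - y)

b = lambda x: np.sum(np.exp(x))   # a strictly convex kernel on R^n
gradb = lambda x: np.exp(x)

rng = np.random.default_rng(0)
vals = [bregman(b, gradb, rng.normal(size=3), rng.normal(size=3))
        for _ in range(100)]
# every sampled divergence is nonnegative, and B(x, x) vanishes
```

Nonnegativity is just the gradient inequality for the convex kernel b, evaluated at the pair (x, y); strict convexity upgrades it to equality only when the two arguments coincide.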
Reem, D., Reich, S. & De Pierro, A. A Telescopic Bregmanian Proximal Gradient Method Without the Global Lipschitz Continuity Assumption. J Optim Theory Appl 182, 851–884 (2019). https://doi.org/10.1007/s10957-019-01509-8
Keywords
- Bregman divergence
- Lipschitz continuity
- Minimization
- TEPROG
- Telescopic proximal gradient method
- Strongly convex