Abstract
The problem of minimizing the sum of two convex functions has various theoretical and real-world applications. One of the popular methods for solving this problem is the proximal gradient method (proximal forward–backward algorithm). A very common assumption in the use of this method is that the gradient of the smooth term is globally Lipschitz continuous. However, this assumption is not always satisfied in practice, thus limiting the method's applicability. In this paper, we discuss, in a wide class of finite- and infinite-dimensional spaces, a new variant of the proximal gradient method, which does not impose the above-mentioned global Lipschitz continuity assumption. A key feature of the method is the dependence of the iterative steps on a certain telescopic decomposition of the constraint set into subsets. Moreover, we use a Bregman divergence in the proximal forward–backward operation. Under certain practical conditions, a non-asymptotic rate of convergence (that is, in the function values) is established, as well as the weak convergence of the whole sequence to a minimizer. We also obtain a few auxiliary results of independent interest.
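To illustrate the kind of iteration the abstract describes, here is a minimal numerical sketch of one Bregman proximal gradient step. All names are hypothetical and this is not the paper's TEPROG method: with the Boltzmann–Shannon kernel \(h(x)=\sum _i x_i\log x_i\) on the positive orthant and no nonsmooth term, the step \(x^+ = \text {argmin}_u\{\langle \nabla f(x),u\rangle + L\,B_h(u,x)\}\) has the closed form \(x^+ = x\exp (-\nabla f(x)/L)\), a multiplicative update that automatically stays in the positive orthant.

```python
import numpy as np

# Hypothetical sketch of one Bregman proximal gradient step with the
# Boltzmann-Shannon (entropy) kernel; not the paper's TEPROG algorithm.
# Setting the gradient of  u -> <g, u> + L * B_h(u, x)  to zero gives
# log(u) = log(x) - g / L, i.e. the multiplicative update below.
def bregman_grad_step(grad_fx, x, L):
    return x * np.exp(-grad_fx / L)

# Illustration: minimize f(x) = ||x - c||^2 over the positive orthant.
c = np.array([1.0, 2.0])
x = np.array([0.5, 0.5])
L = 10.0
for _ in range(500):
    x = bregman_grad_step(2.0 * (x - c), x, L)
# x is now numerically equal to the minimizer c
```

Because the kernel's gradient blows up at the boundary, no projection onto the positive orthant is needed, which is one motivation for replacing the Euclidean distance by a Bregman divergence.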
References
Bach, F., Jenatton, R., Mairal, J., Obozinski, G.: Optimization with sparsity-inducing penalties. Found. Trends Mach. Learn. 4, 1–106 (2012)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2, 183–202 (2009)
Bertero, M., Boccacci, P., Desiderà, G., Vicidomini, G.: Image deblurring with Poisson data: from cells to galaxies. Inverse Prob. 25, 123006 (2009)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 4, 1168–1200 (2005)
De Mol, C., De Vito, E., Rosasco, L.: Elastic-net regularization in learning theory. J. Complex. 25, 201–230 (2009)
Figueiredo, M.A.T., Bioucas-Dias, J.M., Nowak, R.D.: Majorization-minimization algorithms for wavelet-based image restoration. IEEE Trans. Image Process. 16, 2980–2991 (2007)
Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1, 127–239 (2014)
Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. Ser. B. 125, 263–295 (2010)
Martinet, B.: Régularisation d'inéquations variationnelles par approximations successives. Rev. Française Inf. Rech. Oper. 4, 154–158 (1970)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14, 877–898 (1976)
Bruck, R.E., Reich, S.: Nonexpansive projections and resolvents of accretive operators in Banach spaces. Houston J. Math. 3, 459–470 (1977)
Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72, 383–390 (1979)
Brézis, H., Lions, P.L.: Produits infinis de résolvantes. Israel J. Math. 29, 329–345 (1978)
Nevanlinna, O., Reich, S.: Strong convergence of contraction semigroups and of iterative methods for accretive operators in Banach spaces. Israel J. Math. 32, 44–58 (1979)
Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42, 330–348 (2017)
Markham, J., Conchello, J.A.: Fast maximum-likelihood image-restoration algorithms for three-dimensional fluorescence microscopy. J. Opt. Soc. Am. A 18, 1062–1071 (2001)
Dey, N., Blanc-Feraud, L., Zimmer, C., Roux, P., Kam, Z., Olivo-Marin, J.C., Zerubia, J.: Richardson–Lucy algorithm with total variation regularization for 3D confocal microscope deconvolution. Microsc. Res. Tech. 69, 260–266 (2006)
Cruz, J.Y.B., Nghia, T.T.A.: On the convergence of the forward–backward splitting method with linesearches. Optim. Methods Softw. 31, 1209–1238 (2016)
Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28, 2131–2151 (2018)
Cohen, G.: Auxiliary problem principle and decomposition of optimization problems. J. Optim. Theory Appl. 32, 277–305 (1980)
Nguyen, Q.V.: Forward-backward splitting with Bregman distances. Vietnam J. Math. 45, 519–539 (2017)
Reem, D., Reich, S., De Pierro, A.: Re-examination of Bregman functions and new properties of their divergences. Optimization 68, 279–348 (2019)
Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization (2008). Preprint. https://www.mit.edu/~dimitrib/PTseng/papers/apgm.pdf. Accessed 15 Oct 2018
Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15, 229–251 (2004)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics, 2nd edn. Springer, Cham (2017)
Rockafellar, R.T.: Convex Analysis. Princeton Mathematical Series, No. 28. Princeton University Press, Princeton (1970)
van Tiel, J.: Convex Analysis: An Introductory Text. Wiley, Belfast (1984)
Zălinescu, C.: Convex Analysis in General Vector Spaces. World Scientific Publishing, River Edge (2002)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, Applied Optimization, vol. 87. Kluwer Academic Publishers, Boston (2004)
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. Comput. Math. Math. Phys. 7, 200–217 (1967)
Censor, Y., Lent, A.: An iterative row-action method for interval convex programming. J. Optim. Theory Appl. 34, 321–353 (1981)
Censor, Y., Reich, S.: Iterations of paracontractions and firmly nonexpansive operators with applications to feasibility and optimization. Optimization 37, 323–339 (1996)
De Pierro, A.R., Iusem, A.N.: A relaxed version of Bregman’s method for convex programming. J. Optim. Theory Appl. 51, 421–440 (1986)
Censor, Y., Zenios, S.A.: Proximal minimization algorithm with \(D\)-functions. J. Optim. Theory Appl. 73, 451–464 (1992)
Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31, 167–175 (2003)
Butnariu, D., Iusem, A.N., Zălinescu, C.: On uniform convexity, total convexity and convergence of the proximal point and outer Bregman projection algorithms in Banach spaces. J. Convex. Anal. 10, 35–61 (2003)
Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3, 538–543 (1993)
Osher, S., Burger, M., Goldfarb, D., Xu, J., Yin, W.: An iterative regularization method for total variation-based image restoration. Multiscale Model. Simul. 4, 460–489 (2005)
Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for \(\ell _1\)-minimization with applications to compressed sensing. SIAM J. Imaging Sci. 1, 143–168 (2008)
Zaslavski, A.J.: Convergence of a proximal point method in the presence of computational errors in Hilbert spaces. SIAM J. Optim. 20, 2413–2421 (2010)
Brezis, H.: Functional Analysis. Sobolev Spaces and Partial Differential Equations. Springer, New York (2011)
Ambrosetti, A., Prodi, G.: A Primer of Nonlinear Analysis. Cambridge University Press, New York, USA (1993)
Reem, D., Reich, S.: Solutions to inexact resolvent inclusion problems with applications to nonlinear analysis and optimization. Rend. Circ. Mat. Palermo 2(67), 337–371 (2018)
Reich, S.: Nonlinear semigroups, holomorphic mappings, and integral equations. In: Proceedings of Symposia Pure Mathematics Part 2. Nonlinear functional analysis and its applications, Berkeley, California, 1983, vol. 45, pp. 307–324. American Mathematical Society, Providence (1986)
Reem, D., Reich, S., De Pierro, A.: A telescopic Bregmanian proximal gradient method without the global Lipschitz continuity assumption (2019). arXiv:1804.10273 [math.OC] ([v4], 19 Mar 2019)
Reem, D.: The Bregman distance without the Bregman function II. In: Reich, S., Zaslavski, A.J. (eds.) Optimization Theory and Related Topics, Contemporary Mathematics, vol. 568, pp. 213–223. American Mathematical Society, Providence (2012)
Reem, D., Pierro, A.D.: A new convergence analysis and perturbation resilience of some accelerated proximal forward–backward algorithms with errors. Inverse Prob. 33, 044001 (2017)
Phelps, R.R.: Convex Functions, Monotone Operators and Differentiability, vol. 1364, 2nd edn. Springer, Berlin (1993). Closely related material can be found in "Lectures on maximal monotone operators"
Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces. Commun. Contemp. Math. 3, 615–647 (2001)
Reem, D., Reich, S., De Pierro, A.: Stability of the optimal values under small perturbations of the constraint set. arXiv:1902.02363 [math.OC] ([v1], 6 Feb 2019)
Acknowledgements
Part of the work of Daniel Reem was done when he was at the Institute of Mathematical and Computer Sciences (ICMC), University of São Paulo, São Carlos, Brazil (2014–2016), and was supported by FAPESP 2013/19504-9. It is a pleasure for him to thank Alfredo Iusem and Jose Yunier Bello Cruz for helpful discussions regarding some of the references. Simeon Reich was partially supported by the Israel Science Foundation (Grants 389/12 and 820/17), by the Fund for the Promotion of Research at the Technion and by the Technion General Research Fund. Alvaro De Pierro thanks CNPq Grant 306030/2014-4 and FAPESP 2013/19504-9. All the authors wish to express their thanks to three referees for their feedback, which helped to improve the presentation of the paper.
Communicated by Hedy Attouch.
Appendix: Some Proofs
Here, we provide the proofs of some claims mentioned earlier.
Proof of some claims mentioned in Remark 5.2
We first show that in the backtracking step size rule, if \(S_j\ne C\) for some j, then \(L_k\le \eta L(f',S_k\cap U)\) for each \(k\in \mathbb {N}\). The case \(k=1\) holds by our assumption on \(L_1\), since \(S_j\ne C\) for some \(j\in \mathbb {N}\). Let \(k\ge 2\) and suppose, by induction, that the claim holds for all natural numbers between 1 and \(k-1\). If, to the contrary, we have \(L_k>\eta L(f',S_k\cap U)\), then \(\eta ^{i_k-1}L_{k-1}>L(f',S_k\cap U)\) because \(L_k/\eta =\eta ^{i_k-1}L_{k-1}\). Hence, by using (16) with \(L:=\eta ^{i_k-1}L_{k-1}\), we conclude that (14) holds with L instead of \(L_k\), a contradiction to the minimality of \(i_k\), unless \(i_k=0\). But when \(i_k=0\), we have \(L_k=L_{k-1}\). Hence, using the induction hypothesis and the fact that \(L(f',S_{k-1}\cap U)\le L(f',S_{k}\cap U)\) (which follows immediately from the equality \(L(f',S_k\cap U):=\sup \{\Vert f'(x)-f'(y)\Vert /\Vert x-y\Vert : x,y\in S_k\cap U, x\ne y\}\) and the assumption \(S_{k-1}\subseteq S_k\)), we obtain \(L_k=L_{k-1}\le \eta L(f',S_{k-1}\cap U)\le \eta L(f',S_k\cap U)\), a contradiction to the assumption on \(L_k\).
Now, we show that if we are in the backtracking step size rule, and we also have \(S_k=C\) for all \(k\in \mathbb {N}\) and \(L_1>\eta L(f',S_1\cap U)\), then \(L_{k+1}=L_k\) for each \(k\in \mathbb {N}\). Indeed, since \(S_1=C\) and \(L_{k+1}\ge L_k\) for each \(k\in \mathbb {N}\) (as shown in Remark 5.2), we have \(L_k>\eta L(f',C\cap U)\) for all \(k\in \mathbb {N}\). If we do not have \(L_{k+1}=L_k\) for each \(k\in \mathbb {N}\), then \(i_{k+1}>0\) for some \(k\in \mathbb {N}\) and for this k we have \(\eta ^{i_{k+1}-1}L_{k}=L_{k+1}/\eta >L(f',C\cap U)\). Thus, (16) with \(L:=\eta ^{i_{k+1}-1}L_{k}\) implies that (14) holds with L instead of \(L_k\), a contradiction to the minimality of \(i_{k+1}\). \(\square \)
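The backtracking rule just analyzed can be illustrated numerically. The sketch below uses hypothetical, simplified names, and a plain one-dimensional gradient step with the standard descent inequality stands in for the paper's Bregman proximal step and its condition (14): starting from the previous constant, L is multiplied by \(\eta \) until the sufficient-decrease inequality holds, i.e., \(L_k=\eta ^{i_k}L_{k-1}\) with \(i_k\) minimal.

```python
# Hypothetical sketch of the backtracking step size rule: increase L by
# factors of eta until the sufficient-decrease inequality
#   f(y) <= f(x) + f'(x)(y - x) + (L/2)(y - x)^2
# holds for the gradient step y = x - f'(x)/L.  (This Euclidean
# inequality stands in for the paper's condition (14).)
def next_L(f, fprime, x, L_prev, eta=2.0):
    L = L_prev
    while True:
        y = x - fprime(x) / L
        if f(y) <= f(x) + fprime(x) * (y - x) + 0.5 * L * (y - x) ** 2:
            return L
        L *= eta

# For f(x) = x^2/2 the Lipschitz constant of f' is 1, so the returned
# constant can overshoot it by at most a factor of eta, matching the
# bound L_k <= eta * L(f', S_k ∩ U) proved above.
f = lambda x: 0.5 * x ** 2
fprime = lambda x: x
L = next_L(f, fprime, 1.0, 0.1)
```

Starting from \(L_{1}=0.1\) the rule doubles through 0.2, 0.4, 0.8 and stops at 1.6, which indeed does not exceed \(\eta \cdot 1=2\).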
Proof of Lemma 8.1
Since \((x_k)_{k=1}^{\infty }\) is a bounded sequence in a reflexive Banach space, a well-known classical result implies that \((x_k)_{k=1}^{\infty }\) has at least one weak cluster point \(q\in X\), which, by our assumption, is in U. Suppose to the contrary that there are at least two different weak cluster points \(q_1:=w\)-\(\lim _{k\rightarrow \infty , k\in N_1}x_k\) and \(q_2:=w\)-\(\lim _{k\rightarrow \infty , k\in N_2}x_k\) in X, where \(N_1\) and \(N_2\) are two infinite subsets of \(\mathbb {N}\). By our assumption, \(q_1,q_2\in U\), and hence, since b satisfies the limiting difference property, we have
\(\lim _{k\rightarrow \infty , k\in N_1}\left( B(q_2,x_k)-B(q_1,x_k)\right) =B(q_2,q_1)\) (40)
and
\(\lim _{k\rightarrow \infty , k\in N_2}\left( B(q_1,x_k)-B(q_2,x_k)\right) =B(q_1,q_2)\). (41)
Since we assume that \(L_1:=\lim _{k\rightarrow \infty }B(q_1,x_k)\) and \(L_2:=\lim _{k\rightarrow \infty }B(q_2,x_k)\) exist and are finite, we conclude from (40) that \(B(q_2,q_1)=L_2-L_1=-(L_1-L_2)=-B(q_1,q_2)\). The assumptions on b imply that B is nonnegative (see, for example, [22, Proposition 4.13(III)]), and hence \(0\le B(q_2,q_1)=-B(q_1,q_2)\le 0\). Thus, \(B(q_1,q_2)=B(q_2,q_1)=0\). Since b is strictly convex on U, for all \(z_1\in \text {dom}(b)\) and \(z_2\in U\), we have \(B(z_1,z_2)=0\) if and only if \(z_1=z_2\) (see, for example, [22, Proposition 4.13(III)]). Hence \(q_1=q_2\), a contradiction to the initial assumption. Thus, all the weak cluster points of \((x_k)_{k=1}^{\infty }\) coincide.
We claim that \((x_k)_{k=1}^{\infty }\) converges weakly to the unique weak cluster point q. Indeed, otherwise there exist a weak neighborhood V of q and a subsequence \((x_{k_j})_{j=1}^\infty \) of \((x_k)_{k=1}^\infty \) that lies outside V. But this subsequence is a bounded sequence in a reflexive Banach space (since \((x_k)_{k=1}^{\infty }\) is bounded), and hence it has a further subsequence which converges weakly and whose weak limit, as proved above, must be q. Thus, infinitely many elements of \((x_{k_j})_{j=1}^{\infty }\) are in V, a contradiction. \(\square \)
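The two elementary facts about B used in the proof above (nonnegativity, and vanishing exactly on the diagonal for a strictly convex kernel) can be checked numerically. The snippet below is only an illustration with a particular strictly convex kernel; the actual source used in the proof is [22, Proposition 4.13(III)].

```python
import numpy as np

# Numerical check of two facts about the Bregman divergence
#   B(x, y) = b(x) - b(y) - <b'(y), x - y>
# for a strictly convex kernel b: B is nonnegative, and B(x, x) = 0.
def bregman(b, gradb, x, y):
    return b(x) - b(y) - gradb(y) @ (x - y)

b = lambda x: np.sum(np.exp(x))   # a strictly convex kernel on R^n
gradb = lambda x: np.exp(x)

rng = np.random.default_rng(0)
vals = [bregman(b, gradb, rng.normal(size=3), rng.normal(size=3))
        for _ in range(100)]
# every sampled divergence is nonnegative, and B(x, x) vanishes
```

Nonnegativity is just the gradient inequality for the convex kernel b, evaluated at the pair (x, y); strict convexity upgrades it to equality only when the two arguments coincide.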
Reem, D., Reich, S. & De Pierro, A. A Telescopic Bregmanian Proximal Gradient Method Without the Global Lipschitz Continuity Assumption. J Optim Theory Appl 182, 851–884 (2019). https://doi.org/10.1007/s10957-019-01509-8
Keywords
- Bregman divergence
- Lipschitz continuity
- Minimization
- TEPROG
- Telescopic proximal gradient method
- Strongly convex