Abstract
We systematically study the local single-valuedness of the Bregman proximal mapping and the local smoothness of the Bregman–Moreau envelope of a nonconvex function under relative prox-regularity, an extension of the notion of prox-regularity originally introduced by Poliquin and Rockafellar. As Bregman distances are asymmetric in general, it is natural, in accordance with Bauschke et al., to consider two variants of the Bregman proximal mapping, which, depending on the order of the arguments, are called the left and the right Bregman proximal mapping. We first consider the left Bregman proximal mapping. Then, via a translation result, we obtain analogous (and partially sharper) results for the right Bregman proximal mapping. The class of relatively prox-regular functions significantly extends the recently considered class of relatively hypoconvex functions. In particular, relative prox-regularity allows for functions with a possibly nonconvex domain. Moreover, as a main source of examples and in analogy to the classical setting, we introduce relatively amenable functions, i.e., convexly composite functions for which the inner nonlinear mapping is component-wise smooth adaptable, a recently introduced extension of Lipschitz differentiability. By way of example, we apply our theory to interpret joint alternating Bregman minimization with proximal regularization, locally, as a Bregman proximal gradient algorithm applied to a smooth adaptable function.
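For orientation, the following display records the conventions that are standard in the cited Bregman literature; the arrow-decorated symbols are used here purely for illustration, and the paper's own notation and normalization may differ. Given a Legendre function \(\phi \), a step size \(\lambda > 0\) and a function \(f\), the Bregman distance generated by \(\phi \) and the induced left and right Bregman proximal mappings read
$$\begin{aligned} D_{\phi }(x, y)&:= \phi (x) - \phi (y) - \langle \nabla \phi (y), x - y \rangle ,\\ \overleftarrow{P}{}^{\phi }_{\lambda f}(\bar{x})&:= \mathop {\mathrm{argmin}}\limits _{x} \Big \{ f(x) + \tfrac{1}{\lambda } D_{\phi }(x, \bar{x}) \Big \}, \qquad \overrightarrow{P}{}^{\phi }_{\lambda f}(\bar{x}) := \mathop {\mathrm{argmin}}\limits _{x} \Big \{ f(x) + \tfrac{1}{\lambda } D_{\phi }(\bar{x}, x) \Big \}, \end{aligned}$$
and the associated left and right Bregman–Moreau envelopes are obtained by replacing the argmin by the corresponding infimal value.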
Notes
We write “nonconvex Bregman projection” or “nonconvex Bregman proximal mapping” for the sake of convenience; by these we mean the Bregman projection onto a (possibly) nonconvex set and the Bregman proximal mapping of a (possibly) nonconvex function, respectively.
References
Moreau, J.J.: Proximité et dualité dans un espace Hilbertien. Bulletin de la S. M. F. 93, 273–299 (1965)
Attouch, H.: Convergence de fonctions convexes, des sous-différentiels et semi-groupes associés. Comptes Rendus de l’Académie des Sciences de Paris 285, 539–542 (1977)
Attouch, H.: Variational Convergence for Functions and Operators. Pitman Advanced Publishing Program, Boston (1984)
Poliquin, R.A., Rockafellar, R.T.: Prox-regular functions in variational analysis. Trans. Am. Math. Soc. 348, 1805–1838 (1996)
Poliquin, R.A.: Integration of subdifferentials of nonconvex functions. Nonlinear Anal. Theory Methods Appl. 17(4), 385–398 (1991)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, New York (1998)
Bačák, M., Borwein, J.M., Eberhard, A., Mordukhovich, B.: Infimal convolutions and Lipschitzian properties of subdifferentials for prox-regular functions in Hilbert spaces. J. Convex Anal. 17, 732–763 (2010)
Jourani, A., Thibault, L., Zagrodny, D.: Differential properties of the Moreau envelope. J. Funct. Anal. 266(3), 1185–1237 (2014)
Poliquin, R.A., Rockafellar, R.T., Thibault, L.: Local differentiability of distance functions. Trans. Am. Math. Soc. 352(11), 5231–5249 (2000)
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7, 200–217 (1967)
Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2017)
Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28(3), 2131–2151 (2018)
Teboulle, M.: Entropic proximal mappings with applications to nonlinear programming. Math. Oper. Res. 17(3), 670–690 (1992)
Bauschke, H.H., Bolte, J., Chen, J., Teboulle, M., Wang, X.: On linear convergence of non-Euclidean gradient methods without strong convexity and Lipschitz gradient continuity. J. Optim. Theory Appl. 182(3), 1068–1087 (2019). https://doi.org/10.1007/s10957-019-01516-9
Mukkamala, M.C., Ochs, P., Pock, T., Sabach, S.: Convex-concave backtracking for inertial Bregman proximal gradient algorithms in non-convex optimization. arXiv:1904.03537 (2019)
Bauschke, H.H., Combettes, P.L., Noll, D.: Joint minimization with alternating Bregman proximity operators. Pac. J. Optim. 2(3), 401–424 (2006)
Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Bregman monotone optimization algorithms. SIAM J. Control Optim. 42(2), 596–636 (2003)
Bauschke, H.H., Combettes, P.L.: Iterating Bregman retractions. SIAM J. Optim. 13(4), 1159–1173 (2003)
Eckstein, J.: Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming. Math. Oper. Res. 18(1), 202–226 (1993)
Ochs, P., Fadili, J., Brox, T.: Non-smooth non-convex Bregman minimization: unification and new algorithms. J. Optim. Theory Appl. 181(1), 244–278 (2018)
Byrne, C., Censor, Y.: Proximity function minimization using multiple Bregman projections, with applications to split feasibility and Kullback–Leibler distance minimization. Ann. Oper. Res. 105(1), 77–98 (2001)
Censor, Y., Reich, S.: The Dykstra algorithm with Bregman projections. Commun. Appl. Anal. 2, 407–419 (1998)
Censor, Y., Herman, G.: Block-iterative algorithms with underrelaxed Bregman projections. SIAM J. Optim. 13(1), 283–297 (2002)
Censor, Y., Zenios, S.A.: Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press, New York (1997)
Kassay, G., Reich, S., Sabach, S.: Iterative methods for solving systems of variational inequalities in reflexive Banach spaces. SIAM J. Optim. 21(4), 1319–1344 (2011)
Kiwiel, K.: Proximal minimization methods with generalized Bregman functions. SIAM J. Control Optim. 35(4), 1142–1168 (1997)
Nguyen, Q.: Variable quasi-Bregman monotone sequences. Numer. Algorithms 73(4), 1107–1130 (2016)
Davis, D., Drusvyatskiy, D., MacPhee, K.J.: Stochastic model-based minimization under high-order growth. arXiv:1807.00255 (2018)
Reem, D., Reich, S., De Pierro, A.: A telescopic Bregmanian proximal gradient method without the global Lipschitz continuity assumption. J. Optim. Theor. Appl. 182(3), 851–884 (2019). https://doi.org/10.1007/s10957-019-01509-8
Hanzely, F., Richtarik, P., Xiao, L.: Accelerated Bregman proximal gradient methods for relatively smooth convex optimization. arXiv:1808.03045 (2018)
Lu, H., Freund, R., Nesterov, Y.: Relatively smooth convex optimization by first-order methods and applications. SIAM J. Optim. 28(1), 333–354 (2018)
Burachik, R., Kassay, G.: On a generalized proximal point method for solving equilibrium problems in Banach spaces. Nonlinear Anal. Theory Methods Appl. 75(18), 6456–6464 (2012)
Mukkamala, M.C., Ochs, P.: Beyond alternating updates for matrix factorization with inertial Bregman proximal gradient algorithms. In: Advances in Neural Information Processing Systems 32, pp. 4268–4278. Curran Associates, Inc. (2019)
Nguyen, Q.: Forward–backward splitting with Bregman distances. Vietnam J. Math. 45, 1–21 (2017)
Benning, M., Betcke, M.M., Ehrhardt, M.J., Schönlieb, C.B.: Choose your path wisely: gradient descent in a Bregman distance framework. arXiv:1712.04045 (2017)
Censor, Y., Zenios, S.: Proximal minimization algorithm with D-functions. J. Optim. Theory Appl. 73(3), 451–464 (1992)
Bauschke, H.H., Dao, M., Lindstrom, S.: Regularizing with Bregman–Moreau envelopes. SIAM J. Optim. 28(4), 3208–3228 (2018)
Nemirovsky, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley, Chichester (1983)
Chen, Y.Y., Kan, C., Song, W.: The Moreau envelope function and proximal mapping with respect to the Bregman distance in Banach spaces. Vietnam J. Math. 40(2&3), 181–199 (2012)
Kan, C., Song, W.: The Moreau envelope function and proximal mapping in the sense of the Bregman distance. Nonlinear Anal. Theory Methods Appl. 75(3), 1385–1399 (2012)
Bauschke, H.H., Wang, X., Ye, J., Yuan, X.: Bregman distances and Chebyshev sets. J. Approx. Theory 159(1), 3–25 (2009)
Wang, X.: On Chebyshev functions and Klee functions. J. Math. Anal. Appl. 368(1), 293–310 (2010)
Laude, E., Wu, T., Cremers, D.: A nonconvex proximal splitting algorithm under Moreau–Yosida regularization. In: Proceedings of the 21st International Conference on Artificial Intelligence and Statistics, vol. 84, pp. 491–499. PMLR (2018)
Laude, E., Wu, T., Cremers, D.: Optimization of inf-convolution regularized nonconvex composite problems. In: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, vol. 89, pp. 547–556. PMLR (2019)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Bauschke, H.H., Borwein, J.M.: Legendre functions and the method of random Bregman projections. J. Convex Anal. 4(1), 27–67 (1997)
Bauschke, H.H., Lewis, A.S.: Dykstra’s algorithm with Bregman projections: a convergence proof. Optimization 48(4), 409–427 (2000)
Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces. Commun. Contemp. Math. 3(04), 615–647 (2001)
Bauschke, H.H., Macklem, M.S., Wang, X.: Chebyshev sets, Klee sets, and Chebyshev centers with respect to Bregman distances: recent results and open problems. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 1–21. Springer, New York (2011)
Harville, D.A.: Matrix Algebra: Exercises and Solutions. Springer, Berlin (2001)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)
Lewis, A.S., Luke, D.R., Malick, J.: Local linear convergence for alternating and averaged nonconvex projections. Found. Comput. Math. 9(4), 485–513 (2009)
Ochs, P.: Local convergence of the heavy-ball method and iPiano for non-convex optimization. J. Optim. Theory Appl. 177, 153–180 (2018)
Acknowledgements
We would like to thank Tao Wu for fruitful discussions and helpful comments.
Additional information
Communicated by Nicolas Hadjisavvas.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Proof of First Part of Theorem 4.1
Lemma A.1
Let the assumptions in Theorem 4.1 hold. Then the following statements hold for the iterates produced by Algorithm 1:
- (i) A monotonic sufficient decrease over the iterates is guaranteed:
$$\begin{aligned} F_\lambda (u^{t+1}, x^{t+1}) + D_{\sigma }(x^{t+1}, x^t) + D_{\omega }(u^{t+1}, u^t) \le F_\lambda (u^{t}, x^{t}), \end{aligned}$$ (36)
- (ii) \(\{u^t, x^t\}_{t \in \mathbb {N}}\) is bounded and \(x^t \in {{\,\mathrm{int}\,}}(\mathrm{dom}\,\phi )\) for all t.
- (iii) \(F_\lambda (u^t, x^t)\) is uniformly bounded from below, i.e., \(-\infty < \beta \le F_\lambda (u^t, x^t)\) for all t, and \(\{F_\lambda (u^{t+1}, x^{t+1})\}_{t\in \mathbb {N}}\) converges.
Proof
In view of the coercivity of \(F_\lambda \) and since f, g are proper and lsc, the iterates are well defined.
For part (i) note that, by the definition of the \(x\)-update, we have
$$\begin{aligned} F_\lambda (u^{t}, x^{t+1}) + D_{\sigma }(x^{t+1}, x^t) \le F_\lambda (u^{t}, x^{t}), \end{aligned}$$
and, by the definition of the \(u\)-update,
$$\begin{aligned} F_\lambda (u^{t+1}, x^{t+1}) + D_{\omega }(u^{t+1}, u^t) \le F_\lambda (u^{t}, x^{t+1}). \end{aligned}$$
Summing the two inequalities yields (36).
For part (ii) note that the boundedness of \(\{u^t, x^t\}_{t \in \mathbb {N}}\) follows from (36) and the coercivity of \(F_\lambda \). By the qualification condition and an argument similar to the one in the proof of Lemma 3.3, we have \(x^t \in {{\,\mathrm{int}\,}}(\mathrm{dom}\,\phi )\) for all t.
For part (iii) note that \(F_\lambda \) is proper and lsc and the iterates are bounded due to part (ii). In view of [6, Corollary 1.10], \(F_\lambda \) is bounded from below over the iterates and the conclusion follows. \(\square \)
We are now ready to prove the statement from Theorem 4.1:
Proof
We sum the estimate (36) from \(t=0\) to \(T\) and obtain, in view of Lemma A.1(iii), that
$$\begin{aligned} \sum _{t=0}^{T} \big ( D_{\sigma }(x^{t+1}, x^t) + D_{\omega }(u^{t+1}, u^t) \big ) \le F_\lambda (u^{0}, x^{0}) - F_\lambda (u^{T+1}, x^{T+1}) \le F_\lambda (u^{0}, x^{0}) - \beta . \end{aligned}$$
We take \(T \rightarrow \infty \) and deduce that
$$\begin{aligned} \sum _{t=0}^{\infty } \big ( D_{\sigma }(x^{t+1}, x^t) + D_{\omega }(u^{t+1}, u^t) \big ) < \infty , \end{aligned}$$
and therefore \(D_{\sigma }(x^{t+1}, x^t) \rightarrow 0\) and \(D_{\omega }(u^{t+1}, u^t) \rightarrow 0\). In view of the strict convexity of \(\sigma ,\omega \) on \({{\,\mathrm{int}\,}}(\mathrm{dom}\,\phi )\), we also have \(\Vert x^{t+1}-x^t\Vert \rightarrow 0\) and \(\Vert u^{t+1}-u^t\Vert \rightarrow 0\). In view of the \(x\)- and \(u\)-updates and the qualification condition (5), we obtain that:
and
In view of [6, Exercise 8.8(c)] and [6, Proposition 10.5] and since \(x^{t+1} \in {{\,\mathrm{int}\,}}(\mathrm{dom}\,\phi )\), this means
In view of Lemma A.1(ii), the iterates are bounded and we may consider a convergent subsequence \(\{u^{t_j}, x^{t_j}\}_{j \in \mathbb {N}} \subset \{u^t, x^t\}_{t \in \mathbb {N}}\). Let \((u^*, x^*)\) denote the limit point. In view of the closedness of \({{\,\mathrm{gph}\,}}\partial F_\lambda \) under the \(F_\lambda \)-attentive topology, we obtain, for \(j \rightarrow \infty \), since \(F_\lambda (u^{t_j}, x^{t_j}) \rightarrow F_\lambda (u^*, x^*)\), by the continuity of \(\nabla \sigma ,\nabla \omega ,A\), and since \(\Vert x^{t+1}-x^t\Vert \rightarrow 0\) and \(\Vert u^{t+1}-u^t\Vert \rightarrow 0\), that:
It remains to argue that the limit point \(x^*\) is also contained in the interior of \(\mathrm{dom}\,\phi \): In view of the qualification condition (5) and an argument similar to the one in the proof of Lemma 3.3, as well as [6, Proposition 10.5], we obtain that \(x^* \in {{\,\mathrm{int}\,}}(\mathrm{dom}\,\phi )\) and conclude that the optimality conditions (32) and (33) hold. \(\square \)
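To make the sufficient-decrease mechanism of Lemma A.1(i) concrete, the following minimal numerical sketch runs an alternating scheme with Bregman proximal regularization on a toy one-dimensional problem and checks the analogue of inequality (36) at every iteration. It is not the paper's Algorithm 1: the objective F, the choice of the Boltzmann–Shannon entropy as Legendre function, and the grid-based exact minimization of the subproblems are illustrative assumptions.

```python
# Minimal numerical sketch (NOT the paper's Algorithm 1): alternating
# minimization with Bregman proximal regularization on a toy 1-D problem.
# The objective F, the entropy phi, and the grid-based exact minimization
# are illustrative assumptions; the point is to observe the sufficient
# decrease inequality analogous to (36) at every iteration.
import numpy as np

def phi(z):
    # Boltzmann-Shannon entropy, strictly convex on (0, infinity)
    return z * np.log(z) - z

def bregman(z, w):
    # D_phi(z, w) = phi(z) - phi(w) - phi'(w) * (z - w), with phi'(w) = log(w)
    return phi(z) - phi(w) - np.log(w) * (z - w)

def F(u, x):
    # toy coupling objective, nonconvex in x through the sine term
    return 0.5 * (2.0 * x - u) ** 2 + np.sin(3.0 * x) + 0.1 * u ** 2

grid = np.linspace(1e-3, 5.0, 20001)  # both variables are confined to (0, 5]
u, x = grid[4000], grid[4000]         # start on grid points so each step is exact
for t in range(30):
    # x-update: exact minimization of F(u, .) + D_phi(., x) over the grid
    x_new = grid[np.argmin(F(u, grid) + bregman(grid, x))]
    # u-update: exact minimization of F(., x_new) + D_phi(., u) over the grid
    u_new = grid[np.argmin(F(grid, x_new) + bregman(grid, u))]
    # sufficient decrease: F(u+, x+) + D(x+, x) + D(u+, u) <= F(u, x), cf. (36)
    assert F(u_new, x_new) + bregman(x_new, x) + bregman(u_new, u) <= F(u, x) + 1e-9
    u, x = u_new, x_new
print("final iterate:", u, x, "objective:", F(u, x))
```

Since each subproblem is minimized exactly over a grid that contains the previous iterate as a candidate, the two per-step inequalities summed in the proof of Lemma A.1(i) hold by construction, without any smoothness assumption on F.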
Cite this article
Laude, E., Ochs, P. & Cremers, D. Bregman Proximal Mappings and Bregman–Moreau Envelopes Under Relative Prox-Regularity. J Optim Theory Appl 184, 724–761 (2020). https://doi.org/10.1007/s10957-019-01628-2