
Bregman Proximal Mappings and Bregman–Moreau Envelopes Under Relative Prox-Regularity

Journal of Optimization Theory and Applications

Abstract

We systematically study the local single-valuedness of the Bregman proximal mapping and the local smoothness of the Bregman–Moreau envelope of a nonconvex function under relative prox-regularity, an extension of prox-regularity, which was originally introduced by Poliquin and Rockafellar. Since Bregman distances are asymmetric in general, it is natural, in accordance with Bauschke et al., to consider two variants of the Bregman proximal mapping, which, depending on the order of the arguments, are called the left and the right Bregman proximal mapping. We consider the left Bregman proximal mapping first. Then, via a translation result, we obtain analogous (and partially sharp) results for the right Bregman proximal mapping. The class of relatively prox-regular functions significantly extends the recently considered class of relatively hypoconvex functions. In particular, relative prox-regularity allows for functions with a possibly nonconvex domain. Moreover, as a main source of examples and in analogy to the classical setting, we introduce relatively amenable functions, i.e., convexly composite functions for which the inner nonlinear mapping is component-wise smooth adaptable, a recently introduced extension of Lipschitz differentiability. By way of example, we apply our theory to interpret joint alternating Bregman minimization with proximal regularization, locally, as a Bregman proximal gradient algorithm applied to a smooth adaptable function.
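As a concrete illustration (not part of the paper), the following minimal Python sketch evaluates the left Bregman proximal mapping \(\mathrm{argmin}_u\, f(u) + \lambda ^{-1} D_\phi (u, x)\) and the corresponding Bregman–Moreau envelope (the attained minimum value) for a one-dimensional nonconvex \(f\), with the Boltzmann–Shannon entropy as Legendre function. The grid-search "solver", the toy function, and all names are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (illustrative, not from the paper): left Bregman proximal
# mapping and Bregman-Moreau envelope in 1D, using the Boltzmann-Shannon
# entropy phi(u) = u*log(u) - u on (0, inf) as Legendre function. Its Bregman
# distance D_phi(u, x) = u*log(u/x) - u + x is the (asymmetric) Kullback-
# Leibler divergence, which is why left and right proximal mappings differ.

def bregman_dist(u, x):
    """D_phi(u, x) = phi(u) - phi(x) - phi'(x)*(u - x) for the entropy kernel."""
    return u * np.log(u / x) - u + x

def f(u):
    """A smooth, nonconvex toy function (prox-regular, since it is C^2)."""
    return np.sin(3.0 * u) + 0.5 * (u - 1.0) ** 2

def left_bregman_prox(x, lam, grid):
    """Grid-search stand-in for argmin_u f(u) + (1/lam) * D_phi(u, x).
    Returns the minimizer (prox point) and the minimum value (envelope)."""
    values = f(grid) + bregman_dist(grid, x) / lam
    i = np.argmin(values)
    return grid[i], values[i]

grid = np.linspace(1e-3, 4.0, 200_000)  # positive grid: dom(phi) = (0, inf)
for lam in (0.05, 0.2):
    u_star, env = left_bregman_prox(x=1.5, lam=lam, grid=grid)
    print(f"lambda={lam}: prox ~ {u_star:.4f}, envelope ~ {env:.4f}")
```

For small \(\lambda \), the computed minimizer is unique near the base point; this local single-valuedness of the (possibly set-valued) mapping is precisely the phenomenon the paper studies under relative prox-regularity.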


Notes

  1. We write “nonconvex Bregman projection” or “nonconvex Bregman proximal mapping” for the sake of convenience; by this we mean the Bregman projection onto a (possibly) nonconvex set or the Bregman proximal mapping of a (possibly) nonconvex function.

References

  1. Moreau, J.J.: Proximité et dualité dans un espace Hilbertien. Bulletin de la S. M. F. 93, 273–299 (1965)

  2. Attouch, H.: Convergence de fonctions convexes, des sous-différentiels et semi-groupes associés. Comptes Rendus de l’Académie des Sciences de Paris 285, 539–542 (1977)

  3. Attouch, H.: Variational Convergence for Functions and Operators. Pitman Advanced Publishing Program, Boston (1984)

  4. Poliquin, R.A., Rockafellar, R.T.: Prox-regular functions in variational analysis. Trans. Am. Math. Soc. 348, 1805–1838 (1996)

  5. Poliquin, R.A.: Integration of subdifferentials of nonconvex functions. Nonlinear Anal. Theory Methods Appl. 17(4), 385–398 (1991)

  6. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, New York (1998)

  7. Bačák, M., Borwein, J.M., Eberhard, A., Mordukhovich, B.: Infimal convolutions and Lipschitzian properties of subdifferentials for prox-regular functions in Hilbert spaces. J. Convex Anal. 17, 732–763 (2010)

  8. Jourani, A., Thibault, L., Zagrodny, D.: Differential properties of the Moreau envelope. J. Funct. Anal. 266(3), 1185–1237 (2014)

  9. Poliquin, R.A., Rockafellar, R.T., Thibault, L.: Local differentiability of distance functions. Trans. Am. Math. Soc. 352(11), 5231–5249 (2000)

  10. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7, 200–217 (1967)

  11. Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2017)

  12. Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28(3), 2131–2151 (2018)

  13. Teboulle, M.: Entropic proximal mappings with applications to nonlinear programming. Math. Oper. Res. 17(3), 670–690 (1992)

  14. Bauschke, H.H., Bolte, J., Chen, J., Teboulle, M., Wang, X.: On linear convergence of non-Euclidean gradient methods without strong convexity and Lipschitz gradient continuity. J. Optim. Theory Appl. 182(3), 1068–1087 (2019). https://doi.org/10.1007/s10957-019-01516-9

  15. Mukkamala, M.C., Ochs, P., Pock, T., Sabach, S.: Convex-concave backtracking for inertial Bregman proximal gradient algorithms in non-convex optimization. arXiv:1904.03537 (2019)

  16. Bauschke, H.H., Combettes, P.L., Noll, D.: Joint minimization with alternating Bregman proximity operators. Pac. J. Optim. 2(3), 401–424 (2006)

  17. Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Bregman monotone optimization algorithms. SIAM J. Control Optim. 42(2), 596–636 (2003)

  18. Bauschke, H.H., Combettes, P.L.: Iterating Bregman retractions. SIAM J. Optim. 13(4), 1159–1173 (2003)

  19. Eckstein, J.: Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming. Math. Oper. Res. 18(1), 202–226 (1993)

  20. Ochs, P., Fadili, J., Brox, T.: Non-smooth non-convex Bregman minimization: unification and new algorithms. J. Optim. Theory Appl. 181(1), 244–278 (2018)

  21. Byrne, C., Censor, Y.: Proximity function minimization using multiple Bregman projections, with applications to split feasibility and Kullback–Leibler distance minimization. Ann. Oper. Res. 105(1), 77–98 (2001)

  22. Censor, Y., Reich, S.: The Dykstra algorithm with Bregman projections. Commun. Appl. Anal. 2, 407–419 (1998)

  23. Censor, Y., Herman, G.: Block-iterative algorithms with underrelaxed Bregman projections. SIAM J. Optim. 13(1), 283–297 (2002)

  24. Censor, Y., Zenios, S.A.: Parallel Optimization: Theory, Algorithms, and Applications. Oxford University Press, New York (1997)

  25. Kassay, G., Reich, S., Sabach, S.: Iterative methods for solving systems of variational inequalities in reflexive Banach spaces. SIAM J. Optim. 21(4), 1319–1344 (2011)

  26. Kiwiel, K.: Proximal minimization methods with generalized Bregman functions. SIAM J. Control Optim. 35(4), 1142–1168 (1997)

  27. Nguyen, Q.: Variable quasi-Bregman monotone sequences. Numer. Algorithms 73(4), 1107–1130 (2016)

  28. Davis, D., Drusvyatskiy, D., MacPhee, K.J.: Stochastic model-based minimization under high-order growth. arXiv:1807.00255 (2018)

  29. Reem, D., Reich, S., De Pierro, A.: A telescopic Bregmanian proximal gradient method without the global Lipschitz continuity assumption. J. Optim. Theory Appl. 182(3), 851–884 (2019). https://doi.org/10.1007/s10957-019-01509-8

  30. Hanzely, F., Richtarik, P., Xiao, L.: Accelerated Bregman proximal gradient methods for relatively smooth convex optimization. arXiv:1808.03045 (2018)

  31. Lu, H., Freund, R., Nesterov, Y.: Relatively smooth convex optimization by first-order methods and applications. SIAM J. Optim. 28(1), 333–354 (2018)

  32. Burachik, R., Kassay, G.: On a generalized proximal point method for solving equilibrium problems in Banach spaces. Nonlinear Anal. Theory Methods Appl. 75(18), 6456–6464 (2012)

  33. Mukkamala, M.C., Ochs, P.: Beyond alternating updates for matrix factorization with inertial Bregman proximal gradient algorithms. In: Advances in Neural Information Processing Systems 32, pp. 4268–4278. Curran Associates, Inc. (2019)

  34. Nguyen, Q.: Forward–backward splitting with Bregman distances. Vietnam J. Math. 45, 1–21 (2017)

  35. Benning, M., Betcke, M.M., Ehrhardt, M.J., Schönlieb, C.B.: Choose your path wisely: gradient descent in a Bregman distance framework. arXiv:1712.04045 (2017)

  36. Censor, Y., Zenios, S.: Proximal minimization algorithm with D-functions. J. Optim. Theory Appl. 73(3), 451–464 (1992)

  37. Bauschke, H.H., Dao, M., Lindstrom, S.: Regularizing with Bregman–Moreau envelopes. SIAM J. Optim. 28(4), 3208–3228 (2018)

  38. Nemirovsky, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. Wiley, Chichester (1983)

  39. Chen, Y.Y., Kan, C., Song, W.: The Moreau envelope function and proximal mapping with respect to the Bregman distance in Banach spaces. Vietnam J. Math. 40(2&3), 181–199 (2012)

  40. Kan, C., Song, W.: The Moreau envelope function and proximal mapping in the sense of the Bregman distance. Nonlinear Anal. Theory Methods Appl. 75(3), 1385–1399 (2012)

  41. Bauschke, H.H., Wang, X., Ye, J., Yuan, X.: Bregman distances and Chebyshev sets. J. Approx. Theory 159(1), 3–25 (2009)

  42. Wang, X.: On Chebyshev functions and Klee functions. J. Math. Anal. Appl. 368(1), 293–310 (2010)

  43. Laude, E., Wu, T., Cremers, D.: A nonconvex proximal splitting algorithm under Moreau–Yosida regularization. In: Proceedings of the 21st International Conference on Artificial Intelligence and Statistics, vol. 84, pp. 491–499. PMLR (2018)

  44. Laude, E., Wu, T., Cremers, D.: Optimization of inf-convolution regularized nonconvex composite problems. In: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, vol. 89, pp. 547–556. PMLR (2019)

  45. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

  46. Bauschke, H.H., Borwein, J.: Legendre functions and the method of random Bregman projections. J. Convex Anal. 4(1), 27–67 (1997)

  47. Bauschke, H.H., Lewis, A.S.: Dykstra’s algorithm with Bregman projections: a convergence proof. Optimization 48(4), 409–427 (2000)

  48. Bauschke, H.H., Borwein, J.M., Combettes, P.L.: Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces. Commun. Contemp. Math. 3(4), 615–647 (2001)

  49. Bauschke, H.H., Macklem, M.S., Wang, X.: Chebyshev sets, Klee sets, and Chebyshev centers with respect to Bregman distances: recent results and open problems. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 1–21. Springer, New York (2011)

  50. Harville, D.A.: Matrix Algebra: Exercises and Solutions. Springer, Berlin (2001)

  51. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)

  52. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)

  53. Lewis, A.S., Luke, D.R., Malick, J.: Local linear convergence for alternating and averaged nonconvex projections. Found. Comput. Math. 9(4), 485–513 (2009)

  54. Ochs, P.: Local convergence of the heavy-ball method and iPiano for non-convex optimization. J. Optim. Theory Appl. 177, 153–180 (2018)


Acknowledgements

We would like to thank Tao Wu for fruitful discussions and helpful comments.

Author information

Correspondence to Emanuel Laude.

Additional information

Communicated by Nicolas Hadjisavvas.


Appendix: Proof of First Part of Theorem 4.1

Lemma A.1

Let the assumptions of Theorem 4.1 hold. Then the iterates produced by Algorithm 1 satisfy:

(i) A monotonic sufficient decrease over the iterates is guaranteed:

$$\begin{aligned} F_\lambda (u^{t+1}, x^{t+1}) + D_{\sigma }(x^{t+1}, x^t) + D_{\omega }(u^{t+1}, u^t) \le F_\lambda (u^{t}, x^{t}). \end{aligned}$$
(36)

(ii) \(\{u^t, x^t\}_{t \in \mathbb {N}}\) is bounded and \(x^t \in {{\,\mathrm{int}\,}}(\mathrm{dom}\,\phi )\) for all \(t\).

(iii) \(F_\lambda (u^t, x^t)\) is uniformly bounded from below, i.e., there exists \(\beta \) such that \(-\infty < \beta \le F_\lambda (u^t, x^t)\) for all \(t\), and \(\{F_\lambda (u^{t+1}, x^{t+1})\}_{t \in \mathbb {N}}\) converges.

Proof

In view of the coercivity of \(F_\lambda \), and since \(f\) and \(g\) are proper and lsc, the iterates are well defined.

For part (i) note that by the definition of the \(x\)-update we have that

$$\begin{aligned} F_\lambda (u^t, x^{t+1}) + D_{\sigma }(x^{t+1}, x^t) \le F_\lambda (u^t, x^t) \end{aligned}$$

and by the definition of the \(u\)-update

$$\begin{aligned} F_\lambda (u^{t+1}, x^{t+1}) + D_{\omega }(u^{t+1}, u^t) \le F_\lambda (u^t, x^{t+1}). \end{aligned}$$

Summing the two yields (36).

For part (ii) note that the boundedness of \(\{u^t, x^t\}_{t \in \mathbb {N}}\) follows from (36) and the coercivity of \(F_\lambda \). By the qualification condition and an argument similar to the one in the proof of Lemma 3.3, we have \(x^t \in {{\,\mathrm{int}\,}}(\mathrm{dom}\,\phi )\) for all \(t\).

For part (iii) note that \(F_\lambda \) is proper and lsc and that the iterates are bounded by part (ii). In view of [6, Corollary 1.10], \(F_\lambda \) is therefore bounded from below over the iterates; combined with the monotonic decrease from part (i), the sequence \(\{F_\lambda (u^{t+1}, x^{t+1})\}_{t\in \mathbb {N}}\) converges. \(\square \)
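To make the descent mechanism of Lemma A.1(i) tangible, here is a minimal numerical sketch (not the paper's Algorithm 1, but a Euclidean stand-in): joint alternating minimization with proximal regularization on a toy nonconvex objective, with both block updates solved exactly by grid search. The objective, the quadratic coupling, and the weights are all illustrative assumptions; the inequality checked inside the loop is exactly (36).

```python
import numpy as np

# Euclidean stand-in (an assumption, not the paper's Algorithm 1) for joint
# alternating minimization with proximal regularization on the toy objective
# F_lam(u, x) = f(x) + g(u) + (x - u)^2 / (2*lam); both block updates are
# solved exactly by grid search, and the loop checks inequality (36).
lam, c_sigma, c_omega = 0.5, 1.0, 1.0        # illustrative weights

def f(x): return np.sin(2.0 * x) + 0.1 * x ** 2   # nonconvex block
def g(u): return np.abs(u - 1.0)                  # nonsmooth block

def F(u, x):
    return f(x) + g(u) + (x - u) ** 2 / (2.0 * lam)

def D(a, b, c):
    """Euclidean Bregman distance generated by (c/2) * ||.||^2."""
    return 0.5 * c * (a - b) ** 2

grid = np.linspace(-5.0, 5.0, 100_001)
u, x = 3.0, -3.0
for t in range(30):
    F_old = F(u, x)
    # x-update: exact minimization of F(u, .) + D_sigma(., x) over the grid
    x_new = grid[np.argmin(F(u, grid) + D(grid, x, c_sigma))]
    # u-update: exact minimization of F(., x_new) + D_omega(., u) over the grid
    u_new = grid[np.argmin(F(grid, x_new) + D(grid, u, c_omega))]
    # Sufficient decrease (36): F(u+, x+) + D_sigma + D_omega <= F(u, x)
    lhs = F(u_new, x_new) + D(x_new, x, c_sigma) + D(u_new, u, c_omega)
    assert lhs <= F_old + 1e-9, "descent inequality violated"
    u, x = u_new, x_new
print(f"final iterate: u = {u:.4f}, x = {x:.4f}, F = {F(u, x):.4f}")
```

The assertion passes for the same reason the proof works: each block update is an exact minimization that includes the previous point as a candidate, so adding the two per-block inequalities yields (36).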

We are now ready to prove the statement from Theorem 4.1:

Proof

We sum the estimate (36) from \(t=0\) to \(T\) and obtain, in view of Lemma A.1(iii), that

$$\begin{aligned} -\infty < F_\lambda (u^{T+1}, x^{T+1}) - F_\lambda (u^0, x^0)&= \sum _{t=0}^T \big (F_\lambda (u^{t+1}, x^{t+1}) - F_\lambda (u^{t}, x^{t})\big ) \\&\le -\sum _{t=0}^T \big (D_{\sigma }(x^{t+1}, x^t) +D_{\omega }(u^{t+1}, u^t)\big ). \end{aligned}$$

Letting \(T \rightarrow \infty \), we deduce that

$$\begin{aligned} D_{\sigma }(x^{t+1}, x^t) + D_{\omega }(u^{t+1}, u^t) \rightarrow 0, \end{aligned}$$

and therefore \(D_{\sigma }(x^{t+1}, x^t) \rightarrow 0\) and \(D_{\omega }(u^{t+1}, u^t) \rightarrow 0\). In view of the strict convexity of \(\sigma \) and \(\omega \) on \({{\,\mathrm{int}\,}}(\mathrm{dom}\,\phi )\), we also have \(\Vert x^{t+1}-x^t\Vert \rightarrow 0\) and \(\Vert u^{t+1}-u^t\Vert \rightarrow 0\). In view of the \(x\)- and \(u\)-updates and the qualification condition (5), we obtain:

$$\begin{aligned} 0 \in \partial f(x^{t+1}) + \frac{1}{\lambda } \big (\nabla \phi (x^{t+1}) - A(u^{t+1})\big ) + \nabla \sigma (x^{t+1}) - \nabla \sigma (x^t) + \frac{1}{\lambda }\big (A(u^{t+1}) - A(u^t)\big ), \end{aligned}$$

and

$$\begin{aligned} 0 \in \partial g(u^{t+1}) + \frac{1}{\lambda } A^* (\nabla \phi ^*(A(u^{t+1})) - x^{t+1}) + \nabla \omega (u^{t+1}) - \nabla \omega (u^t). \end{aligned}$$

In view of [6, Exercise 8.8(c)] and [6, Proposition 10.5] and since \(x^{t+1} \in {{\,\mathrm{int}\,}}(\mathrm{dom}\,\phi )\), this means

$$\begin{aligned} \begin{pmatrix} \nabla \sigma (x^t) - \nabla \sigma (x^{t+1}) + \frac{1}{\lambda }(A(u^t) - A(u^{t+1})) \\ \nabla \omega (u^t) -\nabla \omega (u^{t+1}) \end{pmatrix} \in \partial F_\lambda (u^{t+1}, x^{t+1}). \end{aligned}$$

In view of Lemma A.1(ii), the iterates are bounded, so we may pass to a convergent subsequence \(\{u^{t_j}, x^{t_j}\}_{j \in \mathbb {N}} \subset \{u^t, x^t\}_{t \in \mathbb {N}}\) with limit point \((u^*, x^*)\). The graph \({{\,\mathrm{gph}\,}}\partial F_\lambda \) is closed under the \(F_\lambda \)-attentive topology; since \(F_\lambda (u^{t_j}, x^{t_j}) \rightarrow F_\lambda (u^*, x^*)\), since \(\nabla \sigma \), \(\nabla \omega \) and \(A\) are continuous, and since \(\Vert x^{t+1}-x^t\Vert \rightarrow 0\) and \(\Vert u^{t+1}-u^t\Vert \rightarrow 0\), letting \(j \rightarrow \infty \) yields:

$$\begin{aligned} 0\in \partial F_\lambda (u^*, x^*). \end{aligned}$$

It remains to argue that the limit point \(x^*\) lies in the interior of \(\mathrm{dom}\,\phi \): in view of the qualification condition (5), an argument similar to the one in the proof of Lemma 3.3, and [6, Proposition 10.5], we obtain \(x^* \in {{\,\mathrm{int}\,}}(\mathrm{dom}\,\phi )\) and conclude that the optimality conditions (32) and (33) hold. \(\square \)
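As a numerical companion to the telescoping argument in the proof, the following standalone re-run of the toy sketch from above (same illustrative objective and updates, all names assumed) records the increments \(|x^{t+1}-x^t| + |u^{t+1}-u^t|\), which decay toward zero as the proof predicts.

```python
import numpy as np

# Standalone companion to the telescoping argument: re-running the toy
# alternating scheme (same illustrative objective and updates as the sketch
# after Lemma A.1) and recording the increments |x^{t+1}-x^t| + |u^{t+1}-u^t|.
lam = 0.5
f = lambda x: np.sin(2.0 * x) + 0.1 * x ** 2
g = lambda u: np.abs(u - 1.0)
F = lambda u, x: f(x) + g(u) + (x - u) ** 2 / (2.0 * lam)
D = lambda a, b: 0.5 * (a - b) ** 2
grid = np.linspace(-5.0, 5.0, 100_001)

u, x, increments = 3.0, -3.0, []
for t in range(50):
    x_new = grid[np.argmin(F(u, grid) + D(grid, x))]
    u_new = grid[np.argmin(F(grid, x_new) + D(grid, u))]
    increments.append(abs(x_new - x) + abs(u_new - u))
    u, x = u_new, x_new

# The tail decays to (grid-resolution) zero, mirroring ||x^{t+1}-x^t|| -> 0.
print(" ".join(f"{d:.2e}" for d in increments[::10]))
```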


Cite this article

Laude, E., Ochs, P. & Cremers, D. Bregman Proximal Mappings and Bregman–Moreau Envelopes Under Relative Prox-Regularity. J Optim Theory Appl 184, 724–761 (2020). https://doi.org/10.1007/s10957-019-01628-2
