Skip to main content
Log in

Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms

  • Published:
Journal of Optimization Theory and Applications Aims and scope Submit manuscript

Abstract

We propose a unifying algorithm for non-smooth non-convex optimization. The algorithm approximates the objective function by a convex model function and finds an approximate (Bregman) proximal point of the convex model. This approximate minimizer of the model function yields a descent direction, along which the next iterate is found. Complemented with an Armijo-like line search strategy, we obtain a flexible algorithm for which we prove (subsequential) convergence to a stationary point under weak assumptions on the growth of the model function error. Special instances of the algorithm with a Euclidean distance function are, for example, gradient descent, forward–backward splitting, ProxDescent, without the common requirement of a “Lipschitz continuous gradient”. In addition, we consider a broad class of Bregman distance functions (generated by Legendre functions), replacing the Euclidean distance. The algorithm has a wide range of applications including many linear and nonlinear inverse problems in signal/image processing and machine learning.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Fig. 2

Similar content being viewed by others

Notes

  1. It is often easy to find a feasible point. Of course, there are cases, where finding an initialization is a problem itself. We assume that the user provides a feasible initial point.

  2. Note that \(\inf _{k}\eta _{{k}}>0\) is equivalent to \(\liminf _{k}\eta _{{k}}>0\), as we assume \(\eta _{{k}}>0\) for all \({k}\in \mathbb {N}\).

  3. The example is not meant to be meaningful and the model function to be algorithmically the best choice. This example shall demonstrate the flexibility and problem adaptivity of our framework.

  4. For very specific instances, a recent line of research proposes to lift the problem to the space of low-rank matrices, and then use convex relaxation and computationally intensive conic programming that are only applicable to small-dimensional problems; see, e.g., [35] for blind deconvolution.

  5. Strictly speaking, \({\mathcal {Z}}_3\) should be the nonnegative orthant for sparse NMF. But this does not change anything to our discussion since computing the Euclidean proximal mapping of the \(\ell _1\) norm restricted to the nonnegative orthant is easy.

References

  1. Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria. ArXiv e-prints (2016). ArXiv:1610.03446

  2. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Appl. Math. 16(6), 964–979 (1979)

    MathSciNet  MATH  Google Scholar 

  3. Lewis, A., Wright, S.: A proximal method for composite minimization. Math. Program. 158(1–2), 501–546 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  4. Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. ArXiv e-prints (2016). ArXiv:1602.06661

  5. Noll, D., Prot, O., Apkarian, P.: A proximity control algorithm to minimize nonsmooth and nonconvex functions. Pac. J. Optim. 4(3), 571–604 (2008)

    MathSciNet  MATH  Google Scholar 

  6. Noll, D.: Convergence of non-smooth descent methods using the Kurdyka–Łojasiewicz inequality. J. Optim. Theory Appl. 160(2), 553–572 (2013)

    Article  MATH  Google Scholar 

  7. Bonettini, S., Loris, I., Porta, F., Prato, M.: Variable metric inexact line-search based methods for nonsmooth optimization. SIAM J. Optim. 26(2), 891–921 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  8. Burg, J.: The relationship between maximum entropy spectra and maximum likelihood spectra. Geophysics 37(2), 375–376 (1972)

    Article  Google Scholar 

  9. Bauschke, H., Borwein, J.: Legendre functions and the method of random Bregman projections. J. Convex Anal. 4(1), 27–67 (1997)

    MathSciNet  MATH  Google Scholar 

  10. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)

    Article  MathSciNet  MATH  Google Scholar 

  11. Bauschke, H., Borwein, J., Combettes, P.: Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces. Commun. Contemp. Math. 3(4), 615–647 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  12. Chen, G., Teboulle, M.: Convergence analysis of proximal-like minimization algorithm using bregman functions. SIAM J. Optim. 3, 538–543 (1993)

    Article  MathSciNet  MATH  Google Scholar 

  13. Bauschke, H., Borwein, J., Combettes, P.: Bregman monotone optimization algorithms. SIAM J. Control Optim. 42(2), 596–636 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  14. Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  15. Nguyen, Q.: Forward-backward splitting with Bregman distances. Vietnam J. Math. 45(3), 519–539 (2017)

    Article  MathSciNet  MATH  Google Scholar 

  16. Attouch, H., Bolte, J., Svaiter, B.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013). https://doi.org/10.1007/s10107-011-0484-9

    Article  MathSciNet  MATH  Google Scholar 

  17. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014). https://doi.org/10.1007/s10107-013-0701-9

    Article  MathSciNet  MATH  Google Scholar 

  18. Marquardt, D.: An algorithm for least-squares estimation of nonlinear parameters. Soc. Ind. Appl. Math. 11, 431–441 (1963)

    Article  MathSciNet  MATH  Google Scholar 

  19. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)

    Book  MATH  Google Scholar 

  20. Rockafellar, R.T., Wets, R.B.: Variational Analysis, vol. 317. Springer, Heidelberg (1998). https://doi.org/10.1007/978-3-642-02431-3

    Book  MATH  Google Scholar 

  21. Ochs, P., Dosovitskiy, A., Brox, T., Pock, T.: On iteratively reweighted algorithms for nonsmooth nonconvex optimization in computer vision. SIAM J. Imaging Sci. 8(1), 331–372 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  22. Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A.: Robust Statistics: The Approach Based on Influence Functions. MIT Press, Cambridge (1986)

    MATH  Google Scholar 

  23. Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20, 89–97 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  24. Combettes, P., Dũng, D., Vũ, B.: Dualization of signal recovery problems. Set-Valued Var. Anal. 18(3–4), 373–404 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  25. Bertero, M., Boccacci, P., Desiderà, G., Vicidomini, G.: Image deblurring with Poisson data: from cells to galaxies. Inverse Probl. 25(12), 123,006 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  26. Zanella, R., Boccacci, P., Zanni, L., Bertero, M.: Efficient gradient projection methods for edge-preserving removal of Poisson noise. Inverse Probl. 25(4) (2009)

  27. Vardi, Y., Shepp, L., Kaufman, L.: A statistical model for positron emission tomography. J. Am. Stat. Assoc. 80(389), 8–20 (1985)

    Article  MathSciNet  MATH  Google Scholar 

  28. Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741 (1984)

    Article  MATH  Google Scholar 

  29. Blake, A., Zisserman, A.: Visual Reconstruction. MIT Press, Cambridge (1987)

    Book  Google Scholar 

  30. Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Commun. Pure Appl. Math. 42, 577–685 (1989)

    Article  MathSciNet  MATH  Google Scholar 

  31. Cichocki, A., Zdunek, R., Phan, A., Amari, S.: Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation. Wiley, New York (2009)

    Book  Google Scholar 

  32. Chaudhuri, S., Velmurugan, R., Rameshan, R.: Blind Image Deconvolution. Springer, New York (2014)

    MATH  Google Scholar 

  33. Starck, J.L., Murtagh, F., Fadili, J.: Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity, 2nd edn. Cambridge University Press, Cambridge (2015)

    Book  MATH  Google Scholar 

  34. Xu, Y., Li, Z., Yang, J., Zhang, D.: A survey of dictionary learning algorithms for face recognition. IEEE Access 5, 8502–8514 (2017). https://doi.org/10.1109/ACCESS.2017.2695239

    Article  Google Scholar 

  35. Ahmed, A., Recht, B., Romberg, J.: Blind deconvolution using convex programming. IEEE Trans. Inf. Theory 60(3), 1711–1732 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  36. Lee, D., Seung, H.: Learning the part of objects from nonnegative matrix factorization. Nature 401, 788–791 (1999)

    Article  MATH  Google Scholar 

  37. Michelot, C.: A finite algorithm for finding the projection of a point onto the canonical simplex of \(\mathbb{R}^n\). J. Optim. Theory Appl. 50, 195–200 (1986)

    Article  MathSciNet  MATH  Google Scholar 

  38. Olshausen, B., Field, D.: Sparse coding with an overcomplete basis set: a strategy employed by V1? Vis. Res. 37, 3311–3325 (1996)

    Article  Google Scholar 

  39. Hoyer, P.: Non-negative matrix factorization with sparseness constraints. J. Mach. Learn. Res. 5, 1457–1469 (2004)

    MathSciNet  MATH  Google Scholar 

  40. Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  41. Nesterov, Y.: Introductory lectures on convex optimization: A basic course. Applied optimization, vol. 87. Kluwer Academic Publishers, Boston, MA (2004)

  42. Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for non-convex optimization. SIAM J. Imaging Sci. 7(2), 1388–1419 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  43. Liang, J., Fadili, J., Peyré, G.: A multi-step inertial forward–backward splitting method for non-convex optimization. arXiv:1606.02118 [math] (2016)

  44. Wen, B., Chen, X., Pong, T.: Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim. 27(1), 124–145 (2017)

    Article  MathSciNet  MATH  Google Scholar 

  45. Drusvyatskiy, D., Kempton, C.: An accelerated algorithm for minimizing convex compositions. ArXiv e-prints (2016). ArXiv:1605.00125 [math]

  46. Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annales de l’institut Fourier 48(3), 769–783 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  47. Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. In: Les Équations aux Dérivées Partielles, pp. 87–89. Éditions du centre National de la Recherche Scientifique, Paris (1963)

  48. Łojasiewicz, S.: Sur la géométrie semi- et sous- analytique. Annales de l’institut Fourier 43(5), 1575–1595 (1993)

    Article  MATH  MathSciNet  Google Scholar 

  49. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2006). https://doi.org/10.1137/050644641

    Article  MathSciNet  MATH  Google Scholar 

  50. Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Acknowledgements

P. Ochs acknowledges funding by the German Research Foundation (DFG Grant OC 150/1-1).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Peter Ochs.

Additional information

Communicated by Jérôme Bolte.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Ochs, P., Fadili, J. & Brox, T. Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms. J Optim Theory Appl 181, 244–278 (2019). https://doi.org/10.1007/s10957-018-01452-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10957-018-01452-0

Keywords

Mathematics Subject Classification

Navigation