
Accelerated Bregman proximal gradient methods for relatively smooth convex optimization

Abstract

We consider the problem of minimizing the sum of two convex functions: one is differentiable and relatively smooth with respect to a reference convex function, and the other can be nondifferentiable but simple to optimize. We investigate a triangle scaling property of the Bregman distance generated by the reference convex function and present accelerated Bregman proximal gradient (ABPG) methods that attain an \(O(k^{-\gamma })\) convergence rate, where \(\gamma \in (0,2]\) is the triangle scaling exponent (TSE) of the Bregman distance. For the Euclidean distance, we have \(\gamma =2\) and recover the convergence rate of Nesterov’s accelerated gradient methods. For non-Euclidean Bregman distances, the TSE can be much smaller (say \(\gamma \le 1\)), but we show that a relaxed notion, the intrinsic TSE, is always equal to 2. We exploit the intrinsic TSE to develop adaptive ABPG methods that converge much faster in practice. Although theoretical guarantees on a fast convergence rate seem to be out of reach in general, our methods obtain empirical \(O(k^{-2})\) rates in numerical experiments on several applications and provide a posteriori numerical certificates for the fast rates.
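For readers who want the abstract's terminology made concrete, the displays below sketch the standard definitions it relies on. The notation \(f\), \(\Psi\), \(h\), \(L\), \(\gamma\) is ours, and the triangle scaling inequality is a paraphrase of the property described in the abstract rather than a verbatim statement from the paper. The problem is to minimize \(\phi(x) = f(x) + \Psi(x)\), where \(f\) is convex and differentiable and \(\Psi\) is convex and simple. The Bregman distance generated by a reference convex function \(h\) is

\[ D_h(x,y) \;=\; h(x) - h(y) - \langle \nabla h(y),\, x - y\rangle , \]

and \(f\) is \(L\)-smooth relative to \(h\) if

\[ f(x) \;\le\; f(y) + \langle \nabla f(y),\, x - y\rangle + L\, D_h(x,y) \qquad \text{for all } x, y. \]

The triangle scaling property asserts, roughly, that moving both arguments of \(D_h\) a fraction \(\theta\) of the way toward a common point shrinks the distance by a factor of at least \(\theta^{\gamma}\):

\[ D_h\big((1-\theta)\bar{x} + \theta z,\; (1-\theta)\bar{x} + \theta \tilde{z}\big) \;\le\; \theta^{\gamma}\, D_h(z, \tilde{z}), \qquad \theta \in [0,1]. \]

With the Euclidean reference \(h(x) = \tfrac{1}{2}\|x\|_2^2\), so that \(D_h(x,y) = \tfrac{1}{2}\|x-y\|_2^2\), the inequality holds with \(\gamma = 2\), which is consistent with recovering the \(O(k^{-2})\) rate of Nesterov's accelerated gradient methods.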




Acknowledgements

We thank Haihao Lu, Robert Freund and Yurii Nesterov for helpful conversations. We are also grateful to the anonymous referees, whose comments helped improve the clarity of the paper. Peter Richtárik acknowledges the support of the KAUST Baseline Research Funding Scheme.

Author information

Correspondence to Lin Xiao.



About this article


Cite this article

Hanzely, F., Richtárik, P. & Xiao, L. Accelerated Bregman proximal gradient methods for relatively smooth convex optimization. Comput Optim Appl 79, 405–440 (2021). https://doi.org/10.1007/s10589-021-00273-8


Keywords

  • Convex optimization
  • Relative smoothness
  • Bregman divergence
  • Proximal gradient methods
  • Accelerated gradient methods