Abstract
We consider the problem of minimizing the sum of two convex functions: one is differentiable and relatively smooth with respect to a reference convex function, and the other can be nondifferentiable but simple to optimize. We investigate a triangle scaling property of the Bregman distance generated by the reference convex function and present accelerated Bregman proximal gradient (ABPG) methods that attain an \(O(k^{-\gamma})\) convergence rate, where \(\gamma \in (0,2]\) is the triangle scaling exponent (TSE) of the Bregman distance. For the Euclidean distance we have \(\gamma = 2\) and recover the convergence rate of Nesterov's accelerated gradient methods. For non-Euclidean Bregman distances the TSE can be much smaller (say \(\gamma \le 1\)), but we show that a relaxed variant, the intrinsic TSE, is always equal to 2. We exploit the intrinsic TSE to develop adaptive ABPG methods that converge much faster in practice. Although theoretical guarantees on such fast convergence rates seem out of reach in general, our methods attain empirical \(O(k^{-2})\) rates in numerical experiments on several applications and provide posterior numerical certificates for these fast rates.
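For orientation, the two objects behind the \(O(k^{-\gamma})\) rate can be recalled briefly; this is an informal summary in standard notation, and the precise definitions and assumptions appear in the body of the paper. Given a reference convex function \(h\), the Bregman distance it generates is
\[
D_h(x, y) \;=\; h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle,
\]
and \(\gamma\) is a triangle scaling exponent of \(D_h\) if
\[
D_h\bigl((1-\theta)\bar{x} + \theta z,\; (1-\theta)\bar{x} + \theta \tilde{z}\bigr) \;\le\; \theta^{\gamma}\, D_h(z, \tilde{z})
\]
for all \(\bar{x}, z, \tilde{z}\) in the relative interior of \(\operatorname{dom} h\) and all \(\theta \in (0,1)\). For the Euclidean reference \(h(x) = \tfrac{1}{2}\|x\|_2^2\) we get \(D_h(x, y) = \tfrac{1}{2}\|x - y\|_2^2\), and the inequality above holds with equality when \(\gamma = 2\), which is the Euclidean case mentioned in the abstract.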
Acknowledgements
We thank Haihao Lu, Robert Freund and Yurii Nesterov for helpful conversations. We are also grateful to the anonymous referees, whose comments helped improve the clarity of the paper. Peter Richtárik acknowledges the support of the KAUST Baseline Research Funding Scheme.
Cite this article
Hanzely, F., Richtárik, P. & Xiao, L. Accelerated Bregman proximal gradient methods for relatively smooth convex optimization. Comput Optim Appl 79, 405–440 (2021). https://doi.org/10.1007/s10589-021-00273-8