Abstract
The purpose of this manuscript is to derive new convergence results for several subgradient methods applied to minimizing nonsmooth convex functions with Hölderian growth. The growth condition is satisfied in many applications and includes functions with quadratic growth and weakly sharp minima as special cases. To this end, we make three main contributions. First, for a constant and sufficiently small stepsize, we show that the subgradient method achieves linear convergence up to a certain region including the optimal set, with error of the order of the stepsize. Second, if appropriate problem parameters are known, we derive a decaying stepsize that obtains a much faster convergence rate than is suggested by the classical \(O(1/\sqrt{k})\) result for the subgradient method. Third, we develop a novel “descending stairs” stepsize that obtains this faster convergence rate and also achieves linear convergence for the special case of weakly sharp functions. We also develop an adaptive variant of the “descending stairs” stepsize which achieves the same convergence rate without requiring knowledge of an error bound constant, which is difficult to estimate in practice.
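For readers unfamiliar with these conditions, the following displays record the standard form of Hölderian growth and the generic subgradient iteration to which the abstract refers; they are illustrative only, and the precise constants, normalizations, and stepsize rules analyzed in the paper may differ. Hölderian growth is commonly stated as
\[
f(x) - f^\star \;\ge\; \mu\,\mathrm{dist}(x, X^\star)^{p} \qquad \text{for all feasible } x,
\]
where \(f^\star\) is the optimal value, \(X^\star\) the optimal set, \(\mu > 0\), and \(p \ge 1\); the case \(p = 2\) corresponds to quadratic growth and \(p = 1\) to weakly sharp minima. The methods discussed are variants of the (projected) subgradient iteration
\[
x_{k+1} = P_{\mathcal{X}}\bigl(x_k - \alpha_k g_k\bigr), \qquad g_k \in \partial f(x_k),
\]
differing in the choice of the stepsize sequence \(\{\alpha_k\}\): constant, decaying, or, roughly speaking in the case of the “descending stairs” schemes, piecewise constant and decreased in stages.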
Notes
Our analysis also holds for Goffin’s condition.
See [22] for a more detailed comparison with these alternative methods.
References
Agro, G.: Maximum likelihood and \(L_p\) norm estimators. Stat. Appl. 4(1), 7 (1992)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)
Beck, A., Shtern, S.: Linearly convergent away-step conditional gradient for non-strongly convex functions. Math. Program. 164, 1–27 (2015)
Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Nashua (1999)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165, 1–37 (2015)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Burke, J., Deng, S.: Weak sharp minima revisited, Part I: Basic theory. Control Cybern. 31, 439–469 (2002)
Burke, J., Ferris, M.C.: Weak sharp minima in mathematical programming. SIAM J. Control Optim. 31(5), 1340–1359 (1993)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Cruz, J.Y.B.: On proximal subgradient splitting method for minimizing the sum of two nonsmooth convex functions. Set-Valued Var. Anal. 25(2), 245–263 (2017)
Davis, D., Yin, W.: A three-operator splitting scheme and its optimization applications. Set-Valued Var. Anal. 25(4), 829–858 (2017)
Ferris, M.C.: Finite termination of the proximal point algorithm. Math. Program. 50(1), 359–366 (1991)
Freund, R.M., Lu, H.: New computational guarantees for solving convex optimization problems with first order methods, via a function growth condition measure. Math. Program. 170, 1–33 (2015)
Gao, X., Huang, J.: Asymptotic analysis of high-dimensional LAD regression with LASSO. Stat. Sin. 20, 1485–1506 (2010)
Gilpin, A., Pena, J., Sandholm, T.: First-order algorithm with \(O(\ln (1/\epsilon ))\) convergence for \(\epsilon \)-equilibrium in two-person zero-sum games. Math. Program. 133(1–2), 279–298 (2012)
Goffin, J.L.: On convergence rates of subgradient optimization methods. Math. Program. 13(1), 329–347 (1977)
Hare, W., Lewis, A.S.: Identifying active constraints via partial smoothness and prox-regularity. J. Convex Anal. 11(2), 251–266 (2004)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, Berlin (2009)
Johnstone, P.R., Eckstein, J.: Projective splitting with forward steps: asynchronous and block-iterative operator splitting. arXiv:1803.07043 (2018)
Johnstone, P.R., Moulin, P.: Faster subgradient methods for functions with Hölderian growth. arXiv:1704.00196 (2017)
Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 795–811. Springer (2016)
Kivinen, J., Smola, A.J., Williamson, R.C.: Online learning with kernels. IEEE Trans. Signal Process. 52(8), 2165–2176 (2004)
Li, G.: Global error bounds for piecewise convex polynomials. Math. Program. 137(1–2), 37–64 (2013)
Liang, J., Fadili, J., Peyré, G.: Activity identification and local linear convergence of forward–backward-type methods. SIAM J. Optim. 27(1), 408–437 (2017)
Lim, E.: On the convergence rate for stochastic approximation in the nonsmooth setting. Math. Oper. Res. 36(3), 527–537 (2011)
Luo, Z.Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46(1), 157–178 (1993)
Maculan, N., Santiago, C.P., Macambira, E., Jardim, M.: An \(O(n)\) algorithm for projecting a vector on the intersection of a hyperplane and a box in \(\mathbb{R}^n\). J. Optim. Theory Appl. 117(3), 553–574 (2003)
Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. In: Stochastic Optimization: Algorithms and Applications, pp. 223–264. Springer (2001)
Nedić, A., Bertsekas, D.P.: The effect of deterministic noise in subgradient methods. Math. Program. 125(1), 75–99 (2010)
Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)
Noll, D.: Convergence of non-smooth descent methods using the Kurdyka–Łojasiewicz inequality. J. Optim. Theory Appl. 160(2), 553–572 (2014)
Pang, J.S.: Error bounds in mathematical programming. Math. Program. 79(1–3), 299–332 (1997)
Poljak, B.: Nonlinear programming methods in the presence of noise. Math. Program. 14(1), 87–97 (1978)
Polyak, B.T.: Introduction to Optimization. Optimization Software Inc., New York (1987)
Renegar, J.: A framework for applying subgradient methods to conic optimization problems. arXiv:1503.02611 (2015)
Renegar, J.: “Efficient” subgradient methods for general convex optimization. SIAM J. Optim. 26(4), 2649–2676 (2016)
Rosenberg, E.: A geometrically convergent subgradient optimization method for nonlinearly constrained convex programs. Math. Oper. Res. 13(3), 512–523 (1988)
Shor, N.Z.: Minimization Methods for Non-differentiable Functions, vol. 3. Springer, Berlin (2012)
Supittayapornpong, S., Neely, M.J.: Staggered time average algorithm for stochastic non-smooth optimization with \(O(1/T)\) convergence. arXiv:1607.02842 (2016)
Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. 125(2), 263–295 (2010)
Wang, L.: The \(\ell _1\) penalized LAD estimator for high dimensional linear regression. J. Multivar. Anal. 120, 135–151 (2013)
Wang, L., Gordon, M.D., Zhu, J.: Regularized least absolute deviations regression and an efficient algorithm for parameter tuning. In: Sixth International Conference on Data Mining, ICDM’06, 2006, pp. 690–700. IEEE (2006)
Wu, T.T., Lange, K.: Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2, 224–244 (2008)
Xu, Y., Lin, Q., Yang, T.: Accelerate stochastic subgradient method by leveraging local error bound. arXiv:1607.01027 (2016)
Yang, T., Lin, Q.: RSG: beating subgradient method without smoothness and strong convexity. arXiv:1512.03107 (2015)
Zhang, H.: New analysis of linear convergence of gradient-type methods via unifying error bound conditions. arXiv:1606.00269 (2016)
Zhang, H., Yin, W.: Gradient methods for convex minimization: better rates under weaker conditions. arXiv:1303.4645 (2013)
Zhou, Z., So, A.M.C.: A unified approach to error bounds for structured convex optimization problems. Math. Program. 165, 689–728 (2017)
Zhu, J., Rosset, S., Hastie, T., Tibshirani, R.: 1-norm support vector machines. In: NIPS, vol. 15, pp. 49–56 (2003)
Cite this article
Johnstone, P.R., Moulin, P. Faster subgradient methods for functions with Hölderian growth. Math. Program. 180, 417–450 (2020). https://doi.org/10.1007/s10107-018-01361-0