
Faster subgradient methods for functions with Hölderian growth

  • Full Length Paper
  • Series A
  • Published in Mathematical Programming

Abstract

The purpose of this manuscript is to derive new convergence results for several subgradient methods applied to minimizing nonsmooth convex functions with Hölderian growth. The growth condition is satisfied in many applications and includes functions with quadratic growth and weakly sharp minima as special cases. To this end there are three main contributions. First, for a constant and sufficiently small stepsize, we show that the subgradient method achieves linear convergence up to a certain region including the optimal set, with error of the order of the stepsize. Second, if appropriate problem parameters are known, we derive a decaying stepsize which obtains a much faster convergence rate than is suggested by the classical \(O(1/\sqrt{k})\) result for the subgradient method. Third, we develop a novel “descending stairs” stepsize which obtains this faster convergence rate and also achieves linear convergence for the special case of weakly sharp functions. We also develop an adaptive variant of the “descending stairs” stepsize which achieves the same convergence rate without requiring an error-bound constant that is difficult to estimate in practice.
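To make the setting concrete: Hölderian growth is commonly stated as \(f(x) - f^* \ge c\, d(x, X^*)^p\) on a set containing the minimizers \(X^*\), with \(p = 2\) recovering quadratic growth and \(p = 1\) recovering weakly sharp minima. The sketch below is only meant to illustrate the “descending stairs” idea described above: run the subgradient method with a constant stepsize for a fixed number of iterations, then halve the stepsize and repeat. The subgradient oracle, stage length, and stepsizes here are illustrative placeholders, not the choices derived in the paper, which depend on the growth constants.

    import numpy as np

    def descending_stairs_subgradient(subgrad, x0, alpha0=0.5, inner_iters=50, num_stages=8):
        """Illustrative sketch of a 'descending stairs' stepsize schedule.
        Each stage runs the subgradient method with a constant stepsize;
        the stepsize is halved between stages. Parameter values are
        placeholders, not the schedule derived in the paper."""
        x = np.asarray(x0, dtype=float)
        alpha = alpha0
        for _ in range(num_stages):
            for _ in range(inner_iters):
                g = subgrad(x)          # any subgradient of f at x
                x = x - alpha * g       # subgradient step
            alpha *= 0.5                # descend one "stair"
        return x

    # Example: f(x) = ||x||_1 is weakly sharp (p = 1); sign(x) is a subgradient.
    x_final = descending_stairs_subgradient(lambda x: np.sign(x), x0=np.ones(5))

Loosely, the mechanism is that each constant-stepsize stage contracts the distance to the optimal set linearly until the iterates reach a region whose size is of the order of the current stepsize, and halving the stepsize then shrinks that region.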


Notes

  1. Our analysis also holds for Goffin’s condition.

  2. See [22] for a more detailed comparison with these alternative methods.

References

  1. Agro, G.: Maximum likelihood and \(L_p\) norm estimators. Stat. Appl. 4(1), 7 (1992)

  2. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)

  3. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)

  4. Beck, A., Shtern, S.: Linearly convergent away-step conditional gradient for non-strongly convex functions. Math. Program. 164, 1–27 (2015)

  5. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Nashua (1999)

  6. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)

  7. Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165, 1–37 (2015)

  8. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  9. Burke, J., Deng, S.: Weak sharp minima revisited part i: basic theory. Control Cybern. 31, 439–469 (2002)

  10. Burke, J., Ferris, M.C.: Weak sharp minima in mathematical programming. SIAM J. Control Optim. 31(5), 1340–1359 (1993)

  11. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

  12. Cruz, J.Y.B.: On proximal subgradient splitting method for minimizing the sum of two nonsmooth convex functions. Set-Valued Var. Anal. 25(2), 245–263 (2017)

  13. Davis, D., Yin, W.: A three-operator splitting scheme and its optimization applications. Set-Valued Var. Anal. 25(4), 829–858 (2017)

  14. Ferris, M.C.: Finite termination of the proximal point algorithm. Math. Program. 50(1), 359–366 (1991)

  15. Freund, R.M., Lu, H.: New computational guarantees for solving convex optimization problems with first order methods, via a function growth condition measure. Math. Program. 170, 1–33 (2015)

  16. Gao, X., Huang, J.: Asymptotic analysis of high-dimensional LAD regression with LASSO. Stat. Sin. 20, 1485–1506 (2010)

  17. Gilpin, A., Pena, J., Sandholm, T.: First-order algorithm with \(O(\ln (1/\epsilon ))\) convergence for \(\epsilon \)-equilibrium in two-person zero-sum games. Math. Program. 133(1–2), 279–298 (2012)

  18. Goffin, J.L.: On convergence rates of subgradient optimization methods. Math. Program. 13(1), 329–347 (1977)

  19. Hare, W., Lewis, A.S.: Identifying active constraints via partial smoothness and prox-regularity. J. Convex Anal. 11(2), 251–266 (2004)

  20. Hastie, T., Tibshirani, R., Friedman, J., Hastie, T., Friedman, J., Tibshirani, R.: The Elements of Statistical Learning. Springer, Berlin (2009)

  21. Johnstone, P.R., Eckstein, J.: Projective splitting with forward steps: asynchronous and block-iterative operator splitting. arXiv:1803.07043 (2018)

  22. Johnstone, P.R., Moulin, P.: Faster subgradient methods for functions with Hölderian growth. arXiv:1704.00196 (2017)

  23. Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 795–811. Springer (2016)

  24. Kivinen, J., Smola, A.J., Williamson, R.C.: Online learning with kernels. IEEE Trans. Signal Process. 52(8), 2165–2176 (2004)

  25. Li, G.: Global error bounds for piecewise convex polynomials. Math. Program. 137(1–2), 37–64 (2013)

  26. Liang, J., Fadili, J., Peyré, G.: Activity identification and local linear convergence of forward–backward-type methods. SIAM J. Optim. 27(1), 408–437 (2017)

  27. Lim, E.: On the convergence rate for stochastic approximation in the nonsmooth setting. Math. Oper. Res. 36(3), 527–537 (2011)

  28. Luo, Z.Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46(1), 157–178 (1993)

  29. Maculan, N., Santiago, C.P., Macambira, E., Jardim, M.: An \(O(n)\) algorithm for projecting a vector on the intersection of a hyperplane and a box in \(\mathbb{R}^n\). J. Optim. Theory Appl. 117(3), 553–574 (2003)

  30. Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. In: Stochastic Optimization: Algorithms and Applications, pp. 223–264. Springer (2001)

  31. Nedić, A., Bertsekas, D.P.: The effect of deterministic noise in subgradient methods. Math. Program. 125(1), 75–99 (2010)

  32. Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)

  33. Noll, D.: Convergence of non-smooth descent methods using the Kurdyka–Łojasiewicz inequality. J. Optim. Theory Appl. 160(2), 553–572 (2014)

  34. Pang, J.S.: Error bounds in mathematical programming. Math. Program. 79(1–3), 299–332 (1997)

  35. Poljak, B.: Nonlinear programming methods in the presence of noise. Math. Program. 14(1), 87–97 (1978)

  36. Polyak, B.T.: Introduction to Optimization. Optimization Software Inc., New York (1987)

  37. Renegar, J.: A framework for applying subgradient methods to conic optimization problems. arXiv:1503.02611 (2015)

  38. Renegar, J.: “Efficient” subgradient methods for general convex optimization. SIAM J. Optim. 26(4), 2649–2676 (2016)

  39. Rosenberg, E.: A geometrically convergent subgradient optimization method for nonlinearly constrained convex programs. Math. Oper. Res. 13(3), 512–523 (1988)

  40. Shor, N.Z.: Minimization Methods for Non-differentiable Functions, vol. 3. Springer, Berlin (2012)

  41. Supittayapornpong, S., Neely, M.J.: Staggered time average algorithm for stochastic non-smooth optimization with \(O(1/T)\) convergence. arXiv:1607.02842 (2016)

  42. Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. 125(2), 263–295 (2010)

  43. Wang, L.: The \(\ell _1\) penalized LAD estimator for high dimensional linear regression. J. Multivar. Anal. 120, 135–151 (2013)

  44. Wang, L., Gordon, M.D., Zhu, J.: Regularized least absolute deviations regression and an efficient algorithm for parameter tuning. In: Sixth International Conference on Data Mining, ICDM’06, 2006, pp. 690–700. IEEE (2006)

  45. Wu, T.T., Lange, K.: Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2, 224–244 (2008)

  46. Xu, Y., Lin, Q., Yang, T.: Accelerate stochastic subgradient method by leveraging local error bound. arXiv:1607.01027 (2016)

  47. Yang, T., Lin, Q.: RSG: beating subgradient method without smoothness and strong convexity. arXiv:1512.03107 (2015)

  48. Zhang, H.: New analysis of linear convergence of gradient-type methods via unifying error bound conditions. arXiv:1606.00269 (2016)

  49. Zhang, H., Yin, W.: Gradient methods for convex minimization: better rates under weaker conditions. arXiv:1303.4645 (2013)

  50. Zhou, Z., So, A.M.C.: A unified approach to error bounds for structured convex optimization problems. Math. Program. 165, 689–728 (2017)

  51. Zhu, J., Rosset, S., Hastie, T., Tibshirani, R.: 1-norm support vector machines. In: NIPS, vol. 15, pp. 49–56 (2003)

Author information

Corresponding author

Correspondence to Patrick R. Johnstone.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Johnstone, P.R., Moulin, P. Faster subgradient methods for functions with Hölderian growth. Math. Program. 180, 417–450 (2020). https://doi.org/10.1007/s10107-018-01361-0

