Abstract
The purpose of this manuscript is to derive new convergence results for several subgradient methods applied to minimizing nonsmooth convex functions with Hölderian growth. The growth condition is satisfied in many applications and includes functions with quadratic growth and weakly sharp minima as special cases. To this end, we make three main contributions. First, for a constant and sufficiently small stepsize, we show that the subgradient method achieves linear convergence up to a certain region including the optimal set, with error of the order of the stepsize. Second, if appropriate problem parameters are known, we derive a decaying stepsize that obtains a much faster convergence rate than is suggested by the classical \(O(1/\sqrt{k})\) result for the subgradient method. Third, we develop a novel “descending stairs” stepsize that obtains this faster convergence rate and also achieves linear convergence for the special case of weakly sharp functions. We also develop an adaptive variant of the “descending stairs” stepsize which achieves the same convergence rate without requiring knowledge of an error bound constant, which is difficult to estimate in practice.
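For readers unfamiliar with these conditions, the following displays record the standard form of Hölderian growth and the generic subgradient iteration to which the abstract refers; they are illustrative only, and the precise constants, normalizations, and stepsize rules analyzed in the paper may differ. Hölderian growth is commonly stated as
\[
f(x) - f^\star \;\ge\; \mu\,\mathrm{dist}(x, X^\star)^{p} \qquad \text{for all feasible } x,
\]
where \(f^\star\) is the optimal value, \(X^\star\) the optimal set, \(\mu > 0\), and \(p \ge 1\); the case \(p = 2\) corresponds to quadratic growth and \(p = 1\) to weakly sharp minima. The methods discussed are variants of the (projected) subgradient iteration
\[
x_{k+1} = P_{\mathcal{X}}\bigl(x_k - \alpha_k g_k\bigr), \qquad g_k \in \partial f(x_k),
\]
differing in the choice of the stepsize sequence \(\{\alpha_k\}\): constant, decaying, or, roughly speaking in the case of the “descending stairs” schemes, piecewise constant and decreased in stages.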
Notes
Our analysis also holds for Goffin’s condition.
See [22] for a more detailed comparison with these alternative methods.
References
Agro, G.: Maximum likelihood and \(L_p\) norm estimators. Stat. Appl. 4(1), 7 (1992)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2), 91–129 (2013)
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)
Beck, A., Shtern, S.: Linearly convergent away-step conditional gradient for non-strongly convex functions. Math. Program. 164, 1–27 (2015)
Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Nashua (1999)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165, 1–37 (2015)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Burke, J., Deng, S.: Weak sharp minima revisited, Part I: Basic theory. Control Cybern. 31, 439–469 (2002)
Burke, J., Ferris, M.C.: Weak sharp minima in mathematical programming. SIAM J. Control Optim. 31(5), 1340–1359 (1993)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
Cruz, J.Y.B.: On proximal subgradient splitting method for minimizing the sum of two nonsmooth convex functions. Set-Valued Var. Anal. 25(2), 245–263 (2017)
Davis, D., Yin, W.: A three-operator splitting scheme and its optimization applications. Set-Valued Var. Anal. 25(4), 829–858 (2017)
Ferris, M.C.: Finite termination of the proximal point algorithm. Math. Program. 50(1), 359–366 (1991)
Freund, R.M., Lu, H.: New computational guarantees for solving convex optimization problems with first order methods, via a function growth condition measure. Math. Program. 170, 1–33 (2015)
Gao, X., Huang, J.: Asymptotic analysis of high-dimensional LAD regression with LASSO. Stat. Sin. 20, 1485–1506 (2010)
Gilpin, A., Pena, J., Sandholm, T.: First-order algorithm with \(O(\ln (1/\epsilon ))\) convergence for \(\epsilon \)-equilibrium in two-person zero-sum games. Math. Program. 133(1–2), 279–298 (2012)
Goffin, J.L.: On convergence rates of subgradient optimization methods. Math. Program. 13(1), 329–347 (1977)
Hare, W., Lewis, A.S.: Identifying active constraints via partial smoothness and prox-regularity. J. Convex Anal. 11(2), 251–266 (2004)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, Berlin (2009)
Johnstone, P.R., Eckstein, J.: Projective splitting with forward steps: asynchronous and block-iterative operator splitting. arXiv:1803.07043 (2018)
Johnstone, P.R., Moulin, P.: Faster subgradient methods for functions with Hölderian growth. arXiv:1704.00196 (2017)
Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 795–811. Springer (2016)
Kivinen, J., Smola, A.J., Williamson, R.C.: Online learning with kernels. IEEE Trans. Signal Process. 52(8), 2165–2176 (2004)
Li, G.: Global error bounds for piecewise convex polynomials. Math. Program. 137(1–2), 37–64 (2013)
Liang, J., Fadili, J., Peyré, G.: Activity identification and local linear convergence of forward–backward-type methods. SIAM J. Optim. 27(1), 408–437 (2017)
Lim, E.: On the convergence rate for stochastic approximation in the nonsmooth setting. Math. Oper. Res. 36(3), 527–537 (2011)
Luo, Z.Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46(1), 157–178 (1993)
Maculan, N., Santiago, C.P., Macambira, E., Jardim, M.: An \(O(n)\) algorithm for projecting a vector on the intersection of a hyperplane and a box in \(\mathbb{R}^n\). J. Optim. Theory Appl. 117(3), 553–574 (2003)
Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. In: Stochastic Optimization: Algorithms and Applications, pp. 223–264. Springer (2001)
Nedić, A., Bertsekas, D.P.: The effect of deterministic noise in subgradient methods. Math. Program. 125(1), 75–99 (2010)
Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)
Noll, D.: Convergence of non-smooth descent methods using the Kurdyka–Łojasiewicz inequality. J. Optim. Theory Appl. 160(2), 553–572 (2014)
Pang, J.S.: Error bounds in mathematical programming. Math. Program. 79(1–3), 299–332 (1997)
Poljak, B.: Nonlinear programming methods in the presence of noise. Math. Program. 14(1), 87–97 (1978)
Polyak, B.T.: Introduction to Optimization. Optimization Software Inc., New York (1987)
Renegar, J.: A framework for applying subgradient methods to conic optimization problems. arXiv:1503.02611 (2015)
Renegar, J.: “Efficient” subgradient methods for general convex optimization. SIAM J. Optim. 26(4), 2649–2676 (2016)
Rosenberg, E.: A geometrically convergent subgradient optimization method for nonlinearly constrained convex programs. Math. Oper. Res. 13(3), 512–523 (1988)
Shor, N.Z.: Minimization Methods for Non-differentiable Functions, vol. 3. Springer, Berlin (2012)
Supittayapornpong, S., Neely, M.J.: Staggered time average algorithm for stochastic non-smooth optimization with \(O(1/T)\) convergence. arXiv:1607.02842 (2016)
Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. 125(2), 263–295 (2010)
Wang, L.: The \(\ell _1\) penalized LAD estimator for high dimensional linear regression. J. Multivar. Anal. 120, 135–151 (2013)
Wang, L., Gordon, M.D., Zhu, J.: Regularized least absolute deviations regression and an efficient algorithm for parameter tuning. In: Sixth International Conference on Data Mining, ICDM’06, 2006, pp. 690–700. IEEE (2006)
Wu, T.T., Lange, K.: Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2, 224–244 (2008)
Xu, Y., Lin, Q., Yang, T.: Accelerate stochastic subgradient method by leveraging local error bound. arXiv:1607.01027 (2016)
Yang, T., Lin, Q.: RSG: beating subgradient method without smoothness and strong convexity. arXiv:1512.03107 (2015)
Zhang, H.: New analysis of linear convergence of gradient-type methods via unifying error bound conditions. arXiv:1606.00269 (2016)
Zhang, H., Yin, W.: Gradient methods for convex minimization: better rates under weaker conditions. arXiv:1303.4645 (2013)
Zhou, Z., So, A.M.C.: A unified approach to error bounds for structured convex optimization problems. Math. Program. 165, 689–728 (2017)
Zhu, J., Rosset, S., Hastie, T., Tibshirani, R.: 1-norm support vector machines. In: NIPS, vol. 15, pp. 49–56 (2003)
Cite this article
Johnstone, P.R., Moulin, P. Faster subgradient methods for functions with Hölderian growth. Math. Program. 180, 417–450 (2020). https://doi.org/10.1007/s10107-018-01361-0