Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization

Abstract

The main goal of this paper is to develop uniformly optimal first-order methods for convex programming (CP). By uniform optimality we mean that the first-order methods themselves do not require the input of any problem parameters, but can still achieve the best possible iteration complexity bounds. By incorporating a multi-step acceleration scheme into the well-known bundle-level method, we develop an accelerated bundle-level method, and show that it can achieve the optimal complexity for solving a general class of black-box CP problems without requiring the input of any smoothness information, such as whether the problem is smooth, nonsmooth or weakly smooth, or the specific values of the Lipschitz constant and smoothness level. We then develop a more practical, restricted-memory version of this method, namely the accelerated prox-level (APL) method. We investigate the generalization of the APL method for solving certain composite CP problems and an important class of saddle-point problems recently studied by Nesterov (Math Program 103:127–152, 2005). We present promising numerical results for these new bundle-level methods applied to certain classes of semidefinite programming and stochastic programming problems.

References

  1. Ahmed, S.: Smooth Minimization of Two-Stage Stochastic Linear Programs. Manuscript, Georgia Institute of Technology (2006)

  2. Auslender, A., Teboulle, M.: Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 16, 697–725 (2006)

  3. Becker, S., Bobin, J., Candès, E.: NESTA: A Fast and Accurate First-Order Method for Sparse Recovery. Manuscript, California Institute of Technology (2009)

  4. Ben-Tal, A., Nemirovski, A.S.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. MPS-SIAM Series on Optimization. SIAM, Philadelphia (2000)

  5. Ben-Tal, A., Nemirovski, A.S.: Non-Euclidean restricted memory level method for large-scale convex optimization. Math. Program. 102, 407–456 (2005)

  6. Cheney, E.W., Goldstein, A.A.: Newton’s method for convex programming and Tchebycheff approximation. Numer. Math. 1, 253–268 (1959)

  7. d’Aspremont, A., Banerjee, O., El Ghaoui, L.: First-order methods for sparse covariance selection. SIAM J. Matrix Anal. Appl. 30, 56–66 (2008)

  8. Devolder, O., Glineur, F., Nesterov, Y.E.: First-Order Methods of Smooth Convex Optimization with Inexact Oracle. Manuscript, CORE, Université catholique de Louvain, Louvain-la-Neuve, Belgium (Dec 2010)

  9. Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms. Technical report, SIAM J. Optim. (to appear) (2010)

  10. Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, I: a generic algorithmic framework. SIAM J. Optim. 22, 1469–1492 (2012)

  11. Helmberg, C., Rendl, F.: A spectral bundle method for semidefinite programming. SIAM J. Optim. 10, 673–696 (2000)

  12. Hoffman, A.: On approximate solutions of systems of linear inequalities. J. Res. Natl. Bureau Stand Sect. B Math. Sci. 49, 263–265 (1952)

  13. Juditsky, A., Nemirovski, A.S., Tauvel, C.: Solving Variational Inequalities with Stochastic Mirror-Prox Algorithm. Manuscript, Georgia Institute of Technology, Atlanta, GA (2008)

  14. Kelley, J.E.: The cutting plane method for solving convex programs. J. SIAM 8, 703–712 (1960)

  15. Kiwiel, K.C.: An aggregate subgradient method for nonsmooth convex minimization. Math. Program. 27, 320–341 (1983)

  16. Kiwiel, K.C.: Proximity control in bundle methods for convex nondifferentiable minimization. Math. Program. 46, 105–122 (1990)

  17. Kiwiel, K.C.: Proximal level bundle method for convex nondifferentiable optimization, saddle point problems and variational inequalities. Math. Program. Ser. B 69, 89–109 (1995)

  18. Lan, G., Lu, Z., Monteiro, R.D.C.: Primal-dual first-order methods with \({\cal O}(1/\epsilon )\) iteration-complexity for cone programming. Math. Program. 126, 1–29 (2011)

  19. Lan, G.: An optimal method for stochastic composite optimization. Math. Program. 133(1), 365–397 (2012)

  20. Lan, G., Nemirovski, A.S., Shapiro, A.: Validation analysis of mirror descent stochastic approximation method. Math. Program. 134, 425–458 (2012)

  21. Lemaréchal, C.: An extension of Davidon methods to non-differentiable problems. Math. Program. Study 3, 95–109 (1975)

  22. Lemaréchal, C., Nemirovski, A.S., Nesterov, Y.E.: New variants of bundle methods. Math. Program. 69, 111–148 (1995)

  23. Lewis, A.S., Wright, S.J.: A Proximal Method for Composite Minimization. Manuscript, Cornell University, Ithaca, NY (2009)

  24. Linderoth, J., Wright, S.: Decomposition algorithms for stochastic programming on a computational grid. Comput. Optim. Appl. 24, 207–250 (2003)

  25. Linderoth, J., Shapiro, A., Wright, S.: The empirical behavior of sampling methods for stochastic programming. Ann. Oper. Res. 142, 215–241 (2006)

  26. Lu, Z.: Smooth optimization approach for sparse covariance selection. SIAM J. Optim. 19, 1807–1827 (2009)

  27. Mak, W.K., Morton, D.P., Wood, R.K.: Monte Carlo bounding techniques for determining solution quality in stochastic programs. Oper. Res. Lett. 24, 47–56 (1999)

  28. Mosek. The Mosek Optimization Toolbox for Matlab Manual. Version 6.0 (revision 93). http://www.mosek.com

  29. Nemirovski, A.S., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley-Interscience Series in Discrete Mathematics. Wiley, XV (1983)

  30. Nemirovski, A.S.: Efficient Methods in Convex Programming. Lecture notes, Technion (1994)

  31. Nemirovski, A.S.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15, 229–251 (2005)

  32. Nemirovski, A.S., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19, 1574–1609 (2009)

  33. Nemirovskii, A.S., Nesterov, Y.E.: Optimal methods for smooth convex minimization. Zh. Vichisl. Mat. Fiz. 25, 356–369 (1985). (In Russian)

  34. Nesterov, Y.E.: Efficient Methods in Nonlinear Programming. Radio i Sviaz, Moscow (1989)

  35. Nesterov, Y.E.: Gradient methods for minimizing composite objective functions. Technical report, Center for Operations Research and Econometrics (CORE), Catholic University of Louvain, September (2007)

  36. Nesterov, Y.E.: A method for unconstrained convex minimization problem with the rate of convergence \(O(1/k^2)\). Doklady AN SSSR 269, 543–547 (1983)

  37. Nesterov, Y.E.: On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonomo. i. Mat. Metody 24, 509–517 (1988)

  38. Nesterov, Y.E.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Massachusetts (2004)

  39. Nesterov, Y.E.: Excessive gap technique in nonsmooth convex minimization. SIAM J. Optim. 16, 235–249 (2005)

  40. Nesterov, Y.E.: Smooth minimization of nonsmooth functions. Math. Program. 103, 127–152 (2005)

  41. Nesterov, Y.E.: Smoothing technique and its applications in semidefinite optimization. Math. Program. 110, 245–259 (2007)

  42. Oliveira, W., Sagastizábal, C., Scheimberg, S.: Inexact bundle methods for two-stage stochastic programming. SIAM J. Optim. 21(2), 517–544 (2011)

  43. Peña, J.: Nash equilibria computation via smoothing techniques. Optima 78, 12–13 (2008)

  44. Richtárik, P.: Approximate level method for nonsmooth convex minimization. J. Optim. Theory Appl. 152(2), 334–350 (2012)

  45. Ruszczyński, A.: Nonlinear Optimization, 1st edn. Princeton University Press, Princeton (2006)

  46. Sagastizábal, C.: Nonsmooth Optimization: Thinking Outside of the Black Box. SIAG/OPT Views-and-News, pp. 1–9 (2011)

  47. Sen, S., Doverspike, R.D., Cosares, S.: Network planning with random demand. Telecommun. Syst. 3, 11–30 (1994)

  48. Tseng, P.: On Accelerated Proximal Gradient Methods for Convex-Concave Optimization. Manuscript, University of Washington, Seattle (May 2008)

Acknowledgments

The author is very grateful to the co-editor Professor Adrian Lewis, the associate editor and two anonymous referees for their very useful suggestions for improving the quality and exposition of the paper.

Author information

Correspondence to Guanghui Lan.

Additional information

The paper is a combined version of the two manuscripts previously submitted to Mathematical Programming, namely: “Bundle-type methods uniformly optimal for smooth and nonsmooth convex optimization” and “Level methods uniformly optimal for composite and structured nonsmooth convex optimization”.

The author of this paper was partially supported by NSF grants CMMI-1000347 and DMS-1319050, ONR grant N00014-13-1-0036, and NSF CAREER Award CMMI-1254446.

Appendix

In this section, we provide the proof of Lemma 8.

Let \(F\) and \(F_\eta \) be defined in (4.11) and (4.13), respectively. Also let us denote, for any \(\eta > 0\) and \(x \in X\),

$$\begin{aligned} \psi _x(z) := F_\eta (x) + \langle \nabla F_\eta (x), z - x \rangle + \frac{{{\mathcal {L}}}_\eta }{2} \Vert z - x\Vert ^2 + \eta {{\mathcal {D}}}_v, \end{aligned}$$
(7.1)

where \({{\mathcal {D}}}_v\) and \({{\mathcal {L}}}_\eta \) are defined in (3.7) and (4.15), respectively. Clearly, in view of (1.2) and (4.16), \(\psi _x\) is a majorant of both \(F_\eta \) and \(F\). Also let us define

$$\begin{aligned} Z_x {:=} \left\{ z \in {\mathbb {R}}^n: \Vert z - x\Vert ^2 = \frac{2}{{{\mathcal {L}}}_\eta } \left[ \eta {{\mathcal {D}}}_v + F_\eta (x) - F(x) \right] \right\} \!. \end{aligned}$$
(7.2)

Clearly, by the first relation in (4.16), we have

$$\begin{aligned} \Vert z-x\Vert ^2 \le \frac{2 \eta {{\mathcal {D}}}_v}{{{\mathcal {L}}}_\eta }, \ \ \forall \, z \in Z_x. \end{aligned}$$
(7.3)

Moreover, we can easily check that, for any \(z \in Z_x\),

$$\begin{aligned} \psi _x(z) + \langle \nabla \psi _x(z), x - z \rangle = F(x), \end{aligned}$$
(7.4)

where \(\nabla \psi _x(z) = \nabla F_\eta (x) + {{\mathcal {L}}}_\eta (z-x)\).
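
As a sketch of the check behind (7.4), assume that \(\Vert \cdot \Vert \) is the norm induced by the inner product, as the formula for \(\nabla \psi _x\) suggests. Substituting (7.1) and \(\nabla \psi _x(z)\), the two terms \(\pm \langle \nabla F_\eta (x), z - x \rangle \) cancel, and (7.2) then gives, for any \(z \in Z_x\),

$$\begin{aligned} \psi _x(z) + \langle \nabla \psi _x(z), x - z \rangle&= F_\eta (x) + \eta {{\mathcal {D}}}_v + \frac{{{\mathcal {L}}}_\eta }{2} \Vert z - x\Vert ^2 - {{\mathcal {L}}}_\eta \Vert z - x\Vert ^2 \\&= F_\eta (x) + \eta {{\mathcal {D}}}_v - \frac{{{\mathcal {L}}}_\eta }{2} \Vert z - x\Vert ^2 \\&= F_\eta (x) + \eta {{\mathcal {D}}}_v - \left[ \eta {{\mathcal {D}}}_v + F_\eta (x) - F(x) \right] = F(x). \end{aligned}$$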

The following result provides a characterization of a subgradient direction of \(F\).

Lemma 9

Let \(x \in {\mathbb {R}}^n\) and \(p \in {\mathbb {R}}^n\) be given. Then there exists \(z \in Z_x\) such that

$$\begin{aligned} \langle F'(x), p \rangle \le \langle \nabla \psi _x(z), p\rangle = \langle \nabla F_\eta (x) + {{\mathcal {L}}}_\eta (z-x), p \rangle , \end{aligned}$$

where \(F'(x) \in \partial F(x)\).

Proof

The claim is trivial for \(p = 0\), so assume \(p \ne 0\) and denote

$$\begin{aligned} t = \frac{1}{\Vert p\Vert }\left\{ \frac{2}{{{\mathcal {L}}}_\eta } \left[ \eta {{\mathcal {D}}}_v + F_\eta (x) - F(x) \right] \right\} ^\frac{1}{2} \end{aligned}$$

and \(z_0 = x + t p\). Clearly, in view of (7.2), we have \(z_0 \in Z_x\). By the convexity of \(F\), the fact that \(\psi _x\) is a majorant of \(F\), and (7.4), we have

$$\begin{aligned} F(x) + \langle F'(x), tp \rangle \le F(x + tp)&\le \psi _x(z_0) = F(x) + \langle \nabla \psi _x(z_0), z_0 - x\rangle \\&= F(x) + t \langle \nabla \psi _x(z_0), p\rangle , \end{aligned}$$

which clearly implies the result.\(\square \)
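
The last step can be spelled out as follows; this is a sketch that assumes \(t > 0\), i.e., that the radius in (7.2) is strictly positive. Subtracting \(F(x)\) from the first and last expressions in the chain above and dividing by \(t\) gives

$$\begin{aligned} t \langle F'(x), p \rangle \le t \langle \nabla \psi _x(z_0), p\rangle \quad \Longrightarrow \quad \langle F'(x), p \rangle \le \langle \nabla \psi _x(z_0), p\rangle , \end{aligned}$$

so \(z = z_0\) is a point of \(Z_x\) with the property claimed in Lemma 9.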

We are now ready to prove Lemma 8.

Proof of Lemma 8

First note that by the convexity of \(F\), we have

$$\begin{aligned} F(x_0) - \left[ F(x_1) + \langle F'(x_1), x_0 - x_1 \rangle \right] \le \langle F'(x_0), x_0 - x_1 \rangle + \langle F'(x_1), x_1 - x_0 \rangle . \end{aligned}$$
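
For completeness, here is a sketch of how this relation follows from the subgradient inequality, with \(F'(x_0) \in \partial F(x_0)\) and \(F'(x_1) \in \partial F(x_1)\). Convexity of \(F\) at \(x_0\) gives \(F(x_1) \ge F(x_0) + \langle F'(x_0), x_1 - x_0 \rangle \), that is,

$$\begin{aligned} F(x_0) - F(x_1) \le \langle F'(x_0), x_0 - x_1 \rangle . \end{aligned}$$

Adding \(- \langle F'(x_1), x_0 - x_1 \rangle = \langle F'(x_1), x_1 - x_0 \rangle \) to both sides yields the displayed inequality.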

Moreover, by Lemma 9 (applied at \(x_0\) with \(p = x_0 - x_1\) and at \(x_1\) with \(p = x_1 - x_0\)), there exist \(z_0 \in Z_{x_0}\) and \(z_1 \in Z_{x_1}\) such that

$$\begin{aligned}&\langle F'(x_0), x_0 - x_1 \rangle + \langle F'(x_1), x_1 - x_0 \rangle \\&\quad \le \langle \nabla F_\eta (x_0) - \nabla F_\eta (x_1), x_0 - x_1 \rangle + {{\mathcal {L}}}_\eta \langle z_0 - x_0 - (z_1 - x_1), x_0 - x_1\rangle \\&\quad \le {{\mathcal {L}}}_\eta \Vert x_0 - x_1\Vert ^2 + {{\mathcal {L}}}_\eta (\Vert z_0 - x_0\Vert + \Vert z_1-x_1\Vert ) \Vert x_0 - x_1\Vert \\&\quad \le {{\mathcal {L}}}_\eta \Vert x_0 - x_1\Vert ^2 + 2 {{\mathcal {L}}}_\eta \left( \frac{2 \eta {{\mathcal {D}}}_v}{{{\mathcal {L}}}_\eta } \right) ^\frac{1}{2} \Vert x_0 - x_1\Vert \\&\quad = \frac{\Vert A\Vert ^2}{\sigma _v \eta } \Vert x_0 - x_1\Vert ^2 + 2 \left( \frac{2 \Vert A\Vert ^2 {{\mathcal {D}}}_v}{\sigma _v}\right) ^\frac{1}{2} \Vert x_0 - x_1\Vert , \end{aligned}$$

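where the second inequality follows from the Cauchy-Schwarz inequality together with the \({{\mathcal {L}}}_\eta \)-Lipschitz continuity of \(\nabla F_\eta \), and the last inequality and equality follow from (7.3) and (4.15), respectively. Combining the above two relations, we have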

$$\begin{aligned} F(x_0)&- \left[ F(x_1) + \langle F'(x_1), x_0 - x_1 \rangle \right] \le \frac{\Vert A\Vert ^2}{\sigma _v \eta } \Vert x_0 - x_1\Vert ^2\\&+ 2 \left( \frac{2 \Vert A\Vert ^2 {{\mathcal {D}}}_v}{\sigma _v}\right) ^\frac{1}{2} \Vert x_0 - x_1\Vert . \end{aligned}$$

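The result now follows by letting \(\eta \) tend to \(+\infty \) in the above relation: the first term on the right-hand side vanishes, while the left-hand side and the second term do not depend on \(\eta \).\(\square \)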
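
The argument can be checked numerically on a small example. The following Python sketch is not part of the paper: it assumes one concrete instance, \(F(x) = \max _i \langle a_i, x \rangle \) with an entropy prox-function on the simplex, the Euclidean norm on \(x\) and the \(\ell _1\) norm on the dual variable, so that \({{\mathcal {D}}}_v = \log m\), \(\sigma _v = 1\) and \(\Vert A\Vert \) is the largest row norm of \(A\). Since the definitions (4.11)–(4.16) are not reproduced here, these choices, and all function names (`F`, `F_eta`, `constants`, `lemma9_point`, `check_instance`), are illustrative assumptions. The code evaluates the residual of (7.4), the slack in the inequality of Lemma 9, and the slack in the bound obtained by letting \(\eta \rightarrow +\infty \).

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def F(A, x):
    """F(x) = max_i <a_i, x>: value and one subgradient (an active row of A)."""
    vals = [dot(row, x) for row in A]
    i = max(range(len(vals)), key=vals.__getitem__)
    return vals[i], list(A[i])

def F_eta(A, x, eta):
    """Entropy-smoothed F: eta * log(mean_i exp(<a_i, x> / eta)), and its gradient."""
    vals = [dot(row, x) for row in A]
    vmax = max(vals)  # shift the exponents for numerical stability
    w = [math.exp((v - vmax) / eta) for v in vals]
    s = sum(w)
    value = vmax + eta * math.log(s / len(vals))
    grad = [sum(w[i] * A[i][j] for i in range(len(A))) / s for j in range(len(x))]
    return value, grad

def constants(A, eta):
    """Assumed constants: D_v = log m, sigma_v = 1, ||A|| = largest row 2-norm."""
    D_v = math.log(len(A))
    norm_A = max(norm(row) for row in A)
    return D_v, norm_A, norm_A ** 2 / eta  # last entry plays the role of L_eta

def lemma9_point(A, x, p, eta):
    """z_0 = x + t p on the sphere Z_x of (7.2), with psi_x(z_0) and grad psi_x(z_0)."""
    D_v, _, L_eta = constants(A, eta)
    Fx, _ = F(A, x)
    Fe, g = F_eta(A, x, eta)
    radius = math.sqrt(max(0.0, 2.0 / L_eta * (eta * D_v + Fe - Fx)))
    t = radius / norm(p)
    d = [t * pi for pi in p]  # d = z_0 - x
    z0 = [xi + di for xi, di in zip(x, d)]
    psi = Fe + dot(g, d) + 0.5 * L_eta * dot(d, d) + eta * D_v
    grad_psi = [gi + L_eta * di for gi, di in zip(g, d)]
    return z0, psi, grad_psi

def check_instance(A, x0, x1, eta):
    """Return (residual of (7.4), slack in Lemma 9, slack in the limiting bound)."""
    p = [a - b for a, b in zip(x0, x1)]
    z0, psi, grad_psi = lemma9_point(A, x0, p, eta)
    F0, sub0 = F(A, x0)
    F1, sub1 = F(A, x1)
    residual = psi + dot(grad_psi, [a - b for a, b in zip(x0, z0)]) - F0
    lemma9_slack = dot(grad_psi, p) - dot(sub0, p)
    D_v, norm_A, _ = constants(A, eta)
    gap = F0 - F1 - dot(sub1, p)
    bound = 2.0 * math.sqrt(2.0 * norm_A ** 2 * D_v) * norm(p)
    return residual, lemma9_slack, bound - gap

if __name__ == "__main__":
    rng = random.Random(0)
    A = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
    x0 = [rng.uniform(-1, 1) for _ in range(3)]
    x1 = [rng.uniform(-1, 1) for _ in range(3)]
    for eta in (0.5, 5.0, 500.0):
        print(eta, check_instance(A, x0, x1, eta))
```

On such an instance the first returned value should be zero up to rounding, and the two slacks should be nonnegative for every \(\eta > 0\). This is only a sanity check under the stated assumptions, not a substitute for the proof.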

About this article

Cite this article

Lan, G. Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization. Math. Program. 149, 1–45 (2015). https://doi.org/10.1007/s10107-013-0737-x

Keywords

  • Convex programming
  • Complexity
  • Bundle-level
  • Optimal methods

Mathematics Subject Classification

  • 62L20
  • 90C25
  • 90C15
  • 68Q25