An optimal variant of Kelley’s cutting-plane method

  • Full Length Paper
  • Series A
  • Published in: Mathematical Programming

Abstract

We propose a new variant of Kelley’s cutting-plane method for minimizing a nonsmooth convex Lipschitz-continuous function over the Euclidean space. We derive the method through a constructive approach and prove that it attains the optimal rate of convergence for this class of problems.

Fig. 1

Notes

  1. In order to avoid overly numerous special cases, we adopt the convention \(\frac{0}{0}=0\).

  2. Note that since both problems admit a compact feasible set, attainment of both values is warranted.

References

  1. Auslender, A.: Numerical methods for nondifferentiable convex optimization. In: Cornet, B., Nguyen, V., Vial, J. (eds.) Nonlinear Analysis and Optimization, Mathematical Programming Studies, vol. 30, pp. 102–126. Springer, Berlin (1987). doi:10.1007/BFb0121157

  2. Auslender, A., Teboulle, M.: Interior gradient and epsilon-subgradient descent methods for constrained convex minimization. Math. Oper. Res. 29(1), 1–26 (2004)

  3. Ben-Tal, A., Nemirovski, A.: Non-Euclidean restricted memory level method for large-scale convex optimization. Math. Progr. 102(3), 407–456 (2005)

  4. Ben-Tal, A., Nemirovskii, A.S.: Lectures on Modern Convex Optimization. SIAM, Philadelphia (2001)

  5. Benders, J.F.: Partitioning procedures for solving mixed-variables programming problems. Numer. Math. 4(1), 238–252 (1962)

  6. Cheney, E.W., Goldstein, A.A.: Newton’s method for convex programming and Tchebycheff approximation. Numer. Math. 1(1), 253–268 (1959)

  7. de Oliveira, W., Sagastizábal, C.: Bundle Methods in the XXIst Century: A Bird's-Eye View. Optimization Online Report 4088 (2013)

  8. Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Progr. Ser. A 145, 451–482 (2014)

  9. Fan, K.: Minimax theorems. Proc. Natl. Acad. Sci. USA 39(1), 42 (1953)

  10. Grone, R., Johnson, C.R., Sá, E.M., Wolkowicz, H.: Positive definite completions of partial Hermitian matrices. Linear Algebr. Appl. 58, 109–124 (1984)

  11. Kelley Jr, J.E.: The cutting-plane method for solving convex programs. J. Soc. Ind. Appl. Math. 8(4), 703–712 (1960)

  12. Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. arXiv:1406.5468 (2014)

  13. Kiwiel, K.C.: Proximity control in bundle methods for convex nondifferentiable minimization. Math. Progr. 46(1–3), 105–122 (1990)

  14. Kiwiel, K.C.: Proximal level bundle methods for convex nondifferentiable optimization, saddle-point problems and variational inequalities. Math. Progr. 69(1–3), 89–109 (1995)

  15. Kiwiel, K.C.: Efficiency of proximal bundle methods. J. Optim. Theory Appl. 104(3), 589–603 (2000)

  16. Lemaréchal, C.: An extension of Davidon methods to non differentiable problems. In: Nondifferentiable Optimization, pp. 95–109. Springer, Berlin (1975)

  17. Lemaréchal, C., Nemirovskii, A., Nesterov, Y.: New variants of bundle methods. Math. Progr. 69(1–3), 111–147 (1995)

  18. Lemaréchal, C., Sagastizábal, C.: Variable metric bundle methods: from conceptual to implementable forms. Math. Progr. 76(3), 393–410 (1997)

  19. Lukšan, L., Vlček, J.: A Bundle–Newton method for nonsmooth unconstrained minimization. Math. Progr. 83(1–3), 373–391 (1998)

  20. Mäkelä, M.: Survey of bundle methods for nonsmooth optimization. Optim. Methods Softw. 17(1), 1–29 (2002)

  21. Nemirovsky, A.S., Yudin, D.B.: Problem Complexity and Method Efficiency in Optimization. A Wiley-Interscience Publication. Wiley, New York (1983) (Translated from the Russian and with a preface by E. R. Dawson, Wiley-Interscience Series in Discrete Mathematics)

  22. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization. Kluwer Academic Publishers, Dordrecht (2004)

  23. Schramm, H., Zowe, J.: A version of the bundle idea for minimizing a nonsmooth function: conceptual idea, convergence analysis, numerical results. SIAM J. Optim. 2(1), 121–152 (1992)

  24. Wolfe, P.: A method of conjugate subgradients for minimizing nondifferentiable functions. In: Nondifferentiable Optimization, pp. 145–173. Springer, Berlin (1975)

Acknowledgments

We thank the two referees and the associate editor for their constructive comments and useful suggestions.

Author information

Corresponding author

Correspondence to Marc Teboulle.

Additional information

This research was partially supported by the Israel Science Foundation under ISF Grant No. 998-12.

Appendix: a tight lower-complexity bound

In this appendix, we refine the proof from [22, Sect. 3.2] to obtain a new lower-complexity bound for the class of nonsmooth, convex, Lipschitz-continuous functions, which, together with the results discussed above, yields a tight complexity result for this class of problems. More precisely, under the setting of Sect. 2.1, we show that for any first-order method, the worst-case absolute inaccuracy after N steps cannot be smaller than \(\frac{LR}{\sqrt{N}}\), which is exactly the bound attained by Algorithm KLM.

In order to simplify the presentation, and following [22, Sect. 3.2], we restrict our attention to first-order methods that generate sequences that satisfy the following assumption:

Assumption 1

The sequence \(\{x_i\}\) satisfies

$$\begin{aligned} x_i \in x_1 + \mathrm {span}\{f'(x_1),\dots ,f'(x_{i-1})\}, \end{aligned}$$

where \(f'(x_i)\in \partial f(x_i)\) is obtained by evaluating a first-order oracle at \(x_i\).

As noted by Nesterov [22, Page 59], this assumption is not necessary and can be avoided by some additional reasoning.

The lower-complexity result is stated as follows.

Theorem 2

For any \(L,R>0\), \(N,p\in \mathbb {N}\) with \(N\le p\), and any starting point \(x_1\in \mathbb {R}^p\), there exists a convex and Lipschitz-continuous function \(f:\mathbb {R}^p\rightarrow \mathbb {R}\) with Lipschitz constant L and \(\Vert x^*_f-x_1\Vert \le R\), and a first-order oracle \(\mathcal {O}(x)= (f(x), f'(x))\), such that

$$\begin{aligned} f(x_N)-f^*\ge \frac{LR}{\sqrt{N}} \end{aligned}$$

for all sequences \(x_1,\dots ,x_N\) that satisfy Assumption 1.

Proof

The proof proceeds by constructing a “worst-case” function, on which any first-order method that satisfies Assumption 1 will not be able to improve its initial objective value during the first N iterations.

Let \(f_N:\mathbb {R}^p\rightarrow \mathbb {R}\) and \(\bar{f}_N:\mathbb {R}^p\rightarrow \mathbb {R}\) be defined by

$$\begin{aligned}&f_N(x) = \max _{1\le i \le N} \langle x, e_i\rangle , \\&\bar{f}_N(x) =L\max (f_N(x), \Vert x\Vert -R(1+N^{-1/2})), \end{aligned}$$

Then it is easy to verify that \(\bar{f}_N\) is Lipschitz-continuous with constant L and that

$$\begin{aligned} \bar{f}_N^*= -\frac{LR}{\sqrt{N}} \end{aligned}$$

is attained for \(x^*\in \mathbb {R}^p\) such that

$$\begin{aligned} x^*= -\frac{R}{\sqrt{N}}\sum _{i=1}^N e_i. \end{aligned}$$
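
These claims are easy to confirm numerically. The following sketch is ours, not from the paper, and the values of \(L, R, N, p\) are arbitrary illustrations; it evaluates \(\bar{f}_N\) at \(x^*\) and checks that random points never fall below \(-LR/\sqrt{N}\):

```python
import numpy as np

# Arbitrary illustrative parameters (not from the paper).
L, R, N, p = 2.0, 1.0, 4, 6

def f_N(x):
    # f_N(x) = max_{1 <= i <= N} <x, e_i>: the largest of the first N coordinates.
    return np.max(x[:N])

def f_bar(x):
    # \bar f_N(x) = L * max(f_N(x), ||x|| - R(1 + N^{-1/2})).
    return L * max(f_N(x), np.linalg.norm(x) - R * (1 + N ** -0.5))

# The minimizer x* = -(R / sqrt(N)) * (e_1 + ... + e_N).
x_star = np.zeros(p)
x_star[:N] = -R / np.sqrt(N)

print(f_bar(x_star))  # -L*R/sqrt(N) = -1.0 with these parameters

# Random points never go below the optimal value.
rng = np.random.default_rng(0)
assert all(f_bar(rng.normal(size=p)) >= f_bar(x_star) - 1e-9 for _ in range(1000))
```

At \(x^*\) the two terms in the maximum coincide: \(f_N(x^*) = -R/\sqrt{N}\) and \(\Vert x^*\Vert - R(1+N^{-1/2}) = -R/\sqrt{N}\), which is why the minimum value is exactly \(-LR/\sqrt{N}\).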

We equip \(\bar{f}_N\) with the oracle \(\mathcal {O}_N(x)= (\bar{f}_N(x), \bar{f}'_N(x))\) by choosing \(\bar{f}'_N(x)\in \partial \bar{f}_N(x)\) according to:

$$\begin{aligned} \bar{f}'_N(x) = {\left\{ \begin{array}{ll} L f'_N(x), &{} f_N(x)\ge \Vert x\Vert -R\left( 1+N^{-1/2}\right) ,\\ L\frac{x}{\Vert x\Vert }, &{} f_N(x)< \Vert x\Vert -R\left( 1+N^{-1/2}\right) , \end{array}\right. } \end{aligned}$$
(8.1)

where

$$\begin{aligned} f'_N(x) = e_{i^*}, \quad i^*= \min \{ i : f_N(x)=\langle x, e_i\rangle \}. \end{aligned}$$
(8.2)
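
For concreteness, the oracle \(\mathcal {O}_N\) defined by (8.1)–(8.2) can be sketched as follows (our illustration, not code from the paper; the parameter values are arbitrary). Note that `np.argmax` returns the first maximizing index, which implements the tie-breaking rule \(i^*= \min \{ i : f_N(x)=\langle x, e_i\rangle \}\):

```python
import numpy as np

# Arbitrary illustrative parameters (not from the paper).
L, R, N, p = 2.0, 1.0, 4, 6

def oracle(x):
    # First-order oracle O_N(x) = (f_bar_N(x), f_bar_N'(x)) following (8.1)-(8.2).
    fN = np.max(x[:N])                              # f_N(x)
    ball = np.linalg.norm(x) - R * (1 + N ** -0.5)  # ||x|| - R(1 + N^{-1/2})
    value = L * max(fN, ball)
    if fN >= ball:
        # Subgradient L * e_{i*}; np.argmax picks the smallest maximizing
        # index, matching i* = min{i : f_N(x) = <x, e_i>}.
        g = np.zeros(p)
        g[int(np.argmax(x[:N]))] = L
    else:
        g = L * x / np.linalg.norm(x)               # gradient of L(||x|| - const)
    return value, g

v, g = oracle(np.zeros(p))
print(v, g)  # value 0.0 and subgradient L * e_1
```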

We also denote

$$\begin{aligned} \mathbb {R}^{i,p} := \{x\in \mathbb {R}^p : \langle x, e_j\rangle =0,\ i+1\le j\le p\}. \end{aligned}$$

Now, let \(x_1,\dots ,x_N\) be a sequence that satisfies Assumption 1 with \(f=\bar{f}_N\) and the oracle \(\mathcal {O}_N\), where without loss of generality we assume \(x_1=0\). Then \(\bar{f}'_N(x_1) = Le_1\) and we get \(x_2\in \mathrm {span}\{\bar{f}'_N(x_1)\}=\mathbb {R}^{1,p}\). Now, from \(\langle x_2,e_2\rangle =\dots =\langle x_2,e_N\rangle =0\), we get that \(\min \{ i : f_N(x_2)=\langle x_2, e_i\rangle \}\le 2\), and it follows by (8.1) and (8.2) that \(f'_N(x_2)\in \mathbb {R}^{2,p}\) and \(\bar{f}'_N(x_2)\in \mathbb {R}^{2,p}\). Hence, we conclude from Assumption 1 that \(x_3 \in \mathrm {span}\{\bar{f}'_N(x_1),\bar{f}'_N(x_2)\}\subseteq \mathbb {R}^{2,p}\). Continuing this argument inductively shows that \(x_i \in \mathbb {R}^{i-1,p}\) and \(\bar{f}'_N(x_i)\in \mathbb {R}^{i,p}\) for \(i=1,\dots ,N\); in particular, \(x_N \in \mathbb {R}^{N-1,p}\). Finally, since for every \(x\in \mathbb {R}^{N-1,p}\) we have \(\bar{f}_N(x)\ge L\langle x,e_N\rangle =0\), we immediately get

$$\begin{aligned} \bar{f}_N(x_{N})-\bar{f}_N^*\ge \frac{LR}{\sqrt{N}}, \end{aligned}$$

which completes the proof. \(\square \)
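
The mechanism of the proof can also be observed numerically: any method whose iterates stay in the span of past subgradients, e.g. plain subgradient descent (our illustrative choice, with arbitrary parameters and step size, not from the paper), sees only the value \(\bar{f}_N(x_1)=0\) during its first N oracle calls, so its gap to \(\bar{f}_N^*= -LR/\sqrt{N}\) is exactly \(LR/\sqrt{N}\):

```python
import numpy as np

# Arbitrary illustrative parameters and step size (not from the paper).
L, R, N, p = 2.0, 1.0, 4, 6

def oracle(x):
    # Oracle (8.1)-(8.2) for f_bar_N.
    fN = np.max(x[:N])
    ball = np.linalg.norm(x) - R * (1 + N ** -0.5)
    if fN >= ball:
        g = np.zeros(p)
        g[int(np.argmax(x[:N]))] = L   # first maximizer: i* = min{...}
    else:
        g = L * x / np.linalg.norm(x)
    return L * max(fN, ball), g

# Subgradient descent from x_1 = 0: each iterate lies in the span of the
# previously observed subgradients, so Assumption 1 is satisfied.
x, values = np.zeros(p), []
for _ in range(N):
    v, g = oracle(x)
    values.append(v)
    x = x - 0.1 * g

f_star = -L * R / np.sqrt(N)
print(min(values) - f_star)  # gap = L*R/sqrt(N) = 1.0 here
```

Each oracle call reveals one new coordinate direction, exactly as in the induction above, so all of the first N objective values equal 0.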

About this article

Cite this article

Drori, Y., Teboulle, M. An optimal variant of Kelley’s cutting-plane method. Math. Program. 160, 321–351 (2016). https://doi.org/10.1007/s10107-016-0985-7

