New results on subgradient methods for strongly convex optimization problems with a unified analysis

Abstract

We develop subgradient- and gradient-based methods for minimizing strongly convex functions under a notion that generalizes standard Euclidean strong convexity. We propose a unifying framework for subgradient methods that yields two kinds of methods, namely the proximal gradient method (PGM) and the conditional gradient method (CGM), and subsumes several existing methods. The framework provides tools to analyze the convergence of PGMs and CGMs for non-smooth, (weakly) smooth, and structured problems such as those with inexact oracle models. The proposed subgradient methods yield optimal PGMs for several classes of problems, and yield optimal and nearly optimal CGMs for smooth and weakly smooth problems, respectively.

Notes

  1. Notice that the function \(\varphi (x):=\psi (x)-\tau d(x)\) satisfies \(\varphi '(y;x-y)=\psi '(y;x-y)-\tau \left\langle \nabla {d}(y),x-y\right\rangle \), \(\forall x,y \in Q\). Hence, the convexity of \(\varphi (x)\) on Q implies \(\varphi (x)\ge \varphi (y)+\varphi '(y;x-y),\forall x,y \in Q\), which is equivalent to (2). Conversely, since \(\psi '(y;x-y)\ge -\psi '(y;y-x)\) holds, and the same is true for \(\varphi (\cdot )\), for \(x,y \in Q\), (2) implies the two inequalities \(\varphi (y)\ge \varphi (z)+\varphi '(z;y-z)\) and \(\varphi (x)\ge \varphi (z)-\varphi '(z;z-x)\) for \(x,y,z \in Q\). Since \(\varphi '(y;\cdot )\) is positively homogeneous, the convexity of \(\varphi (\cdot )\) on Q follows by taking a convex combination of the two with \(z=\alpha x + (1-\alpha )y,\alpha \in [0,1],x,y \in Q\).

  2. In fact, since these methods have the convergence rate \(f(\hat{x}_k)-f(x^*)\le \frac{cL\Vert x_0-x^* \Vert _2^2}{2k^2}\) for a constant \(c>0\), after \(k\ge \sqrt{2cL/\sigma _f}\) iterations we have \(f(\hat{x}_k)-f(x^*)\le \frac{\sigma _f}{4}\Vert x_0-x^* \Vert _2^2\le \frac{1}{2}(f(x_0)-f(x^*))\) by the strong convexity of f and the optimality of \(x^*\). Restarting the method every \(\sqrt{2cL/\sigma _f}\) iterations and repeating this \(O(\log _2(1/\varepsilon ))\) times therefore ensures an \(\varepsilon \)-solution (see the sketch after these notes).

  3. The auxiliary function \(\varphi _k(x)\) can possibly be an affine function. In that case, we will assume the boundedness of Q in order to ensure the existence of a minimizer \(z_k\).

  4. The proof of [16, Theorem 5.3], with the notation \((h(\cdot ),\lambda _{k+1},\tilde{\lambda }_{k+1},L_{k+1},\delta _{k+1},\bar{\alpha }_{k+1},\beta _{k+1},\alpha _k)\) of [16] replaced by \((-f(\cdot ),x_k,z_k,L(x_k),\delta (x_k,x_{k+1}),\tau _k,S_k/\lambda _0,\lambda _k/\lambda _0)\) for \(k \ge 0\), shows the desired estimate, because that proof uses the assumption [16, eq. (52)] with \((L,\delta )=(L_{k+1},\delta _{k+1})\) only at \((\lambda ,\bar{\lambda })=(\lambda _{k+2},\lambda _{k+1})\), which corresponds to our assumption (6) at \((x,y)=(x_k,x_{k+1})\).

  5. As is indicated in [31], an obvious upper bound of \(d(x^*)\) can be obtained if \(\nabla {f}(x^*)=0\) and we know M for the weakly smooth problems [example (iv) in Sect. 2.3.1] in the Euclidean setting \(d(x)=\frac{1}{2}\Vert x-x_0 \Vert _2^2\) : The inequality \(d(x^*) \le \frac{1}{2}(\frac{2M}{\rho \sigma _f})^{2/(2-\rho )}\) follows since we have \(\frac{\sigma _f}{2}\Vert x^*-x_0 \Vert _2^2 \le f(x_0)-f(x^*) \le \frac{M}{\rho }\Vert x_0-x^* \Vert _2^\rho \) [recall the strong convexity and (6)].
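
The restart argument in note 2 can be made concrete with the following minimal sketch. It is only an illustration, not one of the paper's algorithms: `accelerated_method` is a hypothetical black-box routine that, started from a given point, returns a point satisfying the \(O(1/k^2)\) rate assumed in note 2.

```python
import math

def restarted_method(accelerated_method, x0, c, L, sigma_f, eps):
    """Run the O(1/k^2) method for ceil(sqrt(2*c*L/sigma_f)) iterations per
    stage, restarting from the previous output; each stage halves the
    objective gap, so O(log2(1/eps)) stages reduce it by a factor of eps."""
    iters_per_stage = math.ceil(math.sqrt(2 * c * L / sigma_f))
    num_stages = math.ceil(math.log2(1.0 / eps))
    x = x0
    for _ in range(num_stages):
        # hypothetical call: f(x_new) - f(x*) <= (f(x) - f(x*)) / 2
        x = accelerated_method(x, iters_per_stage)
    return x
```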

References

  1. Argyriou, A., Signoretto, M., Suykens, J.: Hybrid conditional gradient - smoothing algorithms with applications to sparse and low rank regularization. In: Suykens, J., Argyriou, A., Signoretto, M. (eds.) Regularization, Optimization, Kernels, and Support Vector Machines, pp. 53–82. Chapman & Hall/CRC, Boca Raton (2014)

  2. Auslender, A., Teboulle, M.: Interior gradient and proximal method for convex and conic optimization. SIAM J. Optim. 16, 697–725 (2006)

  3. Bach, F.: Duality between subgradient and conditional gradient methods. SIAM J. Optim. 25, 115–129 (2015)

  4. Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31, 167–175 (2003)

  5. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

  6. Beck, A., Teboulle, M.: Smoothing and first order methods: a unified framework. SIAM J. Optim. 22, 557–580 (2012)

  7. Bregman, L.: The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7, 200–217 (1967)

  8. Chen, X., Lin, Q., Peña, J.: Optimal regularized dual averaging methods for stochastic optimization. Adv. Neural Inf. Process. Syst. 25, 395–403 (2012)

  9. Cox, B., Juditsky, A., Nemirovski, A.: Dual subgradient algorithms for large-scale nonsmooth learning problems. Math. Program. 148, 143–180 (2013)

  10. Demyanov, V.F., Rubinov, A.M.: Approximate Methods in Optimization Problems. American Elsevier Publishing Company, New York (1970)

  11. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods with inexact oracle: the strongly convex case. CORE discussion paper, 2013/16 (2013)

  12. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146, 37–75 (2014)

  13. Dunn, J., Harshbarger, S.: Conditional gradient algorithms with open loop step size rules. J. Math. Anal. Appl. 62, 432–444 (1978)

  14. Elster, K.-H. (ed.): Modern Mathematical Methods in Optimization. Akademie Verlag, Berlin (1993)

  15. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Nav. Res. Logist. Q. 3, 95–110 (1956)

  16. Freund, R.M., Grigas, P.: New analysis and results for the Frank–Wolfe method. Math. Program. 155, 199–230 (2014)

  17. Fukushima, M., Mine, H.: A generalized proximal point algorithm for certain non-convex minimization problems. Int. J. Syst. Sci. 12, 989–1000 (1981)

  18. Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, I: A generic algorithmic framework. SIAM J. Optim. 22, 1469–1492 (2012)

  19. Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: Shrinking procedures and optimal algorithms. SIAM J. Optim. 23, 2061–2089 (2013)

  20. Guzmán, C., Nemirovski, A.: On lower complexity bounds for large-scale convex optimization. J. Complex. 31, 1–14 (2015)

  21. Harchaoui, Z., Juditsky, A., Nemirovski, A.: Conditional gradient algorithms for norm-regularized smooth convex optimization. Math. Program. 152, 75–112 (2015)

  22. Ito, M., Fukuda, M.: A family of subgradient-based methods for convex optimization problems in a unifying framework. Optim. Meth. Software (to appear)

  23. Jaggi, M.: Sparse convex optimization methods for machine learning, Ph.D. thesis, ETH Zurich (2011)

  24. Jaggi, M.: Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In: Proceedings of the 30th International Conference on Machine Learning, pp. 427–435 (2013)

  25. Juditsky, A., Nesterov, Y.: Deterministic and stochastic primal-dual subgradient algorithms for uniformly convex minimization. Stoch. Syst. 4, 44–80 (2014)

  26. Lan, G.: An optimal method for stochastic composite optimization. Math. Program. 133, 365–397 (2012)

  27. Lan, G.: The complexity of large-scale convex programming under a linear optimization oracle. arXiv:1309.5550v2 (2014)

  28. Lan, G.: Gradient sliding for composite optimization. Math. Program. (to appear)

  29. Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. In: Uryasev, S., Pardalos, P. (eds.) Stoch. Optim., pp. 223–264. Kluwer Academic Publishers, Dordrecht (2001)

  30. Nedić, A., Lee, S.: On stochastic subgradient mirror-descent algorithm with weighted averaging. SIAM J. Optim. 24, 84–107 (2014)

  31. Nemirovski, A., Nesterov, Y.: Optimal methods for smooth convex minimization, Zh. Vychisl. Mat. i Mat. Fiz., 25, 356–369 (1985) (in Russian); English translation: USSR Computational Mathematics and Mathematical Physics, 24, 80–82 (1984)

  32. Nemirovski, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization, Nauka Publishers, Moscow, Russia (1979) (in Russian). English translation: Wiley, New York (1983)

  33. Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27, 372–376 (1983)

  34. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Boston (2004)

  35. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103, 127–152 (2005)

  36. Nesterov, Y.: Excessive gap technique in nonsmooth convex minimization. SIAM J. Optim. 16, 235–249 (2005)

  37. Nesterov, Y.: Primal-dual subgradient methods for convex problems. Math. Program. 120, 221–259 (2009)

  38. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140, 125–161 (2013)

  39. Nesterov, Y.: Universal gradient methods for convex optimization problems. Math. Program. 152, 381–404 (2015)

  40. Nesterov, Y.: Complexity bounds for primal-dual methods minimizing the model of objective function. CORE discussion paper, 2015/3 (2015)

  41. Pshenichny, B.N., Danilin, Y.M.: Numerical Methods in Extremal Problems. MIR Publishers, Moscow (1978)

  42. Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization, Technical Report, University of Washington (2008)

  43. Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. 125, 263–295 (2010)

Acknowledgments

The author is very thankful to the anonymous referees, whose constructive suggestions substantially improved the readability of the paper. He is also thankful to Prof. Mituhiro Fukuda for comments and suggestions, and to Prof. Guanghui Lan for pointing out some related results. This work was partially supported by JSPS Grant-in-Aid for Scientific Research (C) Number 26330024.

Author information

Correspondence to Masaru Ito.

Appendix

In order to complete the proof of Theorem 5.3, we need to obtain upper bounds for \(1/S_k\) and \(\sum _{i=0}^kS_i/S_k\) for the sequence \(\{S_k\}_{k \ge 0}\) defined by (46). Since \(\lambda _{k+1}=S_{k+1}-S_k\), writing \(r:=\frac{\sigma _f\sigma _d}{L-\bar{\sigma }_f\sigma _d}\ge 0\), the sequence \(\{S_k\}_{k\ge 0}\) in (46) is determined by the recurrence

$$\begin{aligned} S_0=1,\quad (S_{k+1}-S_k)^2=S_{k+1}(1+rS_k),\quad k \ge 0 \end{aligned}$$
(52)

where \(S_{k+1}\) is taken to be the larger root of the quadratic equation in \(S_{k+1}\), namely,

$$\begin{aligned} S_{k+1}=\frac{1+(2+r)S_k+\sqrt{(1+(2+r)S_k)^2-4S_k^2}}{2}. \end{aligned}$$
(53)

The lemmas below are essentially the same as [11, Lemmas 4–7], except that \(\mu /L\) in that article is replaced by an arbitrary \(r\ge 0\).
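
For reference, the recurrence (52)–(53) is straightforward to evaluate numerically. The following sketch is only an illustration (not part of the paper); the helper `S_sequence` introduced here is reused in the numerical checks placed after the lemmas below.

```python
import math

def S_sequence(r, num_terms):
    """Return [S_0, ..., S_{num_terms-1}] following (52): S_0 = 1 and S_{k+1}
    is the larger root of (S_{k+1} - S_k)^2 = S_{k+1} * (1 + r * S_k),
    i.e. the closed form (53)."""
    S = [1.0]
    for _ in range(num_terms - 1):
        Sk = S[-1]
        b = 1.0 + (2.0 + r) * Sk
        S.append((b + math.sqrt(b * b - 4.0 * Sk * Sk)) / 2.0)
    return S

print(S_sequence(r=0.1, num_terms=5))  # first five terms of {S_k} for r = 0.1
```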

Lemma 7.1

For any sequence \(\{S_k\}_{k \ge 0}\) defined by (52) for \(r \ge 0\), we have

$$\begin{aligned} \frac{1}{S_k} \le \min \left\{ \frac{4}{(k+1)(k+4)}, \left( \frac{2}{2+r+\sqrt{r^2+4r}}\right) ^{k}\right\} ,\quad \forall k \ge 0. \end{aligned}$$

Proof

Since \(S_{k+1}\ge S_k\), we have

$$\begin{aligned} \sqrt{S_{k+1}}-\sqrt{S_{k}} = \frac{S_{k+1}-S_{k}}{\sqrt{S_{k+1}}+\sqrt{S_{k}}} \ge \frac{S_{k+1}-S_{k}}{2\sqrt{S_{k+1}}} \mathop {=}\limits ^{(52)} \frac{1}{2}\sqrt{1+rS_k}\ge \frac{1}{2} \end{aligned}$$
(54)

which shows \(\sqrt{S_k}\ge \frac{k}{2}+\sqrt{S_0}=\frac{k+2}{2}\) for all \(k \ge 0\). Then, we have

$$\begin{aligned} S_{k}-S_0= & {} \sum _{i=0}^{k-1}(S_{i+1}-S_i)\mathop {=}\limits ^{(52)}\sum _{i=0}^{k-1}\sqrt{S_{i+1}(1+rS_i)}\ge \sum _{i=0}^{k-1}\sqrt{S_{i+1}}\\\ge & {} \sum _{i=0}^{k-1}\frac{i+3}{2}=\frac{k(k+5)}{4} \end{aligned}$$

which gives \(S_k \ge S_0+\frac{k(k+5)}{4}=\frac{(k+1)(k+4)}{4}\). On the other hand, using (53) yields that

$$\begin{aligned} \frac{S_{k+1}}{S_k}= & {} \frac{\frac{1}{S_k}+2+r+\sqrt{\left( \frac{1}{S_k}+(2+r)\right) ^2-4}}{2} \ge \frac{2+r+\sqrt{(2+r)^2-4}}{2} \nonumber \\= & {} \frac{2+r+\sqrt{r^2+4r}}{2} \end{aligned}$$
(55)

for all \(k \ge 0\). Hence, we have \(S_k\ge S_0\left( \frac{2+r+\sqrt{r^2+4r}}{2}\right) ^k=\left( \frac{2+r+\sqrt{r^2+4r}}{2}\right) ^k\). \(\square \)
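
As an illustrative sanity check (using the hypothetical `S_sequence` helper from the sketch after (53)), the two upper bounds of Lemma 7.1 can be verified numerically for moderate k:

```python
r = 0.1
S = S_sequence(r, 60)
rho = 2.0 / (2.0 + r + math.sqrt(r * r + 4.0 * r))  # linear factor of Lemma 7.1
for k, Sk in enumerate(S):
    bound = min(4.0 / ((k + 1) * (k + 4)), rho ** k)
    assert 1.0 / Sk <= bound * (1.0 + 1e-9), (k, 1.0 / Sk, bound)
```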

Remark 7.2

The linear convergence factor \(\frac{2}{2+r+\sqrt{r^2+4r}}\) in the above lemma satisfies

$$\begin{aligned} 1-\sqrt{\frac{r}{r+1}} \le \frac{2}{2+r+\sqrt{r^2+4r}} \le \left( 1+\frac{1}{2}\sqrt{r}\right) ^{-2}. \end{aligned}$$

In fact, since

$$\begin{aligned} \left( 1-\sqrt{\frac{r}{r+1}}\right) ^{-1}= & {} \frac{\sqrt{r+1}}{\sqrt{r+1}-\sqrt{r}}=\sqrt{r+1}(\sqrt{r+1}+\sqrt{r})\\= & {} \frac{2+2r+\sqrt{4r^2+4r}}{2}, \end{aligned}$$

we obtain

$$\begin{aligned} \left( 1+\frac{1}{2}\sqrt{r}\right) ^{2}= & {} \frac{2+r/2+\sqrt{4r}}{2} \le \frac{2+r+\sqrt{r^2+4r}}{2} \\\le & {} \frac{2+2r+\sqrt{4r^2+4r}}{2}=\left( 1-\sqrt{\frac{r}{r+1}}\right) ^{-1}. \end{aligned}$$

Note that if \(\bar{\sigma }_f=\sigma _f\) and \(r=\frac{\sigma _f\sigma _d}{L-\bar{\sigma }_f\sigma _d}\), then \(\sqrt{\frac{r}{r+1}}=\sqrt{\frac{\sigma _f\sigma _d}{L}}\).

Lemma 7.3

The sequence \(\{S_k\}_{k \ge 0}\) defined by (52) for \(r>0\) satisfies

$$\begin{aligned} \frac{\sum _{i=0}^kS_i}{S_k} \le \frac{1+\sqrt{1+4r^{-1}}}{2}\le 1+\sqrt{\frac{1}{r}},\quad \forall k \ge 0. \end{aligned}$$

Proof

Notice that \(\gamma := \frac{1+\sqrt{1+4r^{-1}}}{2}\) satisfies

$$\begin{aligned} \left( 1-\frac{1}{\gamma }\right) ^{-1}=\frac{\gamma }{\gamma -1}=\frac{\sqrt{1+4r^{-1}}+1}{\sqrt{1+4r^{-1}}-1}=\frac{(\sqrt{1+4r^{-1}}+1)^2}{4r^{-1}}=\frac{2+r+\sqrt{r^2+4r}}{2}. \end{aligned}$$

Therefore, we obtain \(\frac{S_k}{S_{k+1}}\le 1-\frac{1}{\gamma }\) by (55). Now the first inequality follows by induction: it holds for \(k=0\) since \(\gamma \ge 1\), and if \(\sum _{i=0}^kS_i/S_k\le \gamma \) holds for some \(k \ge 0\), we have

$$\begin{aligned} \frac{\sum _{i=0}^{k+1}S_i}{S_{k+1}} = 1+\frac{S_k}{S_{k+1}}\frac{\sum _{i=0}^kS_i}{S_k} \le 1+\frac{\gamma -1}{\gamma }\cdot \gamma =\gamma . \end{aligned}$$

This proves the first inequality; the second can be verified from \(\sqrt{1+4r^{-1}}\le 1+2\sqrt{r^{-1}}\). \(\square \)

Note that the result of Lemma 7.3 is the same as [11, Lemma 5] because \(1+\frac{2\sqrt{r^{-1}}}{\sqrt{r}+\sqrt{r+4}}=\frac{1+\sqrt{1+4r^{-1}}}{2}\).
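
The bound of Lemma 7.3 admits the same kind of illustrative check (again relying on the hypothetical `S_sequence` helper) for several values of \(r>0\):

```python
for r in (0.01, 0.1, 1.0, 10.0):
    S = S_sequence(r, 100)
    gamma = (1.0 + math.sqrt(1.0 + 4.0 / r)) / 2.0  # bound of Lemma 7.3
    running_sum = 0.0
    for Sk in S:
        running_sum += Sk
        assert running_sum / Sk <= gamma * (1.0 + 1e-9)
```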

Lemma 7.4

Let \(\{S_k\}_{k \ge 0}\) be defined as in Lemma 7.3 and \(\{T_k\}_{k \ge 0}\) be defined by (52) with \(r:=0\), namely \(T_0:=1\) and \(T_{k+1}:=\frac{1+2T_k+\sqrt{1+4T_k}}{2}\) for \(k \ge 0\). Then, we have

$$\begin{aligned} \frac{\sum _{i=0}^kS_i}{S_k} \le \frac{\sum _{i=0}^kT_i}{T_k},\quad \forall k \ge 0. \end{aligned}$$

Proof

Due to the identity

$$\begin{aligned} \frac{\sum _{i=0}^kS_i}{S_k} = 1+\sum _{i=0}^{k-1}\frac{S_i}{S_{k}} = 1+\sum _{i=0}^{k-1}\prod _{j=i}^{k-1}\frac{S_j}{S_{j+1}}, \quad k \ge 0, \end{aligned}$$

it is enough to show that \(\frac{S_k}{S_{k+1}} \le \frac{T_k}{T_{k+1}}\) for every \(k \ge 0\). Notice that we have

$$\begin{aligned} \frac{S_{k+1}}{S_k} = \frac{\frac{1+rS_k}{S_k}+2+\sqrt{\left( \frac{1+rS_k}{S_k}+2\right) ^2-4}}{2},\quad \frac{T_{k+1}}{T_k} = \frac{\frac{1}{T_k}+2+\sqrt{\left( \frac{1}{T_k}+2\right) ^2-4}}{2}, \end{aligned}$$
(56)

which suggests proving \(\frac{1+rS_k}{S_k} \ge \frac{1}{T_k}\) for \(k \ge 0\). This is true for \(k=0\) since \(S_0=T_0\). If it holds for some \(k \ge 0\), then, writing \(\alpha :=\frac{1+rS_k}{S_k} \ge \beta := \frac{1}{T_k}\), we obtain

$$\begin{aligned} \frac{1+rS_{k+1}}{S_{k+1}}\ge & {} \frac{1+rS_k}{S_{k+1}} = \frac{S_k}{S_{k+1}}\alpha \mathop {=}\limits ^{(56)} \frac{2\alpha }{\alpha +2+\sqrt{(\alpha +2)^2-4}}\\\ge & {} \frac{2\beta }{\beta +2+\sqrt{(\beta +2)^2-4}} \mathop {=}\limits ^{(56)} \frac{T_k}{T_{k+1}}\beta =\frac{1}{T_{k+1}} \end{aligned}$$

since \(S_{k+1} \ge S_k\) and \(x \mapsto \frac{2x}{x+2+\sqrt{(x+2)^2-4}}=\frac{2}{1+2x^{-1}+\sqrt{1+4x^{-1}}}\) is non-decreasing on \((0,\infty )\). Hence \(\frac{1+rS_k}{S_k} \ge \frac{1}{T_k}\) holds for all \(k \ge 0\), which completes the proof. \(\square \)
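
Numerically, the comparison in Lemma 7.4 can be illustrated with the same hypothetical helper, taking \(\{T_k\}\) as \(\{S_k\}\) with \(r=0\):

```python
S = S_sequence(0.1, 150)
T = S_sequence(0.0, 150)  # T_k is S_k with r = 0
sum_S = sum_T = 0.0
for k in range(150):
    sum_S += S[k]
    sum_T += T[k]
    assert sum_S / S[k] <= sum_T / T[k] * (1.0 + 1e-9)  # Lemma 7.4
```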

Lemma 7.5

Let \(\{T_k\}_{k \ge 0}\) be a sequence defined by (52) with \(r:=0\), namely \(T_0:=1\) and \(T_{k+1}:=\frac{1+2T_k+\sqrt{1+4T_k}}{2}\) for \(k \ge 0\). Then, we have

$$\begin{aligned} \frac{\sum _{i=0}^kT_i}{T_k} \le \frac{1}{3}k +\frac{1}{6}\log (k+2)+1,\quad \forall k \ge 0. \end{aligned}$$

Proof

The case \(k =0\) is obvious. Assume that the assertion is true for some \(k \ge 0\). Putting \(U_k:=\frac{1}{3}k +\frac{1}{6}\log (k+2)+1\), we have

$$\begin{aligned} \frac{\sum _{i=0}^{k+1}T_i}{T_{k+1}} = 1+\frac{T_k}{T_{k+1}}\frac{\sum _{i=0}^kT_i}{T_k} \le 1+\frac{T_k}{T_{k+1}}U_k. \end{aligned}$$

Hence, it remains to show \(1+\frac{T_k}{T_{k+1}}U_k \le U_{k+1}\) for \(k \ge 0\). For that, we analyze the sequence \(t_0:=1,~t_{k+1}:=T_{k+1}-T_k\) for \(k \ge 0\) (namely, \(T_k=\sum _{i=0}^k t_i\)). The recurrence relation of \(T_k\) implies \(t_{k}^2=(T_{k}-T_{k-1})^2=T_{k}\) for \(k \ge 1\) (and \(t_0^2=1=T_0\)), and

$$\begin{aligned} t_{k+1}=T_{k+1}-T_k\mathop {=}\limits ^{(53)} \frac{1+\sqrt{1+4T_k}}{2} = \frac{1+\sqrt{1+4t_k^2}}{2},\quad \forall k \ge 0. \end{aligned}$$

Analyzing the difference \(t_{k+1}-t_k\) shows for \(k \ge 0\) that

$$\begin{aligned} t_{k+1}-t_k= & {} \frac{1+\sqrt{1+4t_k^2}-2t_k}{2}=\frac{1}{2}+\frac{1}{2\left( \sqrt{1+4t_k^2} +2t_k\right) } \\\le & {} \frac{1}{2}+\frac{1}{2\left( \sqrt{4t_k^2}+2t_k\right) } = \frac{1}{2}+\frac{1}{8t_k}. \end{aligned}$$

Since Lemma 7.1 yields \(t_k=\sqrt{T_k}\ge \sqrt{(k+1)(k+4)/4}\ge (k+2)/2\) for \(k \ge 0\), we obtain

$$\begin{aligned} t_{k+1}\le & {} t_0+\frac{k+1}{2}+\frac{1}{8}\sum _{i=0}^{k}\frac{1}{t_i}\le \frac{k}{2}+\frac{3}{2}+\frac{1}{8}\sum _{i=0}^{k}\frac{2}{i+2}\\\le & {} \frac{k}{2}+\frac{3}{2}+\frac{1}{4}\log (k+2)=\frac{3}{2}U_{k} \end{aligned}$$

for all \(k \ge 0\). Finally, this upper bound on \(t_{k+1}\) shows that

$$\begin{aligned} \frac{U_k}{1+U_k-U_{k+1}}= & {} \frac{3U_k}{2+\frac{1}{2}\log \frac{k+2}{k+3}}\ge \frac{3}{2}U_k \\\ge & {} t_{k+1}=\frac{t_{k+1}^2}{t_{k+1}}=\frac{T_{k+1}}{T_{k+1}-T_k}. \end{aligned}$$

Taking reciprocals and multiplying both sides by \(U_k\) yields \(1+\frac{T_k}{T_{k+1}}U_k \le U_{k+1}\). \(\square \)
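
Finally, the explicit logarithmic bound of Lemma 7.5 can be checked in the same illustrative way (the helper and tolerance are as in the previous sketches):

```python
T = S_sequence(0.0, 500)  # the case r = 0
sum_T = 0.0
for k in range(500):
    sum_T += T[k]
    U = k / 3.0 + math.log(k + 2) / 6.0 + 1.0  # bound of Lemma 7.5
    assert sum_T / T[k] <= U * (1.0 + 1e-9)
```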

Cite this article

Ito, M. New results on subgradient methods for strongly convex optimization problems with a unified analysis. Comput Optim Appl 65, 127–172 (2016). https://doi.org/10.1007/s10589-016-9841-1
