Abstract
We develop subgradient- and gradient-based methods for minimizing strongly convex functions, where strong convexity is understood in a sense that generalizes the standard Euclidean one. We propose a unifying framework for subgradient methods which yields two kinds of methods, namely, the proximal gradient method (PGM) and the conditional gradient method (CGM), and which subsumes several existing methods. The framework provides tools to analyze the convergence of PGMs and CGMs for non-smooth, (weakly) smooth, and further structured problems such as those with inexact oracle models. The proposed subgradient methods yield optimal PGMs for several classes of problems, and yield optimal and nearly optimal CGMs for smooth and weakly smooth problems, respectively.
Notes
Notice that the function \(\varphi (x):=\psi (x)-\tau d(x)\) satisfies \(\varphi '(y;x-y)=\psi '(y;x-y)-\tau \left\langle \nabla {d}(y),x-y\right\rangle \), \(\forall x,y \in Q\). Hence, the convexity of \(\varphi (x)\) on Q implies \(\varphi (x)\ge \varphi (y)+\varphi '(y;x-y),\forall x,y \in Q\), which is equivalent to (2). Conversely, since \(\psi '(y;x-y)\ge -\psi '(y;y-x)\) holds, and the same holds for \(\varphi (\cdot )\), for all \(x,y \in Q\), (2) implies the two inequalities \(\varphi (y)\ge \varphi (z)+\varphi '(z;y-z)\) and \(\varphi (x)\ge \varphi (z)-\varphi '(z;z-x)\) for \(x,y,z \in Q\). Since \(\varphi '(y;\cdot )\) is positively homogeneous, the convexity of \(\varphi (\cdot )\) on Q follows by taking a convex combination of the two with \(z=\alpha x + (1-\alpha )y,\alpha \in [0,1],x,y \in Q\).
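To make the final combination explicit: with \(z=\alpha x+(1-\alpha )y\) we have \(y-z=\alpha (y-x)\) and \(z-x=(1-\alpha )(y-x)\), so multiplying the two inequalities by \(1-\alpha \) and \(\alpha \), respectively, and using positive homogeneity gives
\[
(1-\alpha )\varphi (y)+\alpha \varphi (x)\;\ge\;\varphi (z)+\alpha (1-\alpha )\varphi '(z;y-x)-\alpha (1-\alpha )\varphi '(z;y-x)\;=\;\varphi \bigl(\alpha x+(1-\alpha )y\bigr),
\]
which is the convexity of \(\varphi \) on Q.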
In fact, since they have the convergence rate \(f(\hat{x}_k)-f(x^*)\le \frac{cL\Vert x_0-x^* \Vert _2^2}{2k^2}\) for a constant \(c>0\), after \(k\ge \sqrt{2cL/\sigma _f}\) iterations we have \(f(\hat{x}_k)-f(x^*)\le \frac{\sigma _f}{4}\Vert x_0-x^* \Vert _2^2\le \frac{1}{2}(f(x_0)-f(x^*))\) by the strong convexity of f and the optimality of \(x^*\). Hence, restarting the method every \(\sqrt{2cL/\sigma _f}\) iterations and repeating this \(O(\log _2(1/\varepsilon ))\) times ensures an \(\varepsilon \)-solution.
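As a rough illustration of this restart argument, here is a minimal sketch (the names `run_accelerated`, `f`, and `f_star` are hypothetical placeholders, not from the paper; in practice one would fix the number of restarts in advance rather than test against \(f(x^*)\)):

```python
import math

def restarted_scheme(run_accelerated, f, f_star, x0, L, sigma_f, eps, c=1.0):
    """Restart wrapper for a method with rate f(x_k) - f* <= c*L*||x0 - x*||^2 / (2 k^2).

    Running k0 = ceil(sqrt(2*c*L/sigma_f)) iterations at least halves the objective
    gap (by strong convexity), so O(log2(1/eps)) restarts give an eps-solution.
    """
    k0 = math.ceil(math.sqrt(2.0 * c * L / sigma_f))
    x = x0
    while f(x) - f_star > eps:       # O(log2(1/eps)) passes in total
        x = run_accelerated(x, k0)   # restart the method from the last iterate
    return x
```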
The auxiliary function \(\varphi _k(x)\) may be an affine function. In that case, we assume the boundedness of Q in order to ensure the existence of a minimizer \(z_k\).
The proof of [16, Theorem 5.3], with the notation \((h(\cdot ),\lambda _{k+1},\tilde{\lambda }_{k+1},L_{k+1},\delta _{k+1},\bar{\alpha }_{k+1},\beta _{k+1},\alpha _k)\) of [16] replaced by \((-f(\cdot ),x_k,z_k,L(x_k),\delta (x_k,x_{k+1}),\tau _k,S_k/\lambda _0,\lambda _k/\lambda _0)\) for \(k \ge 0\), shows the desired estimate, because that proof uses the assumption [16, eq. (52)] with \((L,\delta )=(L_{k+1},\delta _{k+1})\) only at \((\lambda ,\bar{\lambda })=(\lambda _{k+2},\lambda _{k+1})\), which corresponds to our assumption (6) at \((x,y)=(x_k,x_{k+1})\).
As is indicated in [31], an obvious upper bound of \(d(x^*)\) can be obtained if \(\nabla {f}(x^*)=0\) and we know M for the weakly smooth problems [example (iv) in Sect. 2.3.1] in the Euclidean setting \(d(x)=\frac{1}{2}\Vert x-x_0 \Vert _2^2\) : The inequality \(d(x^*) \le \frac{1}{2}(\frac{2M}{\rho \sigma _f})^{2/(2-\rho )}\) follows since we have \(\frac{\sigma _f}{2}\Vert x^*-x_0 \Vert _2^2 \le f(x_0)-f(x^*) \le \frac{M}{\rho }\Vert x_0-x^* \Vert _2^\rho \) [recall the strong convexity and (6)].
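Written out, the rearrangement behind this bound is (for \(\rho <2\)):
\[
\frac{\sigma _f}{2}\Vert x^*-x_0 \Vert _2^2 \le \frac{M}{\rho }\Vert x_0-x^* \Vert _2^{\rho }
\;\Longrightarrow\;
\Vert x^*-x_0 \Vert _2^{2-\rho } \le \frac{2M}{\rho \sigma _f}
\;\Longrightarrow\;
d(x^*)=\frac{1}{2}\Vert x^*-x_0 \Vert _2^2 \le \frac{1}{2}\Bigl(\frac{2M}{\rho \sigma _f}\Bigr)^{2/(2-\rho )}.
\]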
References
Argyriou, A., Signoretto, M., Suykens, J.: Hybrid conditional gradient - smoothing algorithms with applications to sparse and low rank regularization. In: Suykens, J., Argyriou, A., Signoretto, M. (eds.) Regularization, Optimization, Kernels, and Support Vector Machines, pp. 53–82. Chapman & Hall/CRC, Boca Raton (2014)
Auslender, A., Teboulle, M.: Interior gradient and proximal method for convex and conic optimization. SIAM J. Optim. 16, 697–725 (2006)
Bach, F.: Duality between subgradient and conditional gradient methods. SIAM J. Optim. 25, 115–129 (2015)
Beck, A., Teboulle, M.: Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31, 167–175 (2003)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Beck, A., Teboulle, M.: Smoothing and first order methods: a unified framework. SIAM J. Optim. 22, 557–580 (2012)
Bregman, L.: The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7, 200–217 (1967)
Chen, X., Lin, Q., Peña, J.: Optimal regularized dual averaging methods for stochastic optimization. Adv. Neural Inf. Process. Syst. 25, 395–403 (2012)
Cox, B., Juditsky, A., Nemirovski, A.: Dual subgradient algorithms for large-scale nonsmooth learning problems. Math. Program. 148, 143–180 (2013)
Demyanov, V.F., Rubinov, A.M.: Approximate Methods in Optimization Problems. American Elsevier Publishing Company, New York (1970)
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods with inexact oracle: the strongly convex case. CORE discussion paper, 2013/16 (2013)
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146, 37–75 (2014)
Dunn, J., Harshbarger, S.: Conditional gradient algorithms with open loop step size rules. J. Math. Anal. Appl. 62, 432–444 (1978)
Elster, K.-H. (ed.): Modern Mathematical Methods in Optimization. Akademie Verlag, Berlin (1993)
Frank, M., Wolfe, P.: An algorithm for quadratic programming. Nav. Res. Logist. Q. 3, 95–110 (1956)
Freund, R.M., Grigas, P.: New analysis and results for the Frank–Wolfe method. Math. Program. 155, 199–230 (2014)
Fukushima, M., Mine, H.: A generalized proximal point algorithm for certain non-convex minimization problems. Int. J. Syst. Sci. 12, 989–1000 (1981)
Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, I: A generic algorithmic framework. SIAM J. Optim. 22, 1469–1492 (2012)
Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: Shrinking procedures and optimal algorithms. SIAM J. Optim. 23, 2061–2089 (2013)
Guzmán, C., Nemirovski, A.: On lower complexity bounds for large-scale convex optimization. J. Complex. 31, 1–14 (2015)
Harchaoui, Z., Juditsky, A., Nemirovski, A.: Conditional gradient algorithms for norm-regularized smooth convex optimization. Math. Program. 152, 75–112 (2015)
Ito, M., Fukuda, M.: A family of subgradient-based methods for convex optimization problems in a unifying framework. Optim. Methods Softw. (to appear)
Jaggi, M.: Sparse convex optimization methods for machine learning, Ph.D. thesis, ETH Zurich (2011)
Jaggi, M.: Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In: Proceedings of the 30th International Conference on Machine Learning, pp. 427–435 (2013)
Juditsky, A., Nesterov, Y.: Deterministic and stochastic primal-dual subgradient algorithms for uniformly convex minimization. Stoch. Syst. 4, 44–80 (2014)
Lan, G.: An optimal method for stochastic composite optimization. Math. Program. 133, 365–397 (2012)
Lan, G.: The complexity of large-scale convex programming under a linear optimization oracle. arXiv:1309.5550v2 (2014)
Lan, G.: Gradient sliding for composite optimization. Math. Program. (to appear)
Nedić, A., Bertsekas, D.: Convergence rate of incremental subgradient algorithms. In: Uryasev, S., Pardalos, P. (eds.) Stochastic Optimization: Algorithms and Applications, pp. 223–264. Kluwer Academic Publishers, Dordrecht (2001)
Nedić, A., Lee, S.: On stochastic subgradient mirror-descent algorithm with weighted averaging. SIAM J. Optim. 24, 84–107 (2014)
Nemirovski, A., Nesterov, Y.: Optimal methods for smooth convex minimization, Zh. Vychisl. Mat. i Mat. Fiz., 25, 356–369 (1985) (in Russian); English translation: USSR Computational Mathematics and Mathematical Physics, 24, 80–82 (1984)
Nemirovski, A., Yudin, D.: Problem Complexity and Method Efficiency in Optimization, Nauka Publishers, Moscow, Russia (1979) (in Russian). English translation: Wiley, New York (1983)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27, 372–376 (1983)
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Boston (2004)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103, 127–152 (2005)
Nesterov, Y.: Excessive gap technique in nonsmooth convex minimization. SIAM J. Optim. 16, 235–249 (2005)
Nesterov, Y.: Primal-dual subgradient methods for convex problems. Math. Program. 120, 221–259 (2009)
Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140, 125–161 (2013)
Nesterov, Y.: Universal gradient methods for convex optimization problems. Math. Program. 152, 381–404 (2015)
Nesterov, Y.: Complexity bounds for primal-dual methods minimizing the model of objective function. CORE discussion paper, 2015/3 (2015)
Pshenichny, B.N., Danilin, Y.M.: Numerical Methods in Extremal Problems. MIR Publishers, Moscow (1978)
Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization, Technical Report, University of Washington (2008)
Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. 125, 263–295 (2010)
Acknowledgments
The author is very thankful to the anonymous referees, whose constructive suggestions substantially improved the readability of the paper. He is also thankful to Prof. Mituhiro Fukuda for comments and suggestions, and to Prof. Guanghui Lan for pointing out some related results. This work was partially supported by JSPS Grant-in-Aid for Scientific Research (C) Number 26330024.
Appendix
In order to complete the proof of Theorem 5.3, we need to obtain upper bounds for \(1/S_k\) and \(\sum _{i=0}^kS_i/S_k\) for the sequence \(\{S_k\}_{k \ge 0}\) defined by (46). Since \(\lambda _{k+1}=S_{k+1}-S_k\), writing \(r:=\frac{\sigma _f\sigma _d}{L-\bar{\sigma }_f\sigma _d}\ge 0\), the sequence \(\{S_k\}_{k\ge 0}\) in (46) is determined by the recurrence
where \(S_{k+1}\) is taken to be the larger root of the equation in \(S_{k+1}\), namely,
The lemmas below are essentially the same as [11, Lemmas 4–7], except that \(\mu /L\) in that article is replaced by an arbitrary \(r\ge 0\).
Lemma 7.1
For any sequence \(\{S_k\}_{k \ge 0}\) defined by (52) for \(r \ge 0\), we have
Proof
Since \(S_{k+1}\ge S_k\), we have
which shows \(\sqrt{S_k}\ge \frac{k}{2}+\sqrt{S_0}=\frac{k+2}{2}\) for all \(k \ge 0\). Then, we have
which gives \(S_k \ge S_0+\frac{k(k+5)}{4}=\frac{(k+1)(k+4)}{4}\). On the other hand, using (53) yields that
for all \(k \ge 0\). Hence, we have \(S_k\ge S_0\left( \frac{2+r+\sqrt{r^2+4r}}{2}\right) ^k=\left( \frac{2+r+\sqrt{r^2+4r}}{2}\right) ^k\). \(\square \)
Remark 7.2
The linear convergence factor \(\frac{2}{2+r+\sqrt{r^2+4r}}\) in the above lemma satisfies
In fact, since
we obtain
Note that if \(\bar{\sigma }_f=\sigma _f\) and \(r=\frac{\sigma _f\sigma _d}{L-\bar{\sigma }_f\sigma _d}\), then \(\sqrt{\frac{r}{r+1}}=\sqrt{\frac{\sigma _f\sigma _d}{L}}\).
Lemma 7.3
The sequence \(\{S_k\}_{k \ge 0}\) defined by (52) for \(r>0\) satisfies
Proof
Notice that \(\gamma := \frac{1+\sqrt{1+4r^{-1}}}{2}\) satisfies
Therefore, we obtain \(\frac{S_k}{S_{k+1}}\le 1-\frac{1}{\gamma }\) by (55). Now the result follows by induction: if \(\sum _{i=0}^kS_i/S_k\le \gamma \) holds for some \(k \ge 0\), we have
This proves the first inequality; the second can be verified from \(\sqrt{1+4r^{-1}}\le 1+2\sqrt{r^{-1}}\). \(\square \)
Note that the result of Lemma 7.3 is the same as [11, Lemma 5] because \(1+\frac{2\sqrt{r^{-1}}}{\sqrt{r}+\sqrt{r+4}}=\frac{1+\sqrt{1+4r^{-1}}}{2}\).
Lemma 7.4
Let \(\{S_k\}_{k \ge 0}\) be defined as Lemma 7.3 and \(\{T_k\}_{k \ge 0}\) be defined by (52) with \(r:=0\), namely \(T_0:=1\) and \(T_{k+1}:=\frac{1+2T_k+\sqrt{1+4T_k}}{2}\) for \(k \ge 0\). Then, we have
Proof
Due to the identity
it is enough to show that \(\frac{S_k}{S_{k+1}} \le \frac{T_k}{T_{k+1}}\) for every \(k \ge 0\). Notice that we have
which suggests proving \(\frac{1+rS_k}{S_k} \ge \frac{1}{T_k}\) for \(k \ge 0\). It is true for \(k=0\) since \(S_0=T_0\). If it holds for some \(k \ge 0\), then, writing \(\alpha :=\frac{1+rS_k}{S_k} \ge \beta := \frac{1}{T_k}\), we obtain
since \(S_{k+1} \ge S_k\) and \(x \mapsto \frac{2x}{x+2+\sqrt{(x+2)^2-4}}=\frac{2}{1+2x^{-1}+\sqrt{1+4x^{-1}}}\) is non-decreasing on \((0,\infty )\). Hence, \(\frac{1+rS_k}{S_k} \ge \frac{1}{T_k}\) holds for all \(k \ge 0\), which completes the proof. \(\square \)
Lemma 7.5
Let \(\{T_k\}_{k \ge 0}\) be a sequence defined by (52) with \(r:=0\), namely \(T_0:=1\) and \(T_{k+1}:=\frac{1+2T_k+\sqrt{1+4T_k}}{2}\) for \(k \ge 0\). Then, we have
Proof
The case \(k =0\) is obvious. Assume that the assertion is true for some \(k \ge 0\). Putting \(U_k:=\frac{1}{3}k +\frac{1}{6}\log (k+2)+1\), we have
Hence, it remains to show \(1+\frac{T_k}{T_{k+1}}U_k \le U_{k+1}\) for \(k \ge 0\). For that, we analyze the sequence \(t_0:=1,~t_{k+1}:=T_{k+1}-T_k\) for \(k \ge 0\) (namely, \(T_k=\sum _{i=0}^k t_i\)). The recurrence relation of \(T_k\) implies \(t_{k}^2=(T_{k}-T_{k-1})^2=T_{k}\) and
Analyzing the difference \(t_{k+1}-t_k\) shows for \(k \ge 0\) that
Since Lemma 7.1 yields \(t_k=\sqrt{T_k}\ge \sqrt{(k+1)(k+4)/4}\ge (k+2)/2\) for \(k \ge 0\), we obtain
for all \(k \ge 0\). Finally, this upper bound of \(t_k\) concludes that
Taking reciprocals and multiplying both sides by \(U_k\) yields \(1+\frac{T_k}{T_{k+1}}U_k \le U_{k+1}\). \(\square \)
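Since the \(r=0\) recurrence is explicit, the bound of Lemma 7.5 (taken here to be \(\sum _{i=0}^{k}T_i/T_k\le U_k=\frac{1}{3}k+\frac{1}{6}\log (k+2)+1\), with \(\log \) the natural logarithm, matching \(U_k\) in the proof) can be checked numerically with a small sketch:

```python
import math

T, partial_sum = 1.0, 0.0
for k in range(200):
    partial_sum += T
    U_k = k / 3.0 + math.log(k + 2) / 6.0 + 1.0
    # Lemma 7.5 bound: sum_{i=0..k} T_i / T_k <= U_k
    assert partial_sum / T <= U_k + 1e-9
    T = (1.0 + 2.0 * T + math.sqrt(1.0 + 4.0 * T)) / 2.0

print("Lemma 7.5 bound verified for k = 0, ..., 199")
```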
Cite this article
Ito, M. New results on subgradient methods for strongly convex optimization problems with a unified analysis. Comput Optim Appl 65, 127–172 (2016). https://doi.org/10.1007/s10589-016-9841-1
Keywords
- Non-smooth/smooth convex optimization
- Structured convex optimization
- Subgradient/gradient-based proximal method
- Conditional gradient method
- Complexity theory
- Strongly convex functions
- Weakly smooth functions