Restricted strong convexity and its applications to convergence analysis of gradient-type methods in convex optimization

Zhang, Hui; Cheng, Lizhi

doi:10.1007/s11590-014-0795-x


Original Paper
Published: 16 September 2014

Volume 9, pages 961–979, (2015)

Optimization Letters



Abstract

Recently, much of the interest in solving large-scale convex optimization problems has turned to the gradient method and its variants. To guarantee linear convergence rates, current theory usually assumes that the objective function is strongly convex. This paper goes beyond this conventional wisdom by studying restricted strong convexity, a recently proposed concept that is strictly weaker than strong convexity and is satisfied by a much broader class of functions. Using restricted strong convexity, we derive linear convergence rates for (in)exact gradient-type methods. In addition, we obtain two by-products: (1) we rederive linear convergence rates of the inexact gradient method for a class of structured smooth convex optimization problems; (2) we improve the linear convergence rate of the linearized Bregman algorithm.
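As a minimal illustration (not an example taken from the paper), consider $f(x)=\frac{1}{2}\Vert Ax-b\Vert ^2$ with a rank-deficient matrix $A$. Its Hessian $A^TA$ is singular, so $f$ is not strongly convex. Assuming restricted strong convexity takes the form $\langle \nabla f(x)-\nabla f(x_{\mathrm {prj}}), x-x_{\mathrm {prj}}\rangle \ge \nu \Vert x-x_{\mathrm {prj}}\Vert ^2$, with $x_{\mathrm {prj}}$ the projection of $x$ onto the solution set, it holds here with $\nu $ equal to the smallest positive eigenvalue of $A^TA$. The sketch runs gradient descent with step $h=1/L$ and checks that the squared distance to the solution set shrinks by at least the factor $1-\nu /L$ per step, the rate the Appendix obtains for $h=1/L$. The function names and the random test problem are hypothetical.

```python
import numpy as np


def solution_set_distance(x, P_row, x_ls):
    """Distance from x to X* = x_ls + null(A), the minimizers of 0.5*||Ax - b||^2."""
    # x - x_prj is exactly the row-space component of x - x_ls
    return np.linalg.norm(P_row @ (x - x_ls))


def gradient_descent_rsc_demo(A, b, x0, iters=50):
    """Run x <- x - (1/L) A^T (A x - b); return distances to X*, nu, L, lambda_min."""
    eigs = np.linalg.eigvalsh(A.T @ A)        # ascending eigenvalues of the Hessian
    L = eigs[-1]                               # Lipschitz constant of the gradient
    nu = eigs[eigs > 1e-10 * L].min()          # smallest positive eigenvalue (RSC modulus)
    A_pinv = np.linalg.pinv(A)
    P_row = A_pinv @ A                         # orthogonal projector onto the row space of A
    x_ls = A_pinv @ b                          # minimum-norm minimizer
    x = x0.copy()
    dists = [solution_set_distance(x, P_row, x_ls)]
    for _ in range(iters):
        x = x - (A.T @ (A @ x - b)) / L        # gradient step with h = 1/L
        dists.append(solution_set_distance(x, P_row, x_ls))
    return np.array(dists), nu, L, eigs[0]


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5)) @ rng.standard_normal((5, 10))  # rank 5: A^T A is singular
b = rng.standard_normal(20)
x0 = rng.standard_normal(10)

dists, nu, L, lam_min = gradient_descent_rsc_demo(A, b, x0)
ratios = dists[1:] ** 2 / dists[:-1] ** 2      # per-step squared contraction
print(f"lambda_min(A^T A) = {lam_min:.1e}, nu/L = {nu / L:.3f}")
print(f"worst squared contraction = {ratios.max():.3f} <= 1 - nu/L = {1 - nu / L:.3f}")
```

Measuring distance to the whole solution set, rather than to one fixed minimizer, is what makes the linear rate visible: the iterates keep their null-space component, so they need not approach any particular minimizer chosen in advance.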


The restricted strong convexity revisited: analysis of equivalence to error bound and quadratic growth

Article 01 July 2016

Convergence Rate of Gradient-Concordant Methods for Smooth Unconstrained Optimization

New results on subgradient methods for strongly convex optimization problems with a unified analysis

Article 26 March 2016

Notes

Step 5 of Algorithm 2 satisfies $\theta _{k+1}^2=(1-\theta _{k+1})\theta _k^2$. Substituting $\theta _k=1/t_k$ and $\theta _{k+1}=1/t_{k+1}$ gives $t_{k+1}^{-2}=(1-t_{k+1}^{-1})t_k^{-2}$, i.e., $t_{k+1}^2-t_{k+1}=t_k^2$, whose positive root $t_{k+1}=\frac{1+\sqrt{1+4t_k^2}}{2}$ is step (4.2) in [1]. Also, $\beta _{k+1}$ equals $\frac{t_k-1}{t_{k+1}}$ in (4.3).
Private communication with Rachel Ward (University of Texas at Austin).
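The equivalence stated in the note on Step 5 of Algorithm 2 can be checked numerically. The sketch below generates $t_k$ by the FISTA update $t_{k+1}=\frac{1+\sqrt{1+4t_k^2}}{2}$ starting from $t_1=1$ (the usual FISTA initialization, assumed here), sets $\theta _k=1/t_k$, and confirms that $\theta _{k+1}^2=(1-\theta _{k+1})\theta _k^2$ holds up to rounding. It also lists the momentum weights $\beta _{k+1}=\frac{t_k-1}{t_{k+1}}$. The helper name `fista_t_sequence` is hypothetical.

```python
import math


def fista_t_sequence(n, t1=1.0):
    """FISTA parameters: t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2, starting from t1."""
    ts = [t1]
    for _ in range(n - 1):
        ts.append((1.0 + math.sqrt(1.0 + 4.0 * ts[-1] ** 2)) / 2.0)
    return ts


ts = fista_t_sequence(20)
thetas = [1.0 / t for t in ts]  # theta_k = 1 / t_k

# Residual of Step 5 of Algorithm 2: theta_{k+1}^2 - (1 - theta_{k+1}) * theta_k^2
residuals = [
    abs(thetas[k + 1] ** 2 - (1.0 - thetas[k + 1]) * thetas[k] ** 2)
    for k in range(len(ts) - 1)
]
# Momentum weights beta_{k+1} = (t_k - 1) / t_{k+1}, as in (4.3) of [1]
betas = [(ts[k] - 1.0) / ts[k + 1] for k in range(len(ts) - 1)]

print(max(residuals))  # of the order of machine precision
print(betas[:3])
```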

References

[1] Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
[2] Bertsekas, D.P., Gallager, R.: Data Networks. Prentice-Hall, Englewood Cliffs (1987)
[3] Blatt, D., Hero, A.O., Gauchman, H.: A convergent incremental gradient method with a constant step size. SIAM J. Optim. 18(1), 29–51 (2007)
[4] Byrd, R.H., Chin, G.M., Nocedal, J., Wu, Y.: Sample size selection in optimization methods for machine learning. Math. Program. Ser. B 134(1), 127–155 (2012)
[5] Friedlander, M.P., Schmidt, M.: Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34(3), 1380–1405 (2012)
[6] Lai, M.J., Yin, W.: Augmented $\ell _1$ and nuclear-norm models with a globally linearly convergent algorithm. SIAM J. Imaging Sci. 6(2), 1059–1091 (2013)
[7] Li, W.: Remarks on convergence of the matrix splitting algorithm for the symmetric linear complementarity problem. SIAM J. Optim. 3(1), 155–163 (1993)
[8] Needell, D., Srebro, N., Ward, R.: Stochastic gradient descent and the randomized Kaczmarz algorithm. arXiv:1310.5715v1 [math.NA] (2013)
[9] Nesterov, Y.: A method of solving a convex programming problem with convergence rate $O(1/k^2)$. Sov. Math. Doklady 27, 372–376 (1983)
[10] Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, London (2004)
[11] Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE discussion paper (2007)
[12] Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 123–231 (2014)
[13] So, A.M.-C.: Non-asymptotic convergence analysis of inexact gradient methods for machine learning without strong convexity. arXiv:1309.0113v1 [math.OC] (2013)
[14] Tseng, P.: Descent methods for convex essentially smooth minimization. J. Optim. Theory Appl. 71, 425–463 (1991)
[15] Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
[16] Wang, P., Lin, C.: Iteration complexity of feasible descent methods for convex optimization. Technical report, National Taiwan University, Taipei (2013)
[17] Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for $\ell _1$-minimization with applications to compressed sensing. SIAM J. Imaging Sci. 1(1), 143–168 (2008)
[18] Zhang, H., Yin, W.: Gradient methods for convex minimization: better rates under weaker conditions. arXiv:1303.4645, CAM Report 13–17, UCLA (2013)
[19] Zhang, H., Cheng, L., Yin, W.: A dual algorithm for a class of augmented convex models. arXiv:1308.6337v1, CAM Report 13–49, UCLA (2013). Accepted by Communications in Mathematical Sciences (2014)


Acknowledgments

We thank the reviewers for their careful reading of our manuscript and many valuable comments, and also thank Professor Wotao Yin (UCLA) for his comments and corrections. The work of H. Zhang is supported in part by the Graduate School of NUDT under Fund of Innovation B110202 and NSFC Grant 61201328. The work of L. Cheng is supported in part by NSFC Grants 61271014 and 61072118, and by NNSF of Hunan Province 13JJ2011 and SP of NUDT JC120201.

Author information

Authors and Affiliations

Department of Mathematics and Systems Science, College of Science, National University of Defense Technology, Changsha, 410073, Hunan, China
Hui Zhang & Lizhi Cheng


Corresponding author

Correspondence to Hui Zhang.

Appendix

We choose the parameter $\theta $ and the step size $h$ in (15e) to minimize the upper bound. Let $r=\frac{\nu }{L}$ and $h>0$. Since the treatment of the second term in (15e) depends on the sign of $h^2-\frac{2\theta h}{L}$, we study two cases.

Case A: $h^2-\frac{2\theta h}{L}\le 0$, i.e., $h\in (0, \frac{2\theta }{L}]$, $\theta \in [0,1]$. Applying the Cauchy-Schwarz inequality to the RSI, we get

$$\begin{aligned} \Vert \nabla f(x^{(k)})-\nabla f(x_{\mathrm {prj}}^{(k)})\Vert ^2\ge \nu ^2 \Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2. \end{aligned}\tag{42}$$
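For a concrete instance of the RSI and of (42), the sketch below takes the least-squares objective $f(x)=\frac{1}{2}\Vert Ax-b\Vert ^2$ with a rank-deficient $A$ (an illustrative choice, not from the paper). It assumes the RSI reads $\langle \nabla f(x)-\nabla f(x_{\mathrm {prj}}), x-x_{\mathrm {prj}}\rangle \ge \nu \Vert x-x_{\mathrm {prj}}\Vert ^2$, with $\nu $ the smallest positive eigenvalue of $A^TA$, and checks both the RSI and its Cauchy-Schwarz consequence (42) at random points. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((15, 4)) @ rng.standard_normal((4, 8))  # rank 4: f is not strongly convex
b = rng.standard_normal(15)

H = A.T @ A                                   # Hessian of f(x) = 0.5 * ||Ax - b||^2
eigs = np.linalg.eigvalsh(H)                  # ascending eigenvalues
nu = eigs[eigs > 1e-10 * eigs[-1]].min()      # smallest positive eigenvalue
A_pinv = np.linalg.pinv(A)
P_row = A_pinv @ A                            # projector onto the row space of A
x_ls = A_pinv @ b                             # one minimizer of f


def grad(x):
    return A.T @ (A @ x - b)


def project_onto_solution_set(x):
    # X* = x_ls + null(A): remove the row-space part of x - x_ls
    return x - P_row @ (x - x_ls)


for _ in range(1000):
    x = 10.0 * rng.standard_normal(8)
    x_prj = project_onto_solution_set(x)
    d = x - x_prj
    g_diff = grad(x) - grad(x_prj)
    # RSI, with a tiny relative tolerance for rounding
    assert g_diff @ d >= nu * (d @ d) * (1 - 1e-9)
    # (42): Cauchy-Schwarz turns the RSI into a lower bound on the gradient gap
    assert np.linalg.norm(g_diff) >= nu * np.linalg.norm(d) * (1 - 1e-9)
print("RSI and (42) hold at all sampled points")
```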

From $h^2-\frac{2\theta h}{L}\le 0$ and (15e), we derive that

\begin{align}
\Vert x^{(k+1)}-x_{\mathrm {prj}}^{(k+1)}\Vert ^2&\le (1-2(1-\theta )\nu h)\Vert x^{(k)}- x_{\mathrm {prj}}^{(k)}\Vert ^2+\nu ^2\left( h^2-\frac{2\theta h}{L}\right) \Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2 \tag{43a}\\
&= \left( \nu ^2h^2 -2\left( (1-\theta )\nu +\frac{\theta \nu ^2}{L}\right) h+1\right) \Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2 \tag{43b}\\
&\triangleq f_1(\theta , h)\Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2. \tag{43c}
\end{align}

Let $h_0=\frac{\theta }{L}+\frac{1-\theta }{\nu }$, which is the minimizer of the quadratic function $f_1(\theta , \cdot )$ for each fixed $\theta $. To determine whether $h_0$ lies in the interval $(0, \frac{2\theta }{L}]$, we solve $h_0=\frac{\theta }{L}+\frac{1-\theta }{\nu }=\frac{2\theta }{L}$, which gives $\theta = \frac{1}{1+r}$. We therefore split the interval $[0,1]$ into $[\frac{1}{1+r}, 1]$ and $[0, \frac{1}{1+r})$. If $\theta \in [\frac{1}{1+r}, 1]$, then $\frac{2\theta }{L}\ge h_0$, so $h_0\in (0, \frac{2\theta }{L}]$. Thus,

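$$\begin{aligned} \min _{h\le \frac{2\theta }{L}, \frac{1}{1+r}\le \theta \le 1} f_1(\theta ,h)=\min _{\frac{1}{1+r}\le \theta \le 1} f_1(\theta , h_0)=\min _{\frac{1}{1+r}\le \theta \le 1} 1-\left( 1-(1-r)\theta \right) ^2= 1-\left( \frac{2\nu }{L+\nu }\right) ^2, \end{aligned}$$

because $f_1(\theta ,h_0)=1-\nu ^2h_0^2=1-\left( 1-(1-r)\theta \right) ^2$ and $1-(1-r)\theta $ is positive and decreasing in $\theta $; the minimum value is obtained at $\theta =\frac{1}{1+r}$ and $h=h_0=\frac{2}{L+\nu }$. This value exceeds $1-r$, since $\frac{2\nu }{L+\nu }=\frac{2r}{1+r}$ and $r-\left( \frac{2r}{1+r}\right) ^2=\frac{r(1-r)^2}{(1+r)^2}>0$ for $r<1$. If $\theta \in [0, \frac{1}{1+r})$, then $\frac{2\theta }{L}< h_0$, so $h_0\notin (0, \frac{2\theta }{L}]$. Since $f_1(\theta , \cdot )$ is monotonically decreasing on $h\le \frac{2\theta }{L}$ for each fixed $\theta $, we have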

$$\begin{aligned} \min _{h\le \frac{2\theta }{L},0\le \theta < \frac{1}{1+r}} f_1(\theta , h)=\min _{ 0\le \theta < \frac{1}{1+r}} f_1\left( \theta , \frac{2\theta }{L}\right) =\min _{ 0\le \theta < \frac{1}{1+r}} 1-4\theta (1-\theta )r= 1-r \end{aligned}$$

where the minimum value $1-r$ is obtained at $\theta =\frac{1}{2}$ and $h=\frac{2\theta }{L}=\frac{1}{L}$; note that $\frac{1}{2}\in [0, \frac{1}{1+r})$ since $r<1$. Therefore, on the intervals $h\in (0, \frac{2\theta }{L}]$ and $\theta \in [0,1]$, the minimum value $1-r$ of $f_1(\theta , h)$ is obtained at $(\theta , h)=(\frac{1}{2},\frac{1}{L})$.
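A brute-force check of Case A: the sketch evaluates $f_1(\theta ,h)$ from (43b) on a grid over $\theta \in [0,1]$ and $h\in (0,\frac{2\theta }{L}]$ for the sample values $\nu =1$, $L=4$ (so $r=1/4$; an arbitrary choice) and reports the smallest value found. Under these assumptions it should be close to $1-r=0.75$, attained near $(\theta ,h)=(\frac{1}{2},\frac{1}{L})$. The function names are hypothetical.

```python
import numpy as np


def f1(theta, h, nu, L):
    """Case A bound (43b): nu^2 h^2 - 2((1 - theta) nu + theta nu^2 / L) h + 1."""
    return nu**2 * h**2 - 2.0 * ((1.0 - theta) * nu + theta * nu**2 / L) * h + 1.0


def case_a_grid_min(nu, L, n=801):
    """Grid minimum of f1 over theta in (0, 1] and h in (0, 2 theta / L]."""
    best = (np.inf, None, None)
    for theta in np.linspace(0.0, 1.0, n):
        if theta == 0.0:
            continue                                   # (0, 2*theta/L] is empty
        hs = np.linspace(0.0, 2.0 * theta / L, n)[1:]  # drop h = 0
        vals = f1(theta, hs, nu, L)
        i = int(np.argmin(vals))
        if vals[i] < best[0]:
            best = (vals[i], theta, hs[i])
    return best


nu, L = 1.0, 4.0
val, theta_star, h_star = case_a_grid_min(nu, L)
print(val, theta_star, h_star)  # expected near 1 - nu/L = 0.75, theta = 0.5, h = 1/L = 0.25
```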

Case B: $h^2-\frac{2\theta h}{L}\ge 0$, i.e., $h\in [\frac{2\theta }{L}, +\infty )$, $\theta \in [0,1]$. Applying the Cauchy-Schwarz inequality to (11) in Lemma 2, we get

$$\begin{aligned} \Vert \nabla f(x^{(k)})-\nabla f(x_{\mathrm {prj}}^{(k)})\Vert ^2\le L^2 \Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2. \end{aligned}\tag{44}$$

From $h^2-\frac{2\theta h}{L}\ge 0$ and (15e), we derive that

\begin{align}
\Vert x^{(k+1)}-x_{\mathrm {prj}}^{(k+1)}\Vert ^2&\le (1-2(1-\theta )\nu h)\Vert x^{(k)}- x_{\mathrm {prj}}^{(k)}\Vert ^2+L^2\left( h^2-\frac{2\theta h}{L}\right) \Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2 \tag{45a}\\
&= (L^2h^2 -2(\theta L+(1-\theta )\nu ) h+1)\Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2 \tag{45b}\\
&\triangleq f_2(\theta , h)\Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2. \tag{45c}
\end{align}

Let $h_1=\frac{\theta L+(1-\theta )\nu }{L^2}$, which is the minimizer of the quadratic function $f_2(\theta , \cdot )$ for each fixed $\theta $. Solving $h_1=\frac{2\theta }{L}$ gives $\theta =\frac{r}{1+r}$, so, similarly, we split the interval $[0,1]$ into $(\frac{r}{1+r}, 1]$ and $[0, \frac{r}{1+r}]$. If $\theta \in (\frac{r}{1+r}, 1]$, then $\frac{2\theta }{L}> h_1$, so $h_1\notin [\frac{2\theta }{L}, +\infty )$. Since $f_2(\theta , \cdot )$ is monotonically increasing on $h\ge \frac{2\theta }{L}$ for each fixed $\theta $, we have

$$\begin{aligned} \min _{h\ge \frac{2\theta }{L}, \frac{r}{1+r} <\theta \le 1} f_2(\theta , h)=\min _{\frac{r}{1+r}<\theta \le 1}f_2\left( \theta , \frac{2\theta }{L}\right) =\min _{\frac{r}{1+r} <\theta \le 1} 1-4\theta (1-\theta )r = 1-r, \end{aligned}$$

where the minimum value $1-r$ is obtained at $\theta =1/2$ and $h=\frac{2\theta }{L}=\frac{1}{L}$; note that $\frac{1}{2}\in (\frac{r}{1+r}, 1]$ since $r<1$. If $\theta \in [0, \frac{r}{1+r}]$, we have $\frac{2\theta }{L}\le h_1$ which means $h_1\in [\frac{2\theta }{L}, +\infty )$. Thus,

$$\begin{aligned} \min _{h\ge \frac{2\theta }{L},0\le \theta \le \frac{r}{1+r}} f_2(\theta , h)&=\min _{ 0\le \theta \le \frac{r}{1+r}} f_2(\theta , h_1)\\&=\min _{ 0\le \theta \le \frac{r}{1+r}} 1-\left( \frac{\theta L+(1-\theta )\nu }{L}\right) ^2=1-\left( \frac{2\nu }{L+\nu }\right) ^2, \end{aligned}$$

where the minimum value is obtained at $\theta =\frac{r}{1+r}$ and $h=h_1$. A short calculation gives $r-\left( \frac{2\nu }{L+\nu }\right) ^2=\frac{r(1-r)^2}{(1+r)^2}>0$, so $r>\left( \frac{2\nu }{L+\nu }\right) ^2$ and hence $1-r< 1-\left( \frac{2\nu }{L+\nu }\right) ^2$. Therefore, on the intervals $h\in [\frac{2\theta }{L}, +\infty )$ and $\theta \in [0,1]$, the minimum value $1-r$ of $f_2(\theta , h)$ is obtained at $(\theta , h)=(\frac{1}{2},\frac{1}{L})$ as well.
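Combining both cases, the bound derived from (15e) is $f_1$ from (43b) for $h\le \frac{2\theta }{L}$ and $f_2$ from (45b) for $h\ge \frac{2\theta }{L}$; the two pieces agree on $h=\frac{2\theta }{L}$, where both equal $1-4\theta (1-\theta )r$. The sketch below minimizes this piecewise bound on a grid for the sample values $\nu =1$, $L=10$ (an arbitrary choice) and compares the result with $1-r$ and with the Case B value $1-\left( \frac{2\nu }{L+\nu }\right) ^2$. The function name `contraction_bound` is hypothetical.

```python
import numpy as np


def contraction_bound(theta, h, nu, L):
    """Piecewise bound on the squared-distance ratio: f1 (43b) if h <= 2 theta/L, else f2 (45b)."""
    f1 = nu**2 * h**2 - 2.0 * ((1.0 - theta) * nu + theta * nu**2 / L) * h + 1.0
    f2 = L**2 * h**2 - 2.0 * (theta * L + (1.0 - theta) * nu) * h + 1.0
    return np.where(h <= 2.0 * theta / L, f1, f2)


nu, L = 1.0, 10.0
r = nu / L
thetas = np.linspace(0.0, 1.0, 1001)             # contains theta = 1/2
hs = np.linspace(0.0, 3.0 / L, 1501)[1:]         # h > 0; the grid contains h = 1/L
TH, HH = np.meshgrid(thetas, hs, indexing="ij")
vals = contraction_bound(TH, HH, nu, L)
i, j = np.unravel_index(np.argmin(vals), vals.shape)

print(f"grid minimum {vals[i, j]:.6f} at theta={thetas[i]:.3f}, h={hs[j]:.4f}")
print(f"1 - r = {1 - r:.6f}; Case B value 1 - (2nu/(L+nu))^2 = {1 - (2 * nu / (L + nu))**2:.6f}")
```

The grid minimum should sit at the kink $h=\frac{2\theta }{L}$ with $\theta =\frac{1}{2}$, matching the conclusion of both cases.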


About this article

Cite this article

Zhang, H., Cheng, L. Restricted strong convexity and its applications to convergence analysis of gradient-type methods in convex optimization. Optim Lett 9, 961–979 (2015). https://doi.org/10.1007/s11590-014-0795-x


Received: 08 March 2014
Accepted: 03 September 2014
Published: 16 September 2014
Issue Date: June 2015
DOI: https://doi.org/10.1007/s11590-014-0795-x
