Restricted strong convexity and its applications to convergence analysis of gradient-type methods in convex optimization

• Original Paper
Abstract

Much of the recent interest in solving large-scale convex optimization problems has turned to the gradient method and its variants. To ensure linear convergence rates, current theory regularly assumes that the objective functions are strongly convex. This paper goes beyond that conventional assumption by studying restricted strong convexity, a recently proposed concept that is strictly weaker than strong convexity and is satisfied by a much broader class of functions. Using restricted strong convexity, we derive linear convergence rates for (in)exact gradient-type methods. We also obtain two by-products: (1) we rederive linear convergence rates of the inexact gradient method for a class of structured smooth convex optimization problems; (2) we improve the linear convergence rate for the linearized Bregman algorithm.

Notes

1. Step 5 of Algorithm 2 satisfies $$\theta _{k+1}^2=(1-\theta _{k+1})\theta _k^2$$; substituting $$\theta _k=1/t_k$$ and $$\theta _{k+1}=1/t_{k+1}$$, we obtain $$t_{k+1}^{-2}=(1-t_{k+1}^{-1})t_k^{-2}$$, that is, $$t_{k+1}^2-t_{k+1}-t_k^2=0$$, whose positive root $$t_{k+1}=\frac{1+\sqrt{1+4t_k^2}}{2}$$ is step (4.2) in [1]. Also, $$\beta _{k+1}$$ equals $$\frac{t_k-1}{t_{k+1}}$$ in (4.3).

2. Private communication with Rachel Ward (University of Texas at Austin).

3. Private communication with Rachel Ward (University of Texas at Austin).
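
The identity in Note 1 can be checked numerically. The following is a minimal sketch, assuming the standard FISTA initialization $$t_1=1$$ from [1]; the function name `fista_t_sequence` is hypothetical and chosen only for illustration.

```python
import math

def fista_t_sequence(n, t1=1.0):
    """Return t_1..t_n from the recursion t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2."""
    ts = [t1]
    for _ in range(n - 1):
        ts.append((1.0 + math.sqrt(1.0 + 4.0 * ts[-1] ** 2)) / 2.0)
    return ts

ts = fista_t_sequence(20)          # t1 = 1 is an assumption taken from FISTA
thetas = [1.0 / t for t in ts]     # theta_k = 1 / t_k, as in Note 1

# Largest violation of theta_{k+1}^2 = (1 - theta_{k+1}) * theta_k^2 along the sequence.
max_residual = max(
    abs(thetas[k + 1] ** 2 - (1.0 - thetas[k + 1]) * thetas[k] ** 2)
    for k in range(len(thetas) - 1)
)

# Extrapolation weights beta_{k+1} = (t_k - 1) / t_{k+1}.
betas = [(ts[k] - 1.0) / ts[k + 1] for k in range(len(ts) - 1)]

print(max_residual, betas[:3])
```

The residual should vanish up to rounding error, and the weights start at zero (because $$t_1=1$$) and stay below one.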

References

1. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

2. Bertsekas, D.P., Gallager, R.: Data Networks. Prentice-Hall, Englewood Cliffs (1987)

3. Blatt, D., Hero, A.O., Gauchman, H.: A convergent incremental gradient method with a constant step size. SIAM J. Optim. 18(1), 29–51 (2007)

4. Byrd, R.H., Chin, G.M., Nocedal, J., Wu, Y.: Sample size selection in optimization models for machine learning. Math. Progr. Ser. B 134(1), 127–155 (2012)

5. Friedlander, M.P., Schmidt, M.: Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34(3), 1380–1405 (2012)

6. Lai, M.J., Yin, W.: Augmented $$\ell _1$$ and nuclear-norm models with a globally linearly convergent algorithm. SIAM J. Imaging Sci. 6(2), 1059–1091 (2013)

7. Li, W.: Remarks on convergence of the matrix splitting algorithm for the symmetric linear complementarity problem. SIAM J. Optim. 3(1), 155–163 (1993)

8. Needell, D., Srebro, N., Ward, R.: Stochastic gradient descent and the randomized Kaczmarz algorithm, arXiv:1310.5715v1 [math.NA] (2013)

9. Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1/$$k^2$$). Sov. Math. Doklady 27, 372–376 (1983)

10. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, London (2004)

11. Nesterov, Y.: Gradient methods for minimizing composite objective function, CORE discussion paper (2007)

12. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 123–231 (2014)

13. Man-Cho So, A.: Non-asymptotic convergence analysis of inexact gradient methods for machine learning without strong convexity, arXiv:1309.0113v1 [math.OC] (2013)

14. Tseng, P.: Descent methods for convex essentially smooth minimization. J. Optim. Theory Appl. 71, 425–463 (1991)

15. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)

16. Wang, P., Lin, C.: Iteration complexity of feasible descent methods for convex optimization, TR. National Taiwan University, Taipei (2013)

17. Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for $$\ell _1$$-minimization with applications to compressed sensing. SIAM J. Imaging Sci. 1(1), 143–168 (2008)

18. Zhang, H., Yin, W.: Gradient methods for convex minimization: better rates under weaker conditions, arXiv:1303.4645, CAM Report 13–17, UCLA (2013)

19. Zhang, H., Cheng, L., Yin, W.: A dual algorithm for a class of augmented convex models, arXiv:1308.6337v1, CAM Report 13–49, UCLA, 2013. Accepted by Communications in Mathematical Sciences (2014)

Acknowledgments

We thank the reviewers for their careful reading of our manuscript and many valuable comments, and also thank Professor Wotao Yin (UCLA) for his comments and corrections. The work of H. Zhang is supported in part by the Graduate School of NUDT under Fund of Innovation B110202 and NSFC Grant 61201328. The work of L. Cheng is supported in part by NSFC Grants 61271014 and 61072118, and by NNSF of Hunan Province 13JJ2011 and SP of NUDT JC120201.

Author information

Corresponding author

Correspondence to Hui Zhang.

Appendix

We select the parameter $$\theta$$ and step size $$h$$ in (15e) to minimize the upper bound. Let $$r=\frac{\nu }{L}, h>0$$. As we need to deal with the second term in (15e), two cases are studied below depending on the sign of $$h^2-\frac{2\theta h}{L}$$:

Case A: $$h^2-\frac{2\theta h}{L}\le 0$$, i.e., $$h\in (0, \frac{2\theta }{L}], \theta \in [0,1]$$. Applying the Cauchy-Schwarz inequality to RSI, we get

\begin{aligned} \Vert \nabla f(x^{(k)})-\nabla f(x_{\mathrm {prj}}^{(k)})\Vert ^2\ge \nu ^2 \Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2. \end{aligned}
(42)

From $$h^2-\frac{2\theta h}{L}\le 0$$ and (15e), we derive that

\begin{aligned} \Vert x^{(k+1)}-x_{\mathrm {prj}}^{(k+1)}\Vert ^2&\le (1-2(1-\theta )\nu h)\Vert x^{(k)}- x_{\mathrm {prj}}^{(k)}\Vert ^2+\nu ^2\left( h^2-\frac{2\theta h}{L}\right) \Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2,\quad \quad \end{aligned}
(43a)
\begin{aligned}&= \left( \nu ^2h^2 -2\left( (1-\theta )\nu +\frac{\theta \nu ^2}{L}\right) h+1\right) \Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2,\end{aligned}
(43b)
\begin{aligned}&\triangleq f_1(\theta , h)\Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2. \end{aligned}
(43c)

Let $$h_0=\frac{\theta }{L}+\frac{(1-\theta )}{\nu }$$, which is the minimum point of the quadratic function $$f_1(\theta , h)$$ over variable $$h$$ for each fixed $$\theta$$. To determine whether such $$h_0$$ is included in the interval $$(0, \frac{2\theta }{L}]$$, we consider $$h_0=\frac{\theta }{L}+\frac{(1-\theta )}{\nu }=\frac{2\theta }{L}$$ and get $$\theta = \frac{1}{1+r}$$. Now, we split the interval $$[0,1]$$ into $$[\frac{1}{1+r}, 1]$$ and $$[0, \frac{1}{1+r})$$. If $$\theta \in [\frac{1}{1+r}, 1]$$, we have $$\frac{2\theta }{L}\ge h_0$$ which means the point $$h_0\in (0, \frac{2\theta }{L}]$$. Thus,

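\begin{aligned} \min _{h\le \frac{2\theta }{L}, \frac{1}{1+r}\le \theta \le 1} f_1(\theta ,h)=\min _{\frac{1}{1+r}\le \theta \le 1} f_1(\theta , h_0)=\min _{\frac{1}{1+r}\le \theta \le 1} 1-(1-(1-r)\theta )^2= 1-\left( \frac{2\nu }{L+\nu }\right) ^2, \end{aligned}

where we used $$f_1(\theta , h_0)=1-\nu ^2h_0^2$$ and $$\nu h_0=1-(1-r)\theta$$. Since $$\nu h_0$$ is positive and decreasing in $$\theta$$, the minimum value is obtained at $$\theta =\frac{1}{1+r}$$ and $$h=h_0=\frac{2}{L+\nu }$$; this value is larger than $$1-r$$ because $$r>\left( \frac{2\nu }{L+\nu }\right) ^2$$ for $$0<r<1$$. If $$\theta \in [0, \frac{1}{1+r})$$, we have $$\frac{2\theta }{L}< h_0$$, which means the point $$h_0\notin (0, \frac{2\theta }{L}]$$. Since $$f_1(\theta , h)$$ is monotonically decreasing in $$h$$ on the interval $$h\le \frac{2\theta }{L}$$ for each fixed $$\theta$$, we have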

\begin{aligned} \min _{h\le \frac{2\theta }{L},0\le \theta < \frac{1}{1+r}} f_1(\theta , h)=\min _{ 0\le \theta < \frac{1}{1+r}} f_1\left( \theta , \frac{2\theta }{L}\right) =\min _{ 0\le \theta < \frac{1}{1+r}} 1-4\theta (1-\theta )r= 1-r \end{aligned}

where the minimum value $$1-r$$ is obtained at $$\theta =\frac{1}{2}$$ and $$h=\frac{2\theta }{L}=\frac{1}{L}$$; note that $$\frac{1}{2}\in [0, \frac{1}{1+r})$$ since $$r<1$$. Therefore, on the intervals $$h\in (0, \frac{2\theta }{L}]$$ and $$\theta \in [0,1]$$, the minimum value $$1-r$$ of $$f_1(\theta , h)$$ is obtained at $$(\theta , h)=(\frac{1}{2},\frac{1}{L})$$.
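
The conclusion of Case A can be cross-checked by brute force. The sketch below evaluates $$f_1(\theta , h)$$ from (43b) on a grid over the admissible region $$\theta \in [0,1]$$, $$h\in (0, \frac{2\theta }{L}]$$; the constants $$\nu =1$$, $$L=4$$ (so $$r=0.25$$), the grid size, and the function names are illustrative assumptions, not values from the paper.

```python
def f1(theta, h, nu, L):
    """Contraction factor f_1(theta, h) as written in (43b)."""
    return nu**2 * h**2 - 2.0 * ((1.0 - theta) * nu + theta * nu**2 / L) * h + 1.0

def grid_min_case_a(nu, L, n=400):
    """Brute-force minimum of f1 over theta in [0, 1], h in (0, 2*theta/L]."""
    best = (float("inf"), None, None)
    for i in range(1, n + 1):          # theta = 0 leaves no admissible h > 0
        theta = i / n
        h_max = 2.0 * theta / L
        for j in range(1, n + 1):      # j = n hits the endpoint h = 2*theta/L
            h = h_max * j / n
            value = f1(theta, h, nu, L)
            if value < best[0]:
                best = (value, theta, h)
    return best

nu, L = 1.0, 4.0                       # illustrative constants, r = nu / L = 0.25
value, theta, h = grid_min_case_a(nu, L)
print(value, theta, h)
```

With these constants the grid contains the point $$(\theta , h)=(\frac{1}{2},\frac{1}{L})$$ exactly, so the reported minimum should agree with $$1-r=0.75$$ up to rounding error.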

Case B: $$h^2-\frac{2\theta h}{L}\ge 0$$, i.e., $$h\in [\frac{2\theta }{L}, +\infty ), \theta \in [0,1]$$. Applying the Cauchy-Schwarz inequality to (11) in Lemma 2, we get

\begin{aligned} \Vert \nabla f(x^{(k)})-\nabla f(x_{\mathrm {prj}}^{(k)})\Vert ^2\le L^2 \Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2. \end{aligned}
(44)

From $$h^2-\frac{2\theta h}{L}\ge 0$$ and (15e), we derive that

\begin{aligned} \Vert x^{(k+1)}-x_{\mathrm {prj}}^{(k+1)}\Vert ^2&\le (1-2(1-\theta )\nu h)\Vert x^{(k)}- x_{\mathrm {prj}}^{(k)}\Vert ^2+L^2\left( h^2-\frac{2\theta h}{L}\right) \Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2,\quad \quad \quad \end{aligned}
(45a)
\begin{aligned}&= (L^2h^2 -2(\theta L+(1-\theta )\nu ) h+1)\Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2,\end{aligned}
(45b)
\begin{aligned}&\triangleq f_2(\theta , h)\Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2. \end{aligned}
(45c)

Let $$h_1=\frac{\theta L+(1-\theta )\nu }{L^2}$$, which is the minimizer of the quadratic function $$f_2(\theta , h)$$ over $$h$$ for each fixed $$\theta$$. Solving $$h_1=\frac{2\theta }{L}$$ gives $$\theta =\frac{r}{1+r}$$, so, as in Case A, we split the interval $$[0,1]$$ into $$(\frac{r}{1+r}, 1]$$ and $$[0, \frac{r}{1+r}]$$. If $$\theta \in (\frac{r}{1+r}, 1]$$, we have $$\frac{2\theta }{L}> h_1$$, which means $$h_1\notin [\frac{2\theta }{L}, +\infty )$$. Since $$f_2(\theta , h)$$ is monotonically increasing in $$h$$ on the interval $$h\ge \frac{2\theta }{L}$$ for each fixed $$\theta$$, we have

\begin{aligned} \min _{h\ge \frac{2\theta }{L}, \frac{r}{1+r} <\theta \le 1} f_2(\theta , h)=\min _{\frac{r}{1+r}<\theta \le 1}f_2\left( \theta , \frac{2\theta }{L}\right) =\min _{\frac{r}{1+r} <\theta \le 1} 1-4\theta (1-\theta )r = 1-r, \end{aligned}

where the minimum value $$1-r$$ is obtained at $$\theta =1/2$$ and $$h=\frac{2\theta }{L}=\frac{1}{L}$$; note that $$\frac{1}{2}\in (\frac{r}{1+r}, 1]$$ since $$r<1$$. If $$\theta \in [0, \frac{r}{1+r}]$$, we have $$\frac{2\theta }{L}\le h_1$$ which means $$h_1\in [\frac{2\theta }{L}, +\infty )$$. Thus,

\begin{aligned} \min _{h\ge \frac{2\theta }{L},0\le \theta \le \frac{r}{1+r}} f_2(\theta , h)&=\min _{ 0\le \theta \le \frac{r}{1+r}} f_2(\theta , h_1)\\&=\min _{ 0\le \theta \le \frac{r}{1+r}} 1-\left( \frac{\theta L+(1-\theta )\nu }{L}\right) ^2=1-\left( \frac{2\nu }{L+\nu }\right) ^2, \end{aligned}

where the minimum value is obtained at $$\theta =\frac{r}{1+r}$$ and $$h=h_1$$. Since $$\frac{2\nu }{L+\nu }=\frac{2r}{1+r}$$ and $$r-\left( \frac{2r}{1+r}\right) ^2=\frac{r(1-r)^2}{(1+r)^2}>0$$ for $$0<r<1$$, it holds that $$r=\frac{\nu }{L}>\left( \frac{2\nu }{L+\nu }\right) ^2$$ and hence $$1-r< 1-\left( \frac{2\nu }{L+\nu }\right) ^2$$. Therefore, on the intervals $$h\in [\frac{2\theta }{L}, +\infty )$$ and $$\theta \in [0,1]$$, the minimum value $$1-r$$ of $$f_2(\theta , h)$$ is obtained at $$(\theta , h)=(\frac{1}{2},\frac{1}{L})$$ as well.
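
To see the resulting factor $$1-r$$ at $$h=\frac{1}{L}$$ on a concrete problem, the sketch below runs the gradient method on a rank-deficient least-squares objective $$f(x)=\frac{1}{2}\Vert Ax-b\Vert ^2$$, which is convex but not strongly convex. Everything here is an illustrative assumption rather than an experiment from the paper: the random data, the helper names, and in particular the choice of $$\nu$$ as the smallest nonzero eigenvalue of $$A^TA$$, which satisfies the restricted secant inequality for this particular objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-deficient least squares: A is 5 x 8 with rank 3, b makes the system consistent.
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 8))
b = A @ rng.standard_normal(8)

eigvals = np.linalg.eigvalsh(A.T @ A)           # ascending order
L = eigvals[-1]                                 # Lipschitz constant of the gradient
nu = eigvals[eigvals > 1e-10 * L].min()         # assumed RSC constant: smallest nonzero eigenvalue
r = nu / L

x_min_norm = np.linalg.pinv(A) @ b              # minimum-norm minimizer
P_null = np.eye(8) - np.linalg.pinv(A) @ A      # orthogonal projector onto null(A)

def project_onto_solutions(x):
    """Euclidean projection onto the minimizer set x_min_norm + null(A)."""
    return x_min_norm + P_null @ x

def dist_sq(x):
    """Squared distance from x to the minimizer set."""
    return float(np.sum((x - project_onto_solutions(x)) ** 2))

x = rng.standard_normal(8)
h = 1.0 / L                                     # step size selected in the Appendix
ratios = []
for _ in range(30):
    d_old = dist_sq(x)
    if d_old < 1e-20:                           # stop before rounding noise dominates
        break
    x = x - h * (A.T @ (A @ x - b))             # gradient step
    ratios.append(dist_sq(x) / d_old)

print(r, max(ratios))
```

Each recorded ratio is the one-step contraction of the squared distance to the solution set; under the stated assumptions it should not exceed $$1-r$$.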

About this article

Cite this article

Zhang, H., Cheng, L. Restricted strong convexity and its applications to convergence analysis of gradient-type methods in convex optimization. Optim Lett 9, 961–979 (2015). https://doi.org/10.1007/s11590-014-0795-x

• DOI: https://doi.org/10.1007/s11590-014-0795-x