
Restricted strong convexity and its applications to convergence analysis of gradient-type methods in convex optimization


Abstract

A great deal of interest in solving large-scale convex optimization problems has recently turned to the gradient method and its variants. To guarantee linear convergence rates, existing theory typically assumes that the objective functions are strongly convex. This paper goes beyond this conventional wisdom by studying a strictly weaker notion than strong convexity, called restricted strong convexity, which was recently proposed and is satisfied by a much broader class of functions. Utilizing restricted strong convexity, we derive linear convergence rates for (in)exact gradient-type methods. In addition, we obtain two by-products: (1) we rederive linear convergence rates of the inexact gradient method for a class of structured smooth convex optimization problems; (2) we improve the linear convergence rate of the linearized Bregman algorithm.


Notes

  1. Step 5 of Algorithm 2 satisfies \(\theta _{k+1}^2=(1-\theta _{k+1})\theta _k^2\); substituting \(\theta _k=1/t_k\) and \(\theta _{k+1}=1/t_{k+1}\), we obtain \(t_{k+1}^{-2}=(1-t_{k+1}^{-1})t_k^{-2}\), i.e., \(t_{k+1}^2-t_{k+1}=t_k^2\), which is step (4.2) in [1]. Also, \(\beta _{k+1}\) equals \(\frac{t_k-1}{t_{k+1}}\) in (4.3). A small numerical check of this substitution is given after these notes.

  2. Private communication with Rachel Ward (University of Texas at Austin).

  3. Private communication with Rachel Ward (University of Texas at Austin).
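
For concreteness, here is a minimal numerical check of the substitution in footnote 1 (not part of the paper): it iterates the update \(t_{k+1}=\frac{1+\sqrt{1+4t_k^2}}{2}\) of step (4.2) in [1], starting from \(t_1=1\), and verifies that \(\theta _k=1/t_k\) satisfies \(\theta _{k+1}^2=(1-\theta _{k+1})\theta _k^2\) along the iterates.

```python
# Minimal check of footnote 1, assuming the update t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2
# from step (4.2) in [1], with t_1 = 1 (so theta_1 = 1).
import math

t = 1.0
for k in range(20):
    t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0  # solves t_{k+1}^2 - t_{k+1} = t_k^2
    theta, theta_next = 1.0 / t, 1.0 / t_next
    # theta_{k+1}^2 should equal (1 - theta_{k+1}) * theta_k^2
    assert abs(theta_next ** 2 - (1.0 - theta_next) * theta ** 2) < 1e-12
    t = t_next
print("theta_{k+1}^2 = (1 - theta_{k+1}) * theta_k^2 holds along the iterates")
```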

References

  1. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)

  2. Bertsekas, D.P., Gallager, R.: Data Networks. Prentice-Hall, Englewood Cliffs (1987)

  3. Blatt, D., Hero, A.O., Gauchman, H.: A convergent incremental gradient method with a constant step size. SIAM J. Optim. 18(1), 29–51 (2007)

  4. Byrd, R.H., Chin, G.M., Nocedal, J., Wu, Y.: Sample size selection in optimization methods for machine learning. Math. Progr. Ser. B 134(1), 127–155 (2012)

  5. Friedlander, M.P., Schmidt, M.: Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34(3), 1380–1405 (2012)

  6. Lai, M.J., Yin, W.: Augmented \(\ell _1\) and nuclear-norm models with a globally linearly convergent algorithm. SIAM J. Imaging Sci. 6(2), 1059–1091 (2013)

  7. Li, W.: Remarks on convergence of the matrix splitting algorithm for the symmetric linear complementarity problem. SIAM J. Optim. 3(1), 155–163 (1993)

  8. Needell, D., Srebro, N., Ward, R.: Stochastic gradient descent and the randomized Kaczmarz algorithm. arXiv:1310.5715v1 [math.NA] (2013)

  9. Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Doklady 27, 372–376 (1983)

  10. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, London (2004)

  11. Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE discussion paper (2007)

  12. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 123–231 (2014)

  13. So, A.M.-C.: Non-asymptotic convergence analysis of inexact gradient methods for machine learning without strong convexity. arXiv:1309.0113v1 [math.OC] (2013)

  14. Tseng, P.: Descent methods for convex essentially smooth minimization. J. Optim. Theory Appl. 71, 425–463 (1991)

  15. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)

  16. Wang, P., Lin, C.: Iteration complexity of feasible descent methods for convex optimization. Technical report, National Taiwan University, Taipei (2013)

  17. Yin, W., Osher, S., Goldfarb, D., Darbon, J.: Bregman iterative algorithms for \(\ell _1\)-minimization with applications to compressed sensing. SIAM J. Imaging Sci. 1(1), 143–168 (2008)

  18. Zhang, H., Yin, W.: Gradient methods for convex minimization: better rates under weaker conditions. arXiv:1303.4645; CAM Report 13-17, UCLA (2013)

  19. Zhang, H., Cheng, L., Yin, W.: A dual algorithm for a class of augmented convex models. arXiv:1308.6337v1; CAM Report 13-49, UCLA (2013). Accepted by Communications in Mathematical Sciences (2014)


Acknowledgments

We thank the reviewers for their careful reading of our manuscript and many valuable comments, and also thank Professor Wotao Yin (UCLA) for his comments and corrections. The work of H. Zhang is supported in part by the Graduate School of NUDT under Fund of Innovation B110202 and NSFC Grant 61201328. The work of L. Cheng is supported in part by NSFC Grants 61271014 and 61072118, and by NNSF of Hunan Province 13JJ2011 and SP of NUDT JC120201.

Author information

Correspondence to Hui Zhang.

Appendix

We select the parameter \(\theta \) and the step size \(h\) in (15e) to minimize the upper bound. Let \(r=\frac{\nu }{L}\) and \(h>0\). To handle the second term in (15e), we study two cases below, depending on the sign of \(h^2-\frac{2\theta h}{L}\):

Case A: \(h^2-\frac{2\theta h}{L}\le 0\), i.e., \(h\in (0, \frac{2\theta }{L}], \theta \in [0,1]\). Applying the Cauchy–Schwarz inequality to the RSI, we get

$$\begin{aligned} \Vert \nabla f(x^{(k)})-\nabla f(x_{\mathrm {prj}}^{(k)})\Vert ^2\ge \nu ^2 \Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2. \end{aligned}$$
(42)

From \(h^2-\frac{2\theta h}{L}\le 0\) and (15e), we derive that

$$\begin{aligned} \Vert x^{(k+1)}-x_{\mathrm {prj}}^{(k+1)}\Vert ^2&\le (1-2(1-\theta )\nu h)\Vert x^{(k)}- x_{\mathrm {prj}}^{(k)}\Vert ^2+\nu ^2\left( h^2-\frac{2\theta h}{L}\right) \Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2,\quad \quad \end{aligned}$$
(43a)
$$\begin{aligned}&= \left( \nu ^2h^2 -2\left( (1-\theta )\nu +\frac{\theta \nu ^2}{L}\right) h+1\right) \Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2,\end{aligned}$$
(43b)
$$\begin{aligned}&\triangleq f_1(\theta , h)\Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2. \end{aligned}$$
(43c)

Let \(h_0=\frac{\theta }{L}+\frac{(1-\theta )}{\nu }\), which is the minimizer of the quadratic function \(f_1(\theta , h)\) in \(h\) for each fixed \(\theta \). To determine whether \(h_0\) lies in the interval \((0, \frac{2\theta }{L}]\), we solve \(h_0=\frac{\theta }{L}+\frac{(1-\theta )}{\nu }=\frac{2\theta }{L}\) and obtain \(\theta = \frac{1}{1+r}\). We therefore split the interval \([0,1]\) into \([\frac{1}{1+r}, 1]\) and \([0, \frac{1}{1+r})\). If \(\theta \in [\frac{1}{1+r}, 1]\), then \(\frac{2\theta }{L}\ge h_0\), so \(h_0\in (0, \frac{2\theta }{L}]\) and \(f_1(\theta , h_0)=1-\left( (1-\theta )+\theta r\right) ^2=1-(1-(1-r)\theta )^2\). Thus,

$$\begin{aligned} \min _{h\le \frac{2\theta }{L}, \frac{1}{1+r}\le \theta \le 1} f_1(\theta ,h)=\min _{\frac{1}{1+r}\le \theta \le 1} f_1(\theta , h_0)=\min _{\frac{1}{1+r}\le \theta \le 1} 1-(1-(1-r)\theta )^2= 1-\left( \frac{2r}{1+r}\right) ^2, \end{aligned}$$

where the minimum value \(1-\left( \frac{2r}{1+r}\right) ^2\) is obtained at \(\theta =\frac{1}{1+r}\) and \(h=h_0=\frac{2\theta }{L}=\frac{2}{(1+r)L}\), since \(1-(1-r)\theta \) is positive and decreasing in \(\theta \) over \([\frac{1}{1+r}, 1]\). If \(\theta \in [0, \frac{1}{1+r})\), then \(\frac{2\theta }{L}< h_0\), so \(h_0\notin (0, \frac{2\theta }{L}]\). Since \(f_1(\theta , h)\) is decreasing in \(h\) on \((0, \frac{2\theta }{L}]\) for each fixed \(\theta \), we have

$$\begin{aligned} \min _{h\le \frac{2\theta }{L},0\le \theta < \frac{1}{1+r}} f_1(\theta , h)=\min _{ 0\le \theta < \frac{1}{1+r}} f_1\left( \theta , \frac{2\theta }{L}\right) =\min _{ 0\le \theta < \frac{1}{1+r}} 1-4\theta (1-\theta )r= 1-r \end{aligned}$$

where the minimum value \(1-r\) is obtained at \(\theta =\frac{1}{2}\) and \(h=\frac{2\theta }{L}=\frac{1}{L}\); note that \(\frac{1}{2}\in [0, \frac{1}{1+r})\) since \(r<1\). Moreover, \(\left( \frac{2r}{1+r}\right) ^2\le r\), which is equivalent to \((1-r)^2\ge 0\), so \(1-r\le 1-\left( \frac{2r}{1+r}\right) ^2\). Therefore, on the intervals \(h\in (0, \frac{2\theta }{L}]\) and \(\theta \in [0,1]\), the minimum value \(1-r\) of \(f_1(\theta , h)\) is obtained at \((\theta , h)=(\frac{1}{2},\frac{1}{L})\).
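
As a sanity check of Case A, the following short Python sketch (illustrative only; the sample values \(L=1\) and \(\nu =0.3\) are chosen here and are not from the paper, and any \(0<\nu <L\) would do) evaluates \(f_1\) over a grid of the region \(\{0\le \theta \le 1,\ 0<h\le 2\theta /L\}\), confirms the closed form \(f_1(\theta ,h_0)=1-(1-(1-r)\theta )^2\), and verifies that the constrained minimum is approximately \(1-r\), attained near \((\theta , h)=(\frac{1}{2},\frac{1}{L})\).

```python
# Brute-force check of Case A with assumed sample values (not from the paper).
import numpy as np

L, nu = 1.0, 0.3          # assumed: any 0 < nu < L works; r = nu / L < 1
r = nu / L

def f1(theta, h):
    # f_1(theta, h) = nu^2 h^2 - 2((1 - theta) nu + theta nu^2 / L) h + 1, as in (43b)
    return nu**2 * h**2 - 2.0 * ((1.0 - theta) * nu + theta * nu**2 / L) * h + 1.0

# Closed form at the vertex h_0 = theta / L + (1 - theta) / nu, at one test point
theta = 0.7
h0 = theta / L + (1.0 - theta) / nu
assert abs(f1(theta, h0) - (1.0 - (1.0 - (1.0 - r) * theta) ** 2)) < 1e-12

# Grid search over the Case A region {0 <= theta <= 1, 0 < h <= 2 theta / L}
best = min(
    (f1(t, h), t, h)
    for t in np.linspace(1e-3, 1.0, 1001)
    for h in np.linspace(1e-6, 2.0 * t / L, 200)
)
print(best)                       # approximately (1 - r, 1/2, 1/L)
print((1.0 - r, 0.5, 1.0 / L))
```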

Case B: \(h^2-\frac{2\theta h}{L}\ge 0\), i.e., \(h\in [\frac{2\theta }{L}, +\infty ), \theta \in [0,1]\). Applying the Cauchy–Schwarz inequality to (11) in Lemma 2, we get

$$\begin{aligned} \Vert \nabla f(x^{(k)})-\nabla f(x_{\mathrm {prj}}^{(k)})\Vert ^2\le L^2 \Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2. \end{aligned}$$
(44)

From \(h^2-\frac{2\theta h}{L}\ge 0\) and (15e), we derive that

$$\begin{aligned} \Vert x^{(k+1)}-x_{\mathrm {prj}}^{(k+1)}\Vert ^2&\le (1-2(1-\theta )\nu h)\Vert x^{(k)}- x_{\mathrm {prj}}^{(k)}\Vert ^2+L^2\left( h^2-\frac{2\theta h}{L}\right) \Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2,\quad \quad \quad \end{aligned}$$
(45a)
$$\begin{aligned}&= (L^2h^2 -2(\theta L+(1-\theta )\nu ) h+1)\Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2,\end{aligned}$$
(45b)
$$\begin{aligned}&\triangleq f_2(\theta , h)\Vert x^{(k)}-x_{\mathrm {prj}}^{(k)}\Vert ^2. \end{aligned}$$
(45c)

Let \(h_1=\frac{\theta L+(1-\theta )\nu }{L^2}\), which is the minimizer of the quadratic function \(f_2(\theta , h)\) in \(h\) for each fixed \(\theta \). Solving \(h_1=\frac{2\theta }{L}\) gives \(\theta =\frac{r}{1+r}\), so we split the interval \([0,1]\) into \((\frac{r}{1+r}, 1]\) and \([0, \frac{r}{1+r}]\). If \(\theta \in (\frac{r}{1+r}, 1]\), then \(\frac{2\theta }{L}> h_1\), so \(h_1\notin [\frac{2\theta }{L}, +\infty )\). Since \(f_2(\theta , h)\) is increasing in \(h\) on \([\frac{2\theta }{L}, +\infty )\) for each fixed \(\theta \), we have

$$\begin{aligned} \min _{h\ge \frac{2\theta }{L}, \frac{r}{1+r} <\theta \le 1} f_2(\theta , h)=\min _{\frac{r}{1+r}<\theta \le 1}f_2\left( \theta , \frac{2\theta }{L}\right) =\min _{\frac{r}{1+r} <\theta \le 1} 1-4\theta (1-\theta )r = 1-r, \end{aligned}$$

where the minimum value \(1-r\) is obtained at \(\theta =1/2\) and \(h=\frac{2\theta }{L}=\frac{1}{L}\); note that \(\frac{1}{2}\in (\frac{r}{1+r}, 1]\) since \(r<1\). If \(\theta \in [0, \frac{r}{1+r}]\), then \(\frac{2\theta }{L}\le h_1\), so \(h_1\in [\frac{2\theta }{L}, +\infty )\). Thus,

$$\begin{aligned} \min _{h\ge \frac{2\theta }{L},0\le \theta \le \frac{r}{1+r}} f_2(\theta , h)&=\min _{ 0\le \theta \le \frac{r}{1+r}} f_2(\theta , h_1)\\&=\min _{ 0\le \theta \le \frac{r}{1+r}} 1-\left( \frac{\theta L+(1-\theta )\nu }{L}\right) ^2=1-\left( \frac{2\nu }{L+\nu }\right) ^2, \end{aligned}$$

where the minimum value is obtained at \(\theta =\frac{r}{1+r}\) and \(h=h_1\). Moreover, \(r=\frac{\nu }{L}>\left( \frac{2\nu }{L+\nu }\right) ^2\) is equivalent to \((L+\nu )^2>4\nu L\), i.e., \((L-\nu )^2>0\), which holds since \(\nu <L\); hence \(1-r< 1-\left( \frac{2\nu }{L+\nu }\right) ^2\). Therefore, on the intervals \(h\in [\frac{2\theta }{L}, +\infty )\) and \(\theta \in [0,1]\), the minimum value \(1-r\) of \(f_2(\theta , h)\) is obtained at \((\theta , h)=(\frac{1}{2},\frac{1}{L})\) as well.
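
A similar sketch for Case B (again illustrative only, with the same assumed sample values \(L=1\), \(\nu =0.3\)) confirms numerically that the constrained minimum of \(f_2\) over \(\{0\le \theta \le 1,\ h\ge 2\theta /L\}\) is approximately \(1-r\) at \((\frac{1}{2},\frac{1}{L})\), and that \(r>\left( \frac{2\nu }{L+\nu }\right) ^2\).

```python
# Brute-force check of Case B with assumed sample values (not from the paper).
import numpy as np

L, nu = 1.0, 0.3
r = nu / L

def f2(theta, h):
    # f_2(theta, h) = L^2 h^2 - 2(theta L + (1 - theta) nu) h + 1, as in (45b)
    return L**2 * h**2 - 2.0 * (theta * L + (1.0 - theta) * nu) * h + 1.0

# f_2 is increasing in h beyond its vertex h_1 <= 1/L, so searching
# h in [2 theta / L, 3 / L] suffices to locate the constrained minimum.
best = min(
    (f2(t, h), t, h)
    for t in np.linspace(0.0, 1.0, 1001)
    for h in np.linspace(2.0 * t / L, 3.0 / L, 200)
)
print(best)                       # approximately (1 - r, 1/2, 1/L)
print((1.0 - r, 0.5, 1.0 / L))

# The comparison used above: r > (2 nu / (L + nu))^2, since (L - nu)^2 > 0
assert r > (2.0 * nu / (L + nu)) ** 2
```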

Cite this article

Zhang, H., Cheng, L. Restricted strong convexity and its applications to convergence analysis of gradient-type methods in convex optimization. Optim Lett 9, 961–979 (2015). https://doi.org/10.1007/s11590-014-0795-x
