
Perturbation Techniques for Convergence Analysis of Proximal Gradient Method and Other First-Order Algorithms via Variational Analysis


Abstract

We develop new perturbation techniques for conducting convergence analysis of various first-order algorithms for a class of nonsmooth optimization problems. We consider the iteration scheme of an algorithm to construct a perturbed stationary point set-valued map, and define the perturbing parameter by the difference of two consecutive iterates. Then, we show that the calmness condition of the induced set-valued map, together with a local version of the proper separation of stationary value condition, is a sufficient condition to ensure the linear convergence of the algorithm. The equivalence of the calmness condition to the one for the canonically perturbed stationary point set-valued map is proved, and this equivalence allows us to derive some sufficient conditions for calmness by using some recent developments in variational analysis. These sufficient conditions differ from existing results (especially the error-bound-based ones) in that they can be easily verified for many concrete application models. Our analysis focuses on the fundamental proximal gradient (PG) method, and it enables us to show that any accumulation point of the sequence generated by the PG method must be a stationary point in terms of the proximal subdifferential, instead of the limiting subdifferential. This reveals the somewhat surprising fact that the solution quality found by the PG method is in general superior, since stationarity in terms of the proximal subdifferential is a sharper requirement than stationarity in terms of the limiting subdifferential. Our analysis also leads to some improvements of the linear convergence results for the PG method in the convex case. The new perturbation technique can be conveniently used to derive linear convergence rates of a number of other first-order methods, including the well-known alternating direction method of multipliers and the primal-dual hybrid gradient method, under mild assumptions.
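As a rough illustration of the perturbation viewpoint described above (a minimal sketch that is not part of the paper; the LASSO data, step size, and variable names are our own assumptions), the following Python snippet runs the PG iteration and monitors the perturbation parameter, i.e., the difference of two consecutive iterates, which is driven to zero along the iterations:

```python
# Minimal sketch (not from the paper): PG iteration for the LASSO model
#   f(x) = 0.5*||Ax - b||^2,  g(x) = lam*||x||_1,
# monitoring p_k = x^k - x^{k+1}, the quantity playing the role of the
# perturbation parameter of the perturbed stationary-point set-valued map.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
b = rng.standard_normal(20)
lam = 0.1

def grad_f(x):                              # gradient of the smooth part f
    return A.T @ (A @ x - b)

def prox_g(a, gamma):                       # prox of g = lam*||.||_1 (soft-thresholding)
    return np.sign(a) * np.maximum(np.abs(a) - gamma * lam, 0.0)

L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of grad f
gamma = 0.9 / L                             # step size gamma < 1/L

x = np.zeros(50)
for k in range(201):
    x_new = prox_g(x - gamma * grad_f(x), gamma)
    p_k = x - x_new                         # perturbation parameter
    if k % 50 == 0:
        print(k, np.linalg.norm(p_k))       # ||p_k|| decreases toward 0
    x = x_new
```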


References

  1. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)

  2. Attouch, H., Bolte, J., Svaiter, B.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137(1), 91–129 (2013)

  3. Aubin, J.: Lipschitz behavior of solutions to convex minimization problems. Math. Oper. Res. 9(1), 87–111 (1984)

  4. Bai, K., Ye, J. J., Zhang, J.: Directional quasi-/pseudo-normality as sufficient conditions for metric subregularity. SIAM J. Optim. 29(4), 2625–2649 (2019)

  5. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  6. Bello-Cruz, J., Li, G., Nghia, T. T. A.: On the Q-linear convergence of forward-backward splitting method and uniqueness of optimal solution to lasso. arXiv:1806.06333 (2018)

  7. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)

  8. Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Amer. Math. Soc. 362(6), 3319–3363 (2010)

  9. Bolte, J., Nguyen, T. P., Peypouquet, J., Suter, B. W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165, 471–507 (2017)

  10. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1), 459–494 (2014)

  11. Cauchy, A. L.: Méthode générale pour la résolution des systèmes d'équations simultanées. Comptes Rendus de l'Académie des Sciences 25, 46–89 (1847)

  12. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)

  13. Clarke, F., Ledyaev, Y., Stern, R., Wolenski, P.: Nonsmooth Analysis and Control Theory, vol. 178. Springer Science & Business Media (2008)

  14. Clarke, F., Stern, R., Wolenski, P.: Subgradient criteria for monotonicity, the Lipschitz condition, and convexity. Can. J. Math. 45, 1167–1183 (1993)

  15. Dontchev, A., Rockafellar, R. T.: Implicit Functions and Solution Mappings. Springer Monographs in Mathematics (2009)

  16. Drusvyatskiy, D., Ioffe, A., Lewis, A.: Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria. Math. Program., 1–27 (2019)

  17. Drusvyatskiy, D., Lewis, A.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3), 919–948 (2018)

  18. Drusvyatskiy, D., Mordukhovich, B. S., Nghia, T. T. A.: Second-order growth, tilt stability, and metric regularity of the subdifferential. J. Convex Anal. 21(4), 1165–1192 (2014)

  19. Fan, J. Q., Li, R. Z.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Stat. Assoc. 96(456), 1348–1360 (2001)

  20. Frankel, P., Garrigos, G., Peypouquet, J.: Splitting methods with variable metric for Kurdyka-Łojasiewicz functions and general convergence rates. J. Optim. Theory Appl. 165(3), 874–900 (2015)

  21. Gfrerer, H.: On directional metric regularity, subregularity and optimality conditions for nonsmooth mathematical programs. Set-Valued Var. Anal. 21(2), 151–176 (2013)

  22. Gfrerer, H.: Optimality conditions for disjunctive programs based on generalized differentiation with application to mathematical programs with equilibrium constraints. SIAM J. Optim. 24(2), 898–931 (2014)

  23. Gfrerer, H., Klatte, D.: Lipschitz and Hölder stability of optimization problems and generalized equations. Math. Program. 158(1), 35–75 (2016)

  24. Gfrerer, H., Outrata, J.: On Lipschitzian properties of implicit multifunctions. SIAM J. Optim. 26(4), 2160–2189 (2016)

  25. Gfrerer, H., Ye, J. J.: New constraint qualifications for mathematical programs with equilibrium constraints via variational analysis. SIAM J. Optim. 27(2), 842–865 (2017)

  26. Ginchev, I., Mordukhovich, B. S.: On directionally dependent subdifferentials. Comptes Rendus de l'Académie Bulgare des Sciences 64, 497–508 (2011)

  27. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)

  28. He, B., Fu, X., Jiang, Z.: Proximal-point algorithm using a linear proximal term. J. Optim. Theory Appl. 141(2), 299–319 (2009)

  29. He, B., Yuan, X.: Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM J. Imaging Sci. 5(1), 119–149 (2012)

  30. Henrion, R., Jourani, A., Outrata, J.: On the calmness of a class of multifunctions. SIAM J. Optim. 13(2), 603–618 (2002)

  31. Henrion, R., Outrata, J.: Calmness of constraint systems with applications. Math. Program. 104(2–3), 437–464 (2005)

  32. Hoffman, A. J.: On approximate solutions of systems of linear inequalities. J. Res. Natl. Bur. Stand. 49(4), 263–265 (1952)

  33. Klatte, D., Kummer, B.: Constrained minima and Lipschitzian penalties in metric spaces. SIAM J. Optim. 13(2), 619–633 (2002)

  34. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)

  35. Li, G., Pong, T.-K.: Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18, 1199–1232 (2018)

  36. Liu, Y., Yuan, X., Zeng, S., Zhang, J.: Partial error bound conditions and the linear convergence rate of alternating direction method of multipliers. SIAM J. Numer. Anal. 56(4), 2095–2123 (2018)

  37. Luo, Z.-Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46(1), 157–178 (1993)

  38. Luo, Z.-Q., Tseng, P.: Error bound and convergence analysis of matrix splitting algorithms for the affine variational inequality problem. SIAM J. Optim. 2(1), 43–54 (1992)

  39. Mordukhovich, B.: Variational Analysis and Generalized Differentiation I: Basic Theory, II: Applications. Springer Science & Business Media (2006)

  40. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer Science & Business Media (2013)

  41. Noll, D.: Convergence of non-smooth descent methods using the Kurdyka-Łojasiewicz inequality. J. Optim. Theory Appl. 160(2), 553–572 (2014)

  42. O'Donoghue, B., Candès, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015)

  43. Pan, S., Liu, Y. (2019)

  44. Passty, G.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72(2), 383–390 (1979)

  45. Polyak, B. T.: Introduction to Optimization. Optimization Software, New York (1987)

  46. Reed, R.: Pruning algorithms - a survey. IEEE Trans. Neural Netw. 4(5), 740–747 (1993)

  47. Robinson, S.: Stability theory for systems of inequalities. Part I: Linear systems. SIAM J. Numer. Anal. 12(5), 754–769 (1975)

  48. Robinson, S.: Strongly regular generalized equations. Math. Oper. Res. 5(1), 43–62 (1980)

  49. Robinson, S.: Some continuity properties of polyhedral multifunctions. Math. Program. Stud. 14, 206–214 (1981)

  50. Rockafellar, R. T., Wets, R.: Variational Analysis. Springer Science & Business Media (2009)

  51. Schmidt, M., Roux, N., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems, vol. 24, pp 1458–1466 (2011)

  52. Tao, S., Boley, D., Zhang, S.: Local linear convergence of ISTA and FISTA on the LASSO problem. SIAM J. Optim. 26(1), 313–336 (2016)

  53. Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. 125(2), 263–295 (2010)

  54. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1), 387–423 (2009)

  55. Wen, B., Chen, X., Pong, T.-K.: Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim. 27(1), 124–145 (2017)

  56. Xiao, L., Zhang, T.: A proximal-gradient homotopy method for the sparse least-squares problem. SIAM J. Optim. 23(2), 1062–1091 (2013)

  57. Yang, W., Han, D.: Linear convergence of the alternating direction method of multipliers for a class of convex optimization problems. SIAM J. Numer. Anal. 54, 625–640 (2016)

  58. Ye, J. J.: Constraint qualifications and necessary optimality conditions for optimization problems with variational inequality constraints. SIAM J. Optim. 10(4), 943–962 (2000)

  59. Ye, J. J., Ye, X. Y.: Necessary optimality conditions for optimization problems with variational inequality constraints. Math. Oper. Res. 22(4), 977–997 (1997)

  60. Ye, J. J., Yuan, X., Zeng, S., Zhang, J.: Variational analysis perspective on linear convergence of some first order methods for nonsmooth convex optimization problems. Optimization-online preprint (2018)

  61. Ye, J. J., Zhou, J.: Verifiable sufficient conditions for the error bound property of second-order cone complementarity problems. Math. Program. 171, 361–395 (2018)

  62. Yuan, X., Zeng, S., Zhang, J.: Discerning the linear convergence of ADMM for structured convex optimization through the lens of variational analysis. J. Mach. Learn. Res. 21, 1–75 (2020)

  63. Zhang, C.-H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38(2), 894–942 (2010)

  64. Zhou, Z., So, A. M.-C.: A unified approach to error bounds for structured convex optimization problems. Math. Program. 165(2), 689–728 (2017)

Acknowledgments

The authors would like to thank the anonymous referees for their helpful suggestions and comments.

Author information


Corresponding author

Correspondence to Jin Zhang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The research was partially supported by NSFC Nos. 11871279 and 11971090, NSERC, the General Research Fund from the Hong Kong Research Grants Council (No. 12302318), the National Science Foundation of China (No. 11971220), and the Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515011152).

Appendix A

1.1 A.1 Proof of Lemma 2

(1) Since \(x^{k+1}\) is the optimal solution of the proximal operation (1.3) with \(a = x^{k} - \gamma \nabla f(x^{k})\), we have

$$ \begin{array}{@{}rcl@{}} && g(x^{k+1}) + \frac{1}{2\gamma}\left\| x^{k+1} - \left( x^{k} - \gamma \nabla f(x^{k}) \right) \right\|^{2} \le g(x^{k}) + \frac{1}{2\gamma}\left\| \gamma \nabla f(x^{k}) \right\|^{2}, \end{array} $$

which can be reformulated as

$$ g(x^{k+1}) + \frac{1}{2\gamma}\left\| x^{k+1} - x^{k} \right\|^{2} + \left\langle \nabla f(x^{k}), x^{k+1} - x^{k} \right\rangle - g(x^{k}) \le 0. $$
(A.1)

Furthermore, since ∇f(x) is globally Lipschitz continuous with the Lipschitz constant L, we have

$$ f(x^{k+1})\le f(x^{k}) + \left\langle \nabla f(x^{k}), x^{k+1} - x^{k} \right\rangle + \frac{L}{2}\left\| x^{k+1} - x^{k} \right\|^{2}. $$

Adding the above inequality to (A.1), we obtain

$$ F(x^{k+1}) - F(x^{k}) \le \left( \frac{L}{2} - \frac{1}{2\gamma} \right) \left\| x^{k+1} - x^{k} \right\|^{2}. $$

As a result, if \(\gamma < \frac{1}{L}\), we have (3.1) with \(\kappa_{1} := \frac{1}{2\gamma} - \frac{L}{2} > 0\).
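For concreteness (an illustrative instance, not from the paper), taking the step size \(\gamma = \frac{1}{2L}\) gives

$$ \kappa_{1} = \frac{1}{2\gamma} - \frac{L}{2} = L - \frac{L}{2} = \frac{L}{2} > 0. $$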

(2) By the optimality of \(x^{k+1}\), we have that for any \(x\),

$$ g(x^{k+1}) + \frac{1}{2\gamma}\left\| x^{k+1} - x^{k} + \gamma \nabla f(x^{k}) \right\|^{2} \le g(x) + \frac{1}{2\gamma}\left\| x - x^{k} + \gamma \nabla f(x^{k}) \right\|^{2}, $$

which can be reformulated as

$$ g(x^{k+1}) - g(x) \le \frac{1}{2\gamma}\left\| x - x^{k} \right\|^{2} - \frac{1}{2\gamma}\left\| x^{k+1} - x^{k} \right\|^{2} + \left\langle \nabla f(x^{k}), x - x^{k+1} \right\rangle. $$

By the Lipschitz continuity of ∇f(x),

$$ f(x) \ge f(x^{k+1}) + \left\langle \nabla f(x^{k+1}), x - x^{k+1} \right\rangle - \frac{L}{2}\left\| x - x^{k+1} \right\|^{2}. $$

By the above two inequalities we obtain

$$ \begin{array}{@{}rcl@{}} F(x^{k+1}) - F(x)&\le&\frac{1}{2\gamma}\left\| x - x^{k} \right\|^{2} - \frac{1}{2\gamma}\left\| x^{k+1} - x^{k} \right\|^{2} + \left\langle \nabla f(x^{k}), x - x^{k+1} \right\rangle \\ && - \left\langle \nabla f(x^{k+1}), x - x^{k+1} \right\rangle + \frac{L}{2} \left\| x - x^{k+1} \right\|^{2}\\ &\le&\frac{1}{\gamma}\left\| x - x^{k+1}\right\|^{2} + \frac{1}{\gamma} \left\| x^{k+1} - x^{k} \right\|^{2} - \frac{1}{2\gamma}\left\| x^{k+1} - x^{k} \right\|^{2} \\ && +\left\langle \nabla f(x^{k}) - \nabla f(x^{k+1}), x - x^{k+1} \right\rangle + \frac{L}{2} \left\| x - x^{k+1} \right\|^{2}\\ &\le&\frac{1}{\gamma}\left\| x - x^{k+1}\right\|^{2}+ \frac{1}{\gamma} \left\| x^{k+1} - x^{k} \right\|^{2}-\frac{1}{2\gamma} \left\| x^{k+1} - x^{k} \right\|^{2} \\ && + \frac{L}{2} \left\| x^{k+1} - x^{k} \right\|^{2} + \frac{1}{2} \left\| x - x^{k+1} \right\|^{2} + \frac{L}{2} \left\| x - x^{k+1} \right\|^{2}\\ &=& \left( \frac{1}{\gamma} + \frac{L+1}{2} \right) \left\| x - x^{k+1} \right\|^{2} + \left( \frac{L}{2} + \frac{1}{2\gamma} \right) \left\| x^{k} - x^{k+1} \right\|^{2}, \end{array} $$

from which we can obtain (3.2) with \(\kappa _{2} := \max \limits \left \{ \left (\frac {1}{\gamma } + \frac {L+1}{2} \right ), \left (\frac {L}{2} + \frac {1}{2\gamma } \right ) \right \}\).
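To make the constants in the lemma concrete, here is a self-contained numerical sanity check of the sufficient descent inequality (3.1) (a sketch under our own illustrative LASSO data, not taken from the paper):

```python
# Sanity check (illustrative data, not from the paper) of (3.1):
# with gamma < 1/L and kappa_1 = 1/(2*gamma) - L/2, every PG step satisfies
#   F(x^{k+1}) - F(x^k) <= -kappa_1 * ||x^{k+1} - x^k||^2
# for F = f + g, f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 80))
b = rng.standard_normal(30)
lam = 0.05
F = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum()

L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f
gamma = 0.5 / L                        # step size gamma < 1/L
kappa1 = 1.0 / (2.0 * gamma) - L / 2.0

x = np.zeros(80)
for k in range(100):
    a = x - gamma * A.T @ (A @ x - b)                              # gradient step
    x_new = np.sign(a) * np.maximum(np.abs(a) - gamma * lam, 0.0)  # prox step
    assert F(x_new) - F(x) <= -kappa1 * np.linalg.norm(x_new - x) ** 2 + 1e-10
    x = x_new
print("sufficient descent (3.1) holds along this run")
```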

1.2 A.2 Proof of Theorem 5

Throughout the proof, we write \(\zeta := F(\bar{x})\) for succinctness. Recall that the proper separation of the stationary value condition holds at \(\bar{x} \in \mathcal{X}^{\pi}\), i.e., there exists \(\delta > 0\) such that

$$ x \in {\mathcal{X}}^{\pi}\cap {\mathbb{B}} (\bar{x},\delta )\quad \Longrightarrow \quad F(x) = F(\bar{x}). $$
(A.2)

Without loss of generality, we assume that \(\epsilon < \delta/(\kappa + 1)\) throughout the proof.

Step 1. We prove that \(\bar x\) is a stationary point and

$$ \lim_{k\rightarrow \infty} \|x^{k+1} -x^{k}\|=0. $$
(A.3)

Summing the inequalities in (3.1) from iteration \(k = 0\) to an arbitrary positive integer \(K\), we obtain

$$ \sum\limits_{k=0}^{K} \left\| x^{k+1} - x^{k} \right\|^{2} \le \frac{1}{\kappa_{1}} \left( F(x^{0}) - F(x^{K+1}) \right)\le \frac{1}{\kappa_{1}} \left( F(x^{0}) - F_{\min} \right) < \infty. $$

It follows that \({\sum }_{k=0}^{\infty } \left \| x^{k+1} - x^{k} \right \|^{2} < \infty ,\) and consequently (A.3) holds. Let \(\{ x^{k_{i}} \}_{i=1}^{\infty }\) be a convergent subsequence of \(\left \{ x^{k} \right \}\) such that \(x^{k_{i}}\rightarrow \bar {x}\) as \(i\rightarrow \infty \). Then by (A.3), we have

$$ \lim_{i\rightarrow \infty} x^{k_{i}}=\lim_{i\rightarrow \infty} x^{k_{i}-1}=\bar x. $$
(A.4)

Since

$$ x^{k_{i}} \in\text{Prox}_{g}^{\gamma} \left (x^{k_{i}-1} -\gamma \nabla f(x^{k_{i}-1}) \right ), $$
(A.5)

letting \(i\rightarrow \infty \) in (A.5) and using the outer semicontinuity of \(\text {Prox}_{g}^{\gamma } (\cdot )\) (see [50, Theorem 1.25]) together with the continuity of \(\nabla f\), we have

$$ \bar{x}\in \text{Prox}_{g}^{\gamma} \left( \bar{x} -\gamma \nabla f(\bar{x}) \right ). $$

Using the definition of the proximal operator and applying the optimality condition, we have

$$ 0\in \nabla f \left( \bar{x}\right) + \partial^{\pi} g \left( \bar{x}\right) , $$

and so \(\bar x\in \mathcal {X}^{\pi }\).
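For completeness, this last step can be spelled out as follows (our reading of the argument, using the standard quadratic-minorant definition of the proximal subdifferential): since \(\bar{x}\) minimizes \(g(x) + \frac{1}{2\gamma}\| x - (\bar{x} - \gamma\nabla f(\bar{x})) \|^{2}\), for every \(x\),

$$ g(x) + \frac{1}{2\gamma}\left\| x - \bar{x} + \gamma \nabla f(\bar{x}) \right\|^{2} \ge g(\bar{x}) + \frac{\gamma}{2}\left\| \nabla f(\bar{x}) \right\|^{2}, $$

and expanding the square yields

$$ g(x) \ge g(\bar{x}) + \left\langle -\nabla f(\bar{x}), x - \bar{x} \right\rangle - \frac{1}{2\gamma}\left\| x - \bar{x} \right\|^{2}, $$

so that \(-\nabla f(\bar{x}) \in \partial^{\pi} g(\bar{x})\), which is exactly the displayed inclusion.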

Step 2. Given \(\hat{\epsilon} > 0\) such that \(\hat{\epsilon} < \delta/\epsilon - \kappa - 1\), for each \(k > 0\), we can find \(\bar{x}^{k} \in \mathcal{X}^{\pi}\) such that

$$\left\| \bar{x}^{k} - x^{k} \right\| \le \min\left\{\sqrt{d\left( x^{k},{{{\mathcal{X}}^{\pi}}} \right)^{2} + \hat{\epsilon}\|x^{k} - x^{k-1}\|^{2}}, d\left( x^{k}, {{{\mathcal{X}}^{\pi}}} \right) + \hat{\epsilon}\|x^{k} - x^{k-1}\|\right\}.$$

It then follows from the cost-to-estimate condition (3.2) that

$$ F(x^{k}) - F(\bar{x}^{k})\le \hat{ \kappa}_{2} \left( \text{dist} \left( x^{k}, {\mathcal{X}}^{\pi} \right)^{2} + \left\| x^{k} - x^{k-1} \right\|^{2} \right), $$
(A.6)

with \(\hat{\kappa}_{2} := \kappa_{2}(1+\hat{\epsilon})\). Now we use mathematical induction to prove that there exists \(k_{\ell} > 0\) such that for all \(j \ge k_{\ell}\),

$$ \begin{array}{@{}rcl@{}} && x^{j}\in \mathbb{B} \left( \bar{x}, \epsilon \right),\quad x^{j+1}\in \mathbb{B} \left( \bar{x}, \epsilon \right),\quad F(\bar{x}^{j}) = \zeta ,\quad {{F(\bar{x}^{j+1}) = \zeta}}, \end{array} $$
(A.7)
$$ \begin{array}{@{}rcl@{}} && F(x^{j+1}) - \zeta \le \hat{ \kappa}_{2} \left( \text{dist} \left( x^{j+1}, {\mathcal{X}}^{\pi} \right)^{2} + \left\| x^{j+1} - x^{j} \right\|^{2} \right), \end{array} $$
(A.8)
$$ \sum\limits_{i=k_{\ell}}^{j} \left\| x^{i} - x^{i+1} \right\| \le \frac{\left\| x^{k_{\ell}-1} - x^{k_{\ell}} \right\| - \left\| x^{j} - x^{j+1} \right\|}{2} + c\left[\sqrt{F(x^{k_{\ell}}) - \zeta} - \sqrt{F(x^{j+1}) - \zeta}\right], $$
(A.9)

where the constant \(c:=\frac {2\sqrt {\hat { \kappa }_{2}(\kappa ^{2}+1)}}{\kappa _{1}}> 0\).

By (A.4) and the fact that F is continuous on its domain, there exists \(k_{\ell} > 0\) such that \(x^{k_{\ell }}\in \mathbb {B} \left (\bar {x},\epsilon \right )\), \(x^{k_{\ell }+1}\in \mathbb {B} \left (\bar {x},\epsilon \right )\),

$$ \begin{array}{@{}rcl@{}} &&{\left\| x^{k_{\ell}} - \bar{x} \right\| + \frac{\left\| x^{k_{\ell}-1} - x^{k_{\ell}} \right\| }{2} + c\left[\sqrt{F(x^{k_{\ell}}) - \zeta}\right]\le \frac{\epsilon}{2}}, \end{array} $$
(A.10)
$$ \begin{array}{@{}rcl@{}} &&{\left\| x^{k+1} - x^{k} \right\| < \frac{\epsilon}{2} , \quad \forall k \ge k_{\ell} - 1}, \end{array} $$
(A.11)
$$ \begin{aligned} \left\|\bar{x}^{k_{\ell}} - \bar{x}\right\| &\le \left\|\bar{x}^{k_{\ell}} - x^{k_{\ell}}\right\| + \left\|x^{k_{\ell}} - \bar{x}\right\| \\ &\overset{{(3.4)}}{\le} (\kappa+\hat{\epsilon}) \left\|x^{k_{\ell}} - x^{k_{\ell} - 1}\right\| + \left\|x^{k_{\ell}} - \bar{x}\right\| < {(\kappa +\hat{\epsilon} + 2)\epsilon/2} < \delta, \end{aligned} $$

which indicates that \(\bar{x}^{k_{\ell}}\in \mathcal{X}^{\pi} \cap \mathbb{B}\left(\bar{x},\delta \right)\). It follows from the proper separation of the stationary value condition (A.2) that \(F\left(\bar{x}^{k_{\ell}} \right) = \zeta \).

Before carrying out the induction for (A.7)–(A.9), we first show that for \(j \ge k_{\ell}\), if (A.7) and (A.8) hold, then

$$ 2\left\| x^{j} - x^{j+1} \right\| \le c\left[\sqrt{F(x^{j}) - \zeta} - \sqrt{F(x^{j+1}) - \zeta}\right] + \frac{\left\| x^{j} - x^{j+1} \right\| + \left\| x^{j-1} - x^{j} \right\|}{2}. $$
(A.12)

Firstly, since \(x^{j}\in \mathbb {B} \left (\bar {x}, \epsilon \right )\), \(F(\bar {x}^{j}) = \zeta \) and (A.6) holds, it follows from (3.4) that

$$ F(x^{j})-\zeta\leq \hat{ \kappa}_{2}(\kappa^{2} \|x^{j}-x^{j-1}\|^{2}+ \|x^{j}-x^{j-1}\|^{2}) ={\kappa_{3}^{2}}\|x^{j}-x^{j-1}\|^{2}, $$
(A.13)

where \(\kappa _{3} := \sqrt {\hat { \kappa }_{2} \left (\kappa ^{2} + 1 \right )}\). Similarly, since \(x^{j+1}\in \mathbb {B} \left (\bar {x}, \epsilon \right )\) and \(F(\bar {x}^{j+1}) = \zeta \), by (A.6) and condition (3.4), we have

$$ \begin{array}{@{}rcl@{}} F(x^{j+1}) - \zeta &\le&{\kappa_{3}^{2}} \left\| x^{j+1} - x^{j} \right\|^{2}. \end{array} $$
(A.14)

As a result, we can obtain

$$ \begin{array}{@{}rcl@{}} \sqrt{F(x^{j}) - \zeta} - \sqrt{F(x^{j+1}) - \zeta}&=&\frac{\left( F(x^{j}) - \zeta\right) - \left( F(x^{j+1}) - \zeta\right)}{\sqrt{F(x^{j}) - \zeta} + \sqrt{F(x^{j+1}) - \zeta}}\\ &=& \frac{F(x^{j}) -F(x^{j+1})}{\sqrt{F(x^{j}) - \zeta} + \sqrt{F(x^{j+1}) - \zeta}}\\ &\overset{\text{(3.1)(A.13)(A.14)}}{\ge} & \frac{\kappa_{1} \left\| x^{j+1} - x^{j} \right\|^{2}}{\kappa_{3}\left( \left\| x^{j} - x^{j-1} \right\| + \left\| x^{j+1} - x^{j} \right\| \right)}. \end{array} $$

Since \(c = \frac{2\kappa_{3}}{\kappa_{1}}\) by the definitions of \(c\) and \(\kappa_{3}\), we have

$$ \left( c \left[\sqrt{F(x^{j}) - \zeta} - \sqrt{F(x^{j+1}) - \zeta}\right] \right)\left( \frac{\left\| x^{j} - x^{j+1} \right\| + \left\| x^{j-1} - x^{j} \right\|}{2} \right)\ge \left\| x^{j+1} - x^{j} \right\|^{2}, $$

from which, by applying \(ab\le \left(\frac{a+b}{2} \right)^{2}\), we establish (A.12).
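In detail (the AM-GM step made explicit), write \(a := c\left[\sqrt{F(x^{j}) - \zeta} - \sqrt{F(x^{j+1}) - \zeta}\right]\) and \(b := \frac{\left\| x^{j} - x^{j+1} \right\| + \left\| x^{j-1} - x^{j} \right\|}{2}\); the preceding display reads \(ab \ge \left\| x^{j+1} - x^{j} \right\|^{2}\), so

$$ \left\| x^{j+1} - x^{j} \right\| \le \sqrt{ab} \le \frac{a+b}{2}, $$

and multiplying by 2 gives exactly (A.12).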

Next we prove the three properties (A.7)–(A.9) by induction on \(j\). For \(j = k_{\ell}\), we have

$$ x^{k_{\ell}}\in \mathbb{B} \left( \bar{x}, \epsilon \right),\quad x^{k_{\ell}+1}\in \mathbb{B} \left( \bar{x}, \epsilon \right),\quad F(\bar{x}^{k_{\ell}}) = \zeta, $$

and, similarly to the estimate of \(\left\|\bar{x}^{k_{\ell}} - \bar{x}\right\|\) above, we can show

$$ \left\| \bar{x}^{k_{\ell}+1} - \bar{x} \right\| \le \delta. $$

It follows by (A.2) that \(F(\bar {x}^{k_{\ell }+1}) = \zeta \), and hence by (A.6),

$$ F(x^{k_{\ell}+1}) - \zeta \le \hat{ \kappa}_{2} \left( \text{dist} \left( x^{k_{\ell}+1}, {{{\mathcal{X}}^{\pi}}} \right)^{2} + \left\| x^{k_{\ell}+1} - x^{k_{\ell}} \right\|^{2} \right), $$

which is (A.8) with \(j = k_{\ell}\). Note that property (A.9) for \(j = k_{\ell}\) can be obtained directly through (A.12).

Now suppose that (A.7), (A.8) and (A.9) hold for some \(j \ge k_{\ell}\). To complete the induction, we show that (A.7), (A.8) and (A.9) also hold for \(j + 1\). We have

$$ \begin{array}{@{}rcl@{}} \left\| x^{j+2} - \bar{x} \right\|&\le& \left\| x^{k_{\ell}} - \bar{x} \right\| + {\sum}_{i={k_{\ell}}}^{j} \left\| x^{i} - x^{i+1} \right\| + \left\| x^{j+1} - x^{j+2} \right\| \\ &<& \left\| x^{k_{\ell}} - \bar{x} \right\| + \frac{\left\| x^{k_{\ell}-1} - x^{k_{\ell}} \right\| - \left\| x^{j} - x^{j+1} \right\|}{2}\\ && + c\left[\sqrt{F(x^{k_{\ell}}) - \zeta} - \sqrt{F(x^{j+1}) - \zeta}\right] + \frac{\epsilon}{2} \\ &\le& \left\| x^{k_{\ell}} - \bar{x} \right\| + \frac{\left\| x^{k_{\ell}-1} - x^{k_{\ell}} \right\| }{2} + c\left[\sqrt{F(x^{k_{\ell}}) - \zeta}\right] + \frac{\epsilon}{2} \le \epsilon, \end{array} $$

where the second inequality follows from (A.9) and (A.11), and the last inequality follows from (A.10). Since \(x^{j+2}\in \mathbb{B}(\bar{x}, \epsilon)\), by the definition of \(\bar{x}^{j+2}\) and (3.4), it holds that

$$\begin{aligned} \left\| \bar{x}^{j+2} - \bar{x} \right\| &\le \| \bar{x}^{j+2} - x^{j+2} \| + \| x^{j+2} - \bar{x} \| \\ &\le (\kappa+\hat{\epsilon})\| x^{j+2} - x^{j+1} \| + \epsilon \\ & < (\kappa +\hat{\epsilon} +2)\epsilon/2 < \delta, \end{aligned}$$

where the third inequality follows from (A.11). It follows from the proper separation of stationary value assumption (A.2) that \(F(\bar {x}^{j+2}) = \zeta \). Consequently by (A.6), we have

$$ F(x^{j+2}) - \zeta \le \hat{ \kappa}_{2} \left( \text{dist} \left( x^{j+2}, {{{\mathcal{X}}^{\pi}}} \right)^{2} + \left\| x^{j+2} - x^{j+1} \right\|^{2} \right). $$

So far we have shown that (A.7)–(A.8) hold for \(j + 1\). Moreover,

$$ \begin{array}{@{}rcl@{}} && \sum\limits_{i=k_{\ell}}^{j+1} \left\| x^{i} - x^{i+1} \right\| \\ &\overset{\text{(A.9)}}{\le}& \frac{\left\| x^{k_{\ell}-1} - x^{k_{\ell}} \right\| - \left\| x^{j} - x^{j+1} \right\|}{2} + c\left[\sqrt{F(x^{k_{\ell}}) - \zeta} - \sqrt{F(x^{j+1}) - \zeta}\right] \\ && + \left\| x^{j+1} - x^{j+2} \right\|\\ &\overset{\text{(A.12)}}{\le}& \frac{\left\| x^{k_{\ell}-1} - x^{k_{\ell}} \right\| - \left\| x^{j} - x^{j+1} \right\|}{2} + c\left[\sqrt{F(x^{k_{\ell}}) - \zeta} - \sqrt{F(x^{j+1}) - \zeta}\right] \\ && + c\left[\sqrt{F(x^{j+1}) - \zeta} - \sqrt{F(x^{j+2}) - \zeta}\right] + \frac{\left\| x^{j+1} - x^{j+2} \right\| + \left\| x^{j} - x^{j+1} \right\|}{2}\\ && - \left\| x^{j+1} - x^{j+2} \right\|\\ &=& \frac{\left\| x^{k_{\ell}-1} - x^{k_{\ell}} \right\| - \left\| x^{j+1} - x^{j+2} \right\|}{2} + c\left[\sqrt{F(x^{k_{\ell}}) - \zeta} - \sqrt{F(x^{j+2}) - \zeta}\right], \end{array} $$

from which we obtain (A.9) for \(j + 1\). The induction on \(j\) is now complete, and the properties (A.7)–(A.9) are established.

Step 3. We prove that the whole sequence \(\{x^{k}\}\) converges to \(\bar{x}\) and that (3.5)–(3.6) hold.

By (A.9), for all \(j \ge k_{\ell}\),

$$ \begin{aligned} \sum\limits_{i=k_{\ell}}^{j} \left\| x^{i} - x^{i+1} \right\| &\le \frac{\left\| x^{k_{\ell}-1} - x^{k_{\ell}} \right\| - \left\| x^{j} - x^{j+1} \right\|}{2} + c\left[\sqrt{F(x^{k_{\ell}}) - \zeta} - \sqrt{F(x^{j+1}) - \zeta}\right]\\ &\le \frac{\left\| x^{k_{\ell}-1} - x^{k_{\ell}} \right\|}{2} + c \sqrt{F(x^{k_{\ell}}) - \zeta} < \infty, \end{aligned} $$

which indicates that \(\left \{ x^{k} \right \}\) is a Cauchy sequence. It follows that the whole sequence converges to the stationary point \(\bar {x}\). Further, for all \(k \ge k_{\ell}\), we have \(x^{k}\in \mathbb {B}(\bar {x},\epsilon )\). As a result, the PG-iteration-based error bound condition (3.4) holds at all the iteration points \(\left \{ x^{k} \right \}_{k > k_{\ell }}\). Recall that by (3.1) and (A.14), we have

$$ \begin{array}{@{}rcl@{}} F(x^{k+1}) - F(x^{k}) &\le & -\kappa_{1}\left\| x^{k+1} - x^{k} \right\|^{2}, \\ F(x^{k+1}) - \zeta &\le & \hat{ \kappa}_{2} \left( \kappa^{2} + 1 \right)\left\| x^{k+1} - x^{k} \right\|^{2} , \end{array} $$

which implies that

$$ F(x^{k}) - F(x^{k+1})\ge \frac{\kappa_{1}}{\hat{ \kappa}_{2}\left( \kappa^{2} + 1 \right)} \left( F(x^{k+1}) - \zeta\right). $$

Equivalently,

$$ F(x^{k}) - \zeta + \zeta - F(x^{k+1})\ge \frac{\kappa_{1}}{\hat{ \kappa}_{2}\left( \kappa^{2} + 1 \right)} \left( F(x^{k+1}) - \zeta\right). $$

Thus we have

$$ F(x^{k+1}) - \zeta \le \sigma \left( F(x^{k}) - \zeta\right), \text{with } \sigma := \frac{1}{1 + \frac{\kappa_{1}}{\hat{ \kappa}_{2}\left( \kappa^{2} + 1 \right)}} < 1, $$

which completes the proof of (3.5).

Inspired by [6], we have the following linear convergence result for the sequence \(\{x^{k}\}\). Recall the sufficient descent property (3.1),

$$F(x^{k+1}) - F(x^{k}) \le -\kappa_{1}\left\| x^{k+1} - x^{k} \right\|^{2},$$

which indicates that there exists a constant C such that

$$ \left\| x^{k+1} - x^{k} \right\| \le \sqrt{ \frac{1}{\kappa_{1}} \left( F(x^{k}) - F(x^{k+1}) \right) }\le \sqrt{ \frac{1}{\kappa_{1}} \left( F(x^{k}) - \zeta \right) }\le C \sqrt{\sigma}^{k}. $$
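One admissible choice of the constant \(C\) (our reading, valid for \(k \ge k_{\ell}\) by (3.5), and enlarged if necessary to cover the finitely many earlier iterates) is

$$ C := \sqrt{\frac{F(x^{k_{\ell}}) - \zeta}{\kappa_{1} \sigma^{k_{\ell}}}}, \qquad \text{since} \quad F(x^{k}) - \zeta \le \sigma^{k - k_{\ell}} \left( F(x^{k_{\ell}}) - \zeta \right) \ \text{ for } k \ge k_{\ell}. $$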

In addition, we have that

$$ \left\| x^{k} - \bar{x} \right\| \le {\sum}_{i=k}^{\infty} \left\| x^{i} - x^{i+1} \right\|\le {\sum}_{i=k}^{\infty} C\sqrt{\sigma}^{i} \le \frac{C}{1-\sqrt{\sigma}} \sqrt{\sigma}^{k}, $$

which implies (3.6) with \(\rho _{0} = \frac {C}{1-\sqrt {\sigma }}\).


About this article


Cite this article

Wang, X., Ye, J.J., Yuan, X. et al. Perturbation Techniques for Convergence Analysis of Proximal Gradient Method and Other First-Order Algorithms via Variational Analysis. Set-Valued Var. Anal 30, 39–79 (2022). https://doi.org/10.1007/s11228-020-00570-0
