Block-coordinate and incremental aggregated proximal gradient methods for nonsmooth nonconvex problems

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

This paper analyzes block-coordinate proximal gradient methods for minimizing the sum of a separable smooth function and a (nonseparable) nonsmooth function, both of which are allowed to be nonconvex. The main tool in our analysis is the forward-backward envelope, which serves as a particularly suitable continuous and real-valued Lyapunov function. Global and linear convergence results are established when the cost function satisfies the Kurdyka–Łojasiewicz property without imposing convexity requirements on the smooth function. Two prominent special cases of the investigated setting are regularized finite sum minimization and the sharing problem; in particular, an immediate byproduct of our analysis leads to novel convergence results and rates for the popular Finito/MISO algorithm in the nonsmooth and nonconvex setting with very general sampling strategies.

References

  1. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116(1–2), 5–16 (2009)

  2. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)

  3. Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137(1), 91–129 (2013)

  4. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, Berlin (2017)

  5. Beck, A.: First-Order Methods in Optimization. Society for Industrial and Applied Mathematics, Philadelphia, PA (2017)

  6. Beck, A., Tetruashvili, L.: On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4), 2037–2060 (2013)

  7. Bertsekas, D.P.: Incremental proximal methods for large scale convex optimization. Math. Program. 129(2), 163–195 (2011)

  8. Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont, MA (2016)

  9. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods, vol. 23. Prentice-Hall, Upper Saddle River (1989)

  10. Bianchi, P., Hachem, W., Iutzeler, F.: A coordinate descent primal-dual algorithm and application to distributed asynchronous optimization. IEEE Trans. Autom. Control 61(10), 2947–2957 (2016)

  11. Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)

  12. Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)

  13. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2), 459–494 (2014)

  14. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  15. Chouzenoux, E., Pesquet, J.C., Repetti, A.: A block coordinate variable metric forward-backward algorithm. J. Glob. Optim. 66(3), 457–485 (2016)

  16. Chow, Y.T., Wu, T., Yin, W.: Cyclic coordinate-update algorithms for fixed-point problems: analysis and applications. SIAM J. Sci. Comput. 39(4), A1280–A1300 (2017)

  17. Clarke, F.H.: Optimization and Nonsmooth Analysis. Society for Industrial and Applied Mathematics, Philadelphia, PA (1990)

  18. Combettes, P.L., Pesquet, J.C.: Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. SIAM J. Optim. 25(2), 1221–1248 (2015)

  19. Davis, D.: Smart: The stochastic monotone aggregated root-finding algorithm. arXiv:1601.00698 (2016)

  20. Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems, pp. 1646–1654 (2014)

  21. Defazio, A., Domke, J.: Finito: A faster, permutable incremental gradient method for big data problems. In: International Conference on Machine Learning, pp. 1125–1133 (2014)

  22. Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. II. Springer, Berlin (2003)

  23. Fercoq, O., Bianchi, P.: A coordinate-descent primal-dual algorithm with large step size and possibly nonseparable functions. SIAM J. Optim. 29(1), 100–134 (2019)

  24. Frankel, P., Garrigos, G., Peypouquet, J.: Splitting methods with variable metric for Kurdyka–Łojasiewicz functions and general convergence rates. J. Optim. Theory Appl. 165(3), 874–900 (2015)

  25. Fukushima, M., Mine, H.: A generalized proximal point algorithm for certain non-convex minimization problems. Int. J. Syst. Sci. 12(8), 989–1000 (1981)

  26. Hanzely, F., Mishchenko, K., Richtárik, P.: SEGA: Variance reduction via gradient sketching. In: Advances in Neural Information Processing Systems, pp. 2082–2093 (2018)

  27. Hong, M., Wang, X., Razaviyayn, M., Luo, Z.Q.: Iteration complexity analysis of block coordinate descent methods. Math. Program. 163(1–2), 85–114 (2017)

  28. Hou, Y., Song, I., Min, H.K., Park, C.H.: Complexity-reduced scheme for feature extraction with linear discriminant analysis. IEEE Trans. Neural Netw. Learn. Syst. 23(6), 1003–1009 (2012)

  29. Iutzeler, F., Bianchi, P., Ciblat, P., Hachem, W.: Asynchronous distributed optimization using a randomized alternating direction method of multipliers. In: 52nd IEEE Conference on Decision and Control (CDC), pp. 3671–3676 (2013)

  30. Kurdyka, K.: On gradients of functions definable in \(o\)-minimal structures. Annales de l’institut Fourier 48(3), 769–783 (1998)

  31. Latafat, P., Freris, N.M., Patrinos, P.: A new randomized block-coordinate primal-dual proximal algorithm for distributed optimization. IEEE Trans. Autom. Control 64(10), 4050–4065 (2019)

  32. Li, G., Pong, T.K.: Douglas–Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems. Math. Program. 159(1), 371–401 (2016)

  33. Lin, Q., Lu, Z., Xiao, L.: An accelerated randomized proximal coordinate gradient method and its application to regularized empirical risk minimization. SIAM J. Optim. 25(4), 2244–2273 (2015)

  34. Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. Les équations aux dérivées partielles pp. 87–89 (1963)

  35. Łojasiewicz, S.: Sur la géométrie semi- et sous- analytique. Annales de l’institut Fourier 43(5), 1575–1595 (1993)

  36. Mairal, J.: Incremental majorization-minimization optimization with application to large-scale machine learning. SIAM J. Optim. 25(2), 829–855 (2015)

  37. Mokhtari, A., Gürbüzbalaban, M., Ribeiro, A.: Surpassing gradient descent provably: a cyclic incremental method with linear convergence rate. SIAM J. Optim. 28(2), 1420–1447 (2018)

  38. Necoara, I.: Random coordinate descent algorithms for multi-agent convex optimization over networks. IEEE Trans. Autom. Control 58(8), 2001–2012 (2013)

  39. Necoara, I., Patrascu, A.: A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints. Comput. Optim. Appl. 57(2), 307–337 (2014)

  40. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)

  41. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, vol. 87. Springer Science & Business Media (2013)

  42. Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for nonconvex optimization. SIAM J. Imaging Sci. 7(2), 1388–1419 (2014)

  43. Patrinos, P., Bemporad, A.: Proximal Newton methods for convex composite optimization. In: 52nd IEEE Conference on Decision and Control, pp. 2358–2363 (2013)

  44. Peng, Z., Xu, Y., Yan, M., Yin, W.: ARock: an algorithmic framework for asynchronous parallel coordinate updates. SIAM J. Sci. Comput. 38(5), A2851–A2879 (2016)

  45. Pesquet, J.C., Repetti, A.: A class of randomized primal-dual algorithms for distributed optimization. J. Nonlinear Convex Anal. 16(12), 2453–2490 (2015)

  46. Qian, X., Sailanbayev, A., Mishchenko, K., Richtárik, P.: MISO is making a comeback with better proofs and rates. arXiv:1906.01474 (2019)

  47. Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.J.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016)

  48. Reddi, S.J., Sra, S., Poczos, B., Smola, A.J.: Proximal stochastic methods for nonsmooth nonconvex finite-sum optimization. In: Advances in Neural Information Processing Systems, pp. 1145–1153 (2016)

  49. Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)

  50. Robbins, H., Siegmund, D.: A convergence theorem for non negative almost supermartingales and some applications. In: Herbert Robbins Selected Papers, pp. 111–135. Springer (1985)

  51. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317. Springer, Berlin (2011)

  52. Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Math. Program. 162(1), 83–112 (2017)

  53. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss minimization. J. Mach. Learn. Res. 14(Feb), 567–599 (2013)

  54. Themelis, A.: Proximal algorithms for structured nonconvex optimization. Ph.D. thesis, KU Leuven (2018)

  55. Themelis, A., Ahookhosh, M., Patrinos, P.: On the acceleration of forward-backward splitting via an inexact Newton method. In: Bauschke, H.H., Burachik, R.S., Luke, D.R. (eds.) Splitting Algorithms, Modern Operator Theory, and Applications, pp. 363–412. Springer, Cham (2019)

  56. Themelis, A., Stella, L., Patrinos, P.: Forward-backward envelope for the sum of two nonconvex functions: further properties and nonmonotone linesearch algorithms. SIAM J. Optim. 28(3), 2274–2303 (2018)

  57. Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)

  58. Tseng, P., Bertsekas, D.P.: Relaxation methods for problems with strictly convex separable costs and linear constraints. Math. Program. 38(3), 303–321 (1987)

  59. Tseng, P., Yun, S.: Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl. 140(3), 513–535 (2008)

  60. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1), 387–423 (2009)

  61. Tseng, P., Yun, S.: A coordinate gradient descent method for linearly constrained smooth optimization and support vector machines training. Comput. Optim. Appl. 47(2), 179–206 (2010)

  62. Xu, Y., Yin, W.: A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 6(3), 1758–1789 (2013)

  63. Xu, Y., Yin, W.: A globally convergent algorithm for nonconvex optimization based on block coordinate update. J. Sci. Comput. 72(2), 700–734 (2017)

  64. Yu, P., Li, G., Pong, T.K.: Deducing Kurdyka-Łojasiewicz exponent via inf-projection. arXiv:1902.03635 (2019)

Author information

Correspondence to Puya Latafat.

Additional information

This work was supported by the Research Foundation Flanders (FWO) PhD grant 1196820N and research projects G0A0920N, G086518N and G086318N; Research Council KU Leuven C1 Project No. C14/18/068; Fonds de la Recherche Scientifique—FNRS and the Fonds Wetenschappelijk Onderzoek—Vlaanderen under EOS Project No. 30468160 (SeLMA).

A The key tool: the forward-backward envelope

This appendix contains some proofs and auxiliary results omitted in the main body. We begin by observing that, since \(F\) and \(-F\) are 1-smooth in the metric induced by \( \varLambda _F{:}{=}\tfrac{1}{N}{{\,\mathrm{blkdiag}\,}}(L_{f_1}\mathrm{I}_{n_1},\dots ,L_{f_N}\mathrm{I}_{n_N}) \), one has

$$\begin{aligned} F({\varvec{x}})+\langle {}\nabla F({\varvec{x}}){},{}{\varvec{w}}-{\varvec{x}}{}\rangle - \tfrac{1}{2}\Vert {\varvec{w}}-{\varvec{x}}\Vert _{\varLambda _F}^2 \le F({\varvec{w}}) \le F({\varvec{x}})+\langle {}\nabla F({\varvec{x}}){},{}{\varvec{w}}-{\varvec{x}}{}\rangle + \tfrac{1}{2}\Vert {\varvec{w}}-{\varvec{x}}\Vert _{\varLambda _F}^2\nonumber \\ \end{aligned}$$
(A.1)

for all \({\varvec{x}},{\varvec{w}}\in \mathbb {R}^{\sum _in_i}\), see [8, Prop. A.24]. Let us denote

$$\begin{aligned} {\mathcal {M}}_{\varGamma }({\varvec{w}},{\varvec{x}}) {:}{=}F({\varvec{x}})+\langle {}\nabla F({\varvec{x}}){},{}{\varvec{w}}-{\varvec{x}}{}\rangle + G({\varvec{w}}) + \tfrac{1}{2}\Vert {\varvec{w}}-{\varvec{x}}\Vert _{\varGamma ^{-1}}^2 \end{aligned}$$

the quantity being minimized (with respect to \({\varvec{w}}\)) in the definition (2.2a) of the FBE. It follows from (A.1) that

$$\begin{aligned} \varPhi ({\varvec{w}}) + \tfrac{1}{2}\Vert {\varvec{w}}-{\varvec{x}}\Vert ^2_{\varGamma ^{-1}-\varLambda _F} \le {\mathcal {M}}_{\varGamma }({\varvec{w}},{\varvec{x}}) \le \varPhi ({\varvec{w}}) + \tfrac{1}{2}\Vert {\varvec{w}}-{\varvec{x}}\Vert ^2_{\varGamma ^{-1}+\varLambda _F} \end{aligned}$$
(A.2)

holds for all \({\varvec{x}},{\varvec{w}}\in \mathbb {R}^{\sum _in_i}\). In particular, \({\mathcal {M}}_{\varGamma }\) is a majorizing model for \(\varPhi \), in the sense that \({\mathcal {M}}_{\varGamma }({\varvec{x}},{\varvec{x}})=\varPhi ({\varvec{x}})\) and \({\mathcal {M}}_{\varGamma }({\varvec{w}},{\varvec{x}})\ge \varPhi ({\varvec{w}})\) for all \({\varvec{x}},{\varvec{w}}\in \mathbb {R}^{\sum _in_i}\). In fact, as explained in Sect. 2.1, while a \(\varGamma \)-forward-backward step \({\varvec{z}}\in {\text {T}}_\varGamma ^{\textsc {fb}}({\varvec{x}})\) amounts to evaluating a minimizer of \({\mathcal {M}}_{\varGamma }(\cdot ,{\varvec{x}})\), the FBE is defined instead as the minimization value, namely \(\varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}})={\mathcal {M}}_{\varGamma }({\varvec{z}},{\varvec{x}})\) where \({\varvec{z}}\) is any element of \({\text {T}}_\varGamma ^{\textsc {fb}}({\varvec{x}})\).

A.1 Proofs of Sect. 2.1

Proof of Lemma 2.1

For \({\varvec{x}}^\star \in {{\,\mathrm{\mathrm{arg\,min}}\,}}\varPhi \) it follows from (A.1) that

$$\begin{aligned} \min \varPhi \le F({\varvec{x}}) + G({\varvec{x}}) \le G({\varvec{x}}) + F({\varvec{x}}^\star ) + \langle {}\nabla F({\varvec{x}}^\star ){},{}{\varvec{x}}-{\varvec{x}}^\star {}\rangle + \tfrac{1}{2}\Vert {\varvec{x}}^\star -{\varvec{x}}\Vert _{\varLambda _F}^2. \end{aligned}$$
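
Rearranging the second inequality yields

$$\begin{aligned} G({\varvec{x}}) \ge \min \varPhi - F({\varvec{x}}^\star ) - \langle {}\nabla F({\varvec{x}}^\star ){},{}{\varvec{x}}-{\varvec{x}}^\star {}\rangle - \tfrac{1}{2}\Vert {\varvec{x}}-{\varvec{x}}^\star \Vert _{\varLambda _F}^2 \quad \text {for all }{\varvec{x}}. \end{aligned}$$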

Therefore, \(G\) is lower bounded by a quadratic function with quadratic term \(-\tfrac{1}{2}\Vert \cdot \Vert _{\varLambda _F}^2\), and thus is prox-bounded in the sense of [51, Def. 1.23]. The claim then follows from [51, Th. 1.25 and Ex. 5.23(b)] and the continuity of the forward mapping \(\hbox {id} - \varGamma \nabla F\). \(\square \)

Proof of Lemma 2.3

(FBE: fundamental inequalities). Local Lipschitz continuity follows from (2.2d) in light of Lemma 2.1 and [51, Ex. 10.32].

  • 2.3(i) Follows by setting \({\varvec{w}}={\varvec{x}}\) in (2.2a).

  • 2.3(ii) Directly follows from (A.2) and the identity \(\varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}})={\mathcal {M}}_{\varGamma }({\varvec{z}},{\varvec{x}})\) for \({\varvec{z}}\in {\text {T}}_\varGamma ^{\textsc {fb}}({\varvec{x}})\). \(\square \)

Proof of Lemma 2.4

(FBE: minimization equivalence).

  • 2.4(i) and 2.4(ii) It follows from Lemma 2.3(i) that \(\inf \varPhi _\varGamma ^{\textsc {fb}}\le \min \varPhi \). Conversely, let \(({\varvec{x}}^k)_{{k\in \mathbb {N}}}\) be such that \(\varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}}^k)\rightarrow \inf \varPhi _\varGamma ^{\textsc {fb}}\) as \(k\rightarrow \infty \), and for each \(k\) let \({\varvec{z}}^k\in {\text {T}}_\varGamma ^{\textsc {fb}}({\varvec{x}}^k)\). It then follows from Lemmas 2.3(i) and 2.3(ii) that

    $$\begin{aligned} \inf \varPhi _\varGamma ^{\textsc {fb}}\le \min \varPhi \le \liminf _{k\rightarrow \infty }\varPhi ({\varvec{z}}^k) \le \liminf _{k\rightarrow \infty }\varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}}^k) = \inf \varPhi _\varGamma ^{\textsc {fb}}, \end{aligned}$$

    hence \(\min \varPhi =\inf \varPhi _\varGamma ^{\textsc {fb}}\). Suppose now that \({\varvec{x}}\in {{\,\mathrm{\mathrm{arg\,min}}\,}}\varPhi \) (which exists by Assumption I); then it follows from Lemma 2.3(ii) that \({\text {T}}_\varGamma ^{\textsc {fb}}({\varvec{x}})=\left\{ {\varvec{x}} \right\} \) (for otherwise another element would belong to a lower level set of \(\varPhi \)). Combining this with Lemma 2.3(i) applied with \({\varvec{z}}={\varvec{x}}\), we then have

    $$\begin{aligned} \min \varPhi = \varPhi ({\varvec{z}}) \le \varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}}) \le \varPhi ({\varvec{x}}) = \min \varPhi . \end{aligned}$$

    Since \(\min \varPhi =\inf \varPhi _\varGamma ^{\textsc {fb}}\), we conclude that \({\varvec{x}}\in {{\,\mathrm{\mathrm{arg\,min}}\,}}\varPhi _\varGamma ^{\textsc {fb}}\), and that in particular \(\inf \varPhi _\varGamma ^{\textsc {fb}}=\min \varPhi _\varGamma ^{\textsc {fb}}\). Conversely, suppose \({\varvec{x}}\in {{\,\mathrm{\mathrm{arg\,min}}\,}}\varPhi _\varGamma ^{\textsc {fb}}\) and let \({\varvec{z}}\in {\text {T}}_\varGamma ^{\textsc {fb}}({\varvec{x}})\). By combining Lemmas 2.3(i) and 2.3(ii) we have that \({\varvec{z}}={\varvec{x}}\), that is, that \({\text {T}}_\varGamma ^{\textsc {fb}}({\varvec{x}})=\left\{ {\varvec{x}} \right\} \). It then follows from Lemma 2.3(ii) and assertion 2.4(i) that

    $$\begin{aligned} \varPhi ({\varvec{x}}) = \varPhi ({\varvec{z}}) \le \varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}}) = \min \varPhi _\varGamma ^{\textsc {fb}}= \min \varPhi , \end{aligned}$$

    hence \({\varvec{x}}\in {{\,\mathrm{\mathrm{arg\,min}}\,}}\varPhi \).

  • 2.4(iii) Due to Lemma 2.3(i), if \(\varPhi _\varGamma ^{\textsc {fb}}\) is level bounded clearly so is \(\varPhi \). Conversely, suppose that \(\varPhi _\varGamma ^{\textsc {fb}}\) is not level bounded. Then, there exist \(\alpha \in \mathbb {R}\) and \(({\varvec{x}}^k)_{{k\in \mathbb {N}}}\subseteq {{\,\mathrm{lev}\,}}_{\le \alpha }\varPhi _\varGamma ^{\textsc {fb}}\) such that \(\Vert {\varvec{x}}^k\Vert \rightarrow \infty \) as \(k\rightarrow \infty \). Let \(\lambda =\min _i\left\{ \gamma _i^{-1}-L_{f_i}N^{-1} \right\} >0\), and for each \(k\in \mathbb {N}\) let \({\varvec{z}}^k\in {\text {T}}_\varGamma ^{\textsc {fb}}({\varvec{x}}^k)\). It then follows from Lemma 2.3(ii) that

    $$\begin{aligned} \min \varPhi \le \varPhi ({\varvec{z}}^k) \le \varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}}^k) - \tfrac{\lambda }{2}\Vert {\varvec{x}}^k-{\varvec{z}}^k\Vert ^2 \le \alpha - \tfrac{\lambda }{2}\Vert {\varvec{x}}^k-{\varvec{z}}^k\Vert ^2, \end{aligned}$$

    hence \(({\varvec{z}}^k)_{{k\in \mathbb {N}}}\subseteq {{\,\mathrm{lev}\,}}_{\le \alpha }\varPhi \) and \( \Vert {\varvec{x}}^k-{\varvec{z}}^k\Vert ^2 \le \tfrac{2}{\lambda }(\alpha -\min \varPhi ) \). Consequently, the sequence \(({\varvec{z}}^k)_{{k\in \mathbb {N}}}\subseteq {{\,\mathrm{lev}\,}}_{\le \alpha }\varPhi \) is also unbounded, proving that \(\varPhi \) is not level bounded. \(\square \)
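
As a complementary sanity check (again with made-up toy data, not the paper's experiments), the following self-contained sketch evaluates \(\varPhi \) and \(\varPhi _\varGamma ^{\textsc {fb}}\) on a fine grid for a one-block instance and confirms, up to grid resolution, that the two functions attain the same minimum at the same point, as asserted in 2.4(i) and 2.4(ii).

```python
import numpy as np

# Self-contained toy check (illustrative data) of Lemma 2.4(i)-(ii): Phi and its
# FBE attain the same minimum at the same point.  Single scalar block (N = 1),
# f(x) = 0.5*(x - 1)^2 (so L_f = 1), g(x) = |x|, gamma in (0, N/L_f) = (0, 1).

gamma = 0.8
f = lambda x: 0.5 * (x - 1.0) ** 2
grad_f = lambda x: x - 1.0
g = lambda x: np.abs(x)
prox_g = lambda v: np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)   # prox_{gamma*g}

def fbe(x):
    z = prox_g(x - gamma * grad_f(x))
    return f(x) + grad_f(x) * (z - x) + g(z) + (z - x) ** 2 / (2 * gamma)

xs = np.linspace(-2.0, 3.0, 50001)
phi_vals, fbe_vals = f(xs) + g(xs), fbe(xs)
# Both minima coincide (= 0.5, attained at x = 0 for this instance)
print(phi_vals.min(), fbe_vals.min(), xs[phi_vals.argmin()], xs[fbe_vals.argmin()])
```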

A.2 Further results

This section contains a list of auxiliary results invoked in the main proofs of Sect. 2.

Lemma A.1

Suppose that Assumption I holds, and let two sequences \(({\varvec{u}}^k)_{{k\in \mathbb {N}}}\) and \(({\varvec{v}}^k)_{{k\in \mathbb {N}}}\) satisfy \({\varvec{v}}^k\in {\text {T}}_\varGamma ^{\textsc {fb}}({\varvec{u}}^k)\) for all \(k\) and be such that both converge to a point \({\varvec{u}}^\star \) as \(k\rightarrow \infty \). Then, \({\varvec{u}}^\star \in {\text {T}}_\varGamma ^{\textsc {fb}}({\varvec{u}}^\star )\), and in particular \(0\in \hat{\partial }\varPhi ({\varvec{u}}^\star )\).

Proof

Since \(\nabla F\) is continuous, it holds that \({{\varvec{u}}^k-\varGamma \nabla F({\varvec{u}}^k)}\rightarrow {{\varvec{u}}^\star -\varGamma \nabla F({\varvec{u}}^\star )}\) as \(k\rightarrow \infty \). From outer semicontinuity of \({{\,\mathrm{prox}\,}}_G^{\varGamma ^{-1}}\) [51, Ex. 5.23(b)] it then follows that

$$\begin{aligned} {\varvec{u}}^\star = \lim _{k\rightarrow \infty } {\varvec{v}}^k \in \limsup _{k\rightarrow \infty } {{\,\mathrm{prox}\,}}_G^{\varGamma ^{-1}}({{\varvec{u}}^k-\varGamma \nabla F({\varvec{u}}^k)}) {}\subseteq {} {{\,\mathrm{prox}\,}}_G^{\varGamma ^{-1}}({{\varvec{u}}^\star -\varGamma \nabla F({\varvec{u}}^\star )}) = {\text {T}}_\varGamma ^{\textsc {fb}}({\varvec{u}}^\star ), \end{aligned}$$

where the limit superior is meant in the Painlevé–Kuratowski sense, cf. [51, Def. 4.1]. The optimality conditions defining \({{\,\mathrm{prox}\,}}_G^{\varGamma ^{-1}}\) [51, Th. 10.1] then read

$$\begin{aligned} 0 \in&\hat{\partial }\left( G+\tfrac{1}{2}\Vert \cdot -({{\varvec{u}}^\star -\varGamma \nabla F({\varvec{u}}^\star )})\Vert _{\varGamma ^{-1}}^2 \right) ({\varvec{u}}^\star ) = \hat{\partial }G({\varvec{u}}^\star ) + \varGamma ^{-1}\left( {\varvec{u}}^\star - ({{\varvec{u}}^\star -\varGamma \nabla F({\varvec{u}}^\star )}) \right) \\&= \hat{\partial }G({\varvec{u}}^\star ) + \nabla F({\varvec{u}}^\star ) = \hat{\partial }\varPhi ({\varvec{u}}^\star ), \end{aligned}$$

where the first and last equalities follow from [51, Ex. 8.8(c)]. \(\square \)

Lemma A.2

Suppose that Assumption I holds and that the function \(G\) is convex. Then the following hold:

(i):

\({{\,\mathrm{prox}\,}}_G^{\varGamma ^{-1}}\) is (single-valued and) firmly nonexpansive (FNE) in the metric \(\Vert \cdot \Vert _{\varGamma ^{-1}}\); namely,

$$\begin{aligned} \Vert {{\,\mathrm{prox}\,}}_G^{\varGamma ^{-1}}({\varvec{u}}) - {{\,\mathrm{prox}\,}}_G^{\varGamma ^{-1}}({\varvec{v}}) \Vert _{\varGamma ^{-1}}^2 \le \langle {} {{\,\mathrm{prox}\,}}_G^{\varGamma ^{-1}}({\varvec{u}}) -{} {{\,\mathrm{prox}\,}}_G^{\varGamma ^{-1}}({\varvec{v}}) {},{} \varGamma ^{-1}({\varvec{u}}-{\varvec{v}}) {}\rangle \le \Vert {\varvec{u}} - {\varvec{v}} \Vert _{\varGamma ^{-1}}^2 \quad \forall {\varvec{u}},{\varvec{v}}; \end{aligned}$$
(ii):

the Moreau envelope \(G^{\varGamma ^{-1}}\) is differentiable with \(\nabla G^{\varGamma ^{-1}}=\varGamma ^{-1}(\mathrm{id}-{{\,\mathrm{prox}\,}}_G^{\varGamma ^{-1}})\);

(iii):

for every \({\varvec{x}}\in \mathbb {R}^{\sum _in_i}\) it holds that \( {{\,\mathrm{dist}\,}}(0,\partial \varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}})) \le \tfrac{ N+\max _i\left\{ \gamma _iL_{f_i} \right\} }{ N\min _i\left\{ \sqrt{\gamma _i} \right\} } \Vert {\varvec{x}}-{\text {T}}_\varGamma ^{\textsc {fb}}({\varvec{x}})\Vert _{\varGamma ^{-1}} \);

(iv):

\({\text {T}}_\varGamma ^{\textsc {fb}}\) is \(L_\mathbf{T}\)-Lipschitz continuous in the metric \(\Vert \cdot \Vert _{\varGamma ^{-1}}\) for some \(L_\mathbf{T}\ge 0\);

If in addition \(f_i\) is \(\mu _{f_i}\)-strongly convex, \(i\in [N]\), then the following hold:

(v):

In A.2(iv), \(L_\mathbf{T}\le 1-\delta \) for \(\delta =\frac{1}{N}\min _{i\in [N]}\left\{ \gamma _i\mu _{f_i} \right\} \);

(vi):

For every \({\varvec{x}}\in \mathbb {R}^{\sum _in_i}\)

$$\begin{aligned} \tfrac{1}{2}\Vert {\varvec{z}}-{\varvec{x}}^\star \Vert _{\mu _F}^2 \le \varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}})-\min \varPhi \le \tfrac{1}{2}\Vert {\varvec{z}}-{\varvec{x}}\Vert _{\varGamma ^{-2}\mu _F^{-1}(\mathrm{I}-\varGamma \mu _F)}^2 \end{aligned}$$

where \({\varvec{x}}^\star {:}{=}{{\,\mathrm{\mathrm{arg\,min}}\,}}\varPhi \), \( \mu _F {:}{=}\frac{1}{N}{{\,\mathrm{blkdiag}\,}}\bigl (\mu _{f_1}\mathrm{I}_{n_1},\dots ,\mu _{f_N}\mathrm{I}_{n_N}\bigr ) \), and \({\varvec{z}}={\text {T}}_\varGamma ^{\textsc {fb}}({\varvec{x}})\).

Proof

  • A.2(i) and A.2(ii) See [4, Props. 12.28 and 12.30].

  • A.2(iii) Let \(D\subseteq \mathbb {R}^{\sum _in_i}\) be the set of points at which \(\nabla F\) is differentiable. From the chain rule of differentiation applied to the expression (2.2d) and using assertion A.2(ii), we have that \(\varPhi _\varGamma ^{\textsc {fb}}\) is differentiable on \(D\) with gradient

    $$\begin{aligned} \nabla \varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}}) = \bigl [ \mathrm{I}-\varGamma \nabla ^2F({\varvec{x}}) \bigr ] \varGamma ^{-1} \bigl [ {\varvec{x}}-{\text {T}}_\varGamma ^{\textsc {fb}}({\varvec{x}}) \bigr ] \quad \forall {\varvec{x}}\in D. \end{aligned}$$

    Since \(D\) is dense in \(\mathbb {R}^{\sum _in_i}\) owing to Lipschitz continuity of \(\nabla F\) (by Rademacher's theorem), we may invoke [51, Th. 9.61] to infer that \(\partial \varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}})\) is nonempty for every \({\varvec{x}}\in \mathbb {R}^{\sum _in_i}\) and

    $$\begin{aligned} \partial \varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}}) \supseteq \partial _B\varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}})= & {} \bigl [ \mathrm{I}-\varGamma \partial _B\nabla F({\varvec{x}}) \bigr ] \varGamma ^{-1} \bigl [ {\varvec{x}}-{\text {T}}_\varGamma ^{\textsc {fb}}({\varvec{x}}) \bigr ] \\= & {} \bigl [ \varGamma ^{-1}-\partial _B\nabla F({\varvec{x}}) \bigr ] \bigl [ {\varvec{x}}-{\text {T}}_\varGamma ^{\textsc {fb}}({\varvec{x}}) \bigr ], \end{aligned}$$

    where \(\partial _B\) denotes the (set-valued) Bouligand differential [22, §7.1]. The claim now follows by observing that \( \partial _B\nabla F({\varvec{x}}) = \tfrac{1}{N}{{\,\mathrm{blkdiag}\,}}(\partial _B\nabla f_1(x_1),\dots ,\partial _B\nabla f_N(x_N)) \) and that each element of \(\partial _B\nabla f_i(x_i)\) has norm bounded by \(L_{f_i}\).

  • A.2(iv)  Lipschitz continuity follows from assertion A.2(i) together with the fact that Lipschitz continuity is preserved by composition.

  • A.2(v) By [41, Th. 2.1.12], for all \(x_i,y_i\in \mathbb {R}^{n_i}\)

    $$\begin{aligned}&\langle \nabla f_i(x_i)-\nabla f_i(y_i),x_i-y_i\rangle \ge \tfrac{\mu _{f_i}L_{f_i}}{\mu _{f_i}+L_{f_i}}\Vert x_i-y_i\Vert ^2\nonumber \\&\quad +\tfrac{1}{\mu _{f_i}+L_{f_i}}\Vert \nabla f_i(x_i)-\nabla f_i(y_i)\Vert ^2. \end{aligned}$$
    (A.3)

    For the forward operator we have

    $$\begin{aligned}&\Vert (\mathrm{id}-\tfrac{\gamma _i}{N}\nabla f_i)(x_i) - (\mathrm{id}-\tfrac{\gamma _i}{N}\nabla f_i)(y_i) \Vert ^2\\&= \Vert x_i-y_i\Vert ^2 + \tfrac{\gamma _i^2}{N^2} \Vert \nabla f_i(x_i)-\nabla f_i(y_i)\Vert ^2 - \tfrac{2\gamma _i}{N} \langle {}x_i-y_i{},{}\nabla f_i(x_i)-\nabla f_i(y_i){}\rangle \\&{\mathop {\le }\limits ^{(A.3)}} \Bigl ( 1-\tfrac{\gamma _i^2\mu _{f_i}L_{f_i}}{N^2} \Bigr ) \Vert x_i-y_i\Vert ^2 - \tfrac{\gamma _i}{N} \Bigl ( 2-\tfrac{\gamma _i}{N}(\mu _{f_i}+L_{f_i}) \Bigr ) \langle {}\nabla f_i(x_i)-\nabla f_i(y_i){},{}x_i-y_i{}\rangle \\&\le \left( 1-\tfrac{\gamma _i^2\mu _{f_i}L_{f_i}}{N^2}\right) \Vert x_i-y_i\Vert ^2 - \tfrac{\gamma _i\mu _{f_i}}{N} \left( 2-\tfrac{\gamma _i}{N}(\mu _{f_i}+L_{f_i})\right) \Vert x_i-y_i\Vert ^2\\&= \left( 1-\tfrac{\gamma _i\mu _{f_i}}{N}\right) ^2 \Vert x_i-y_i\Vert ^2, \end{aligned}$$

    where strong convexity and the fact that \(\gamma _i<\nicefrac {N}{L_{f_i}}\le \nicefrac {2N}{(\mu _{f_i}+L_{f_i})}\) were used in the second inequality. Multiplying each blockwise inequality by \(\gamma _i^{-1}\) and summing over \(i\) shows that \(\mathrm{id}-\varGamma \nabla F\) is \((1-\delta )\)-contractive in the metric \(\Vert \cdot \Vert _{\varGamma ^{-1}}\), and so is \({\text {T}}_\varGamma ^{\textsc {fb}}={{\,\mathrm{prox}\,}}_G^{\varGamma ^{-1}}\circ {}(\hbox {id} - \varGamma \nabla F)\), as follows from assertion A.2(i).

  • A.2(vi)  By strong convexity, denoting \(\varPhi _\star {:}{=}\min \varPhi \), we have

    $$\begin{aligned} \varPhi _\star \le \varPhi ({\varvec{z}})-\tfrac{1}{2}\Vert {\varvec{z}}-{\varvec{x}}^\star \Vert _{\mu _F}^2 \le \varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}}) - \tfrac{1}{2}\Vert {\varvec{z}}-{\varvec{x}}^\star \Vert _{\mu _F}^2 \end{aligned}$$

    where the second inequality follows from Lemma 2.3(ii). This establishes the lower bound. Since \({\varvec{z}}\) is a minimizer in (2.2a), the necessary stationarity condition reads \( \varGamma ^{-1}({\varvec{x}}-{\varvec{z}})-\nabla F({\varvec{x}}) \in \partial G({\varvec{z}})\). Convexity of \(G\) then implies

    $$\begin{aligned} G({\varvec{x}}^\star ) \ge G({\varvec{z}}) + \langle {}\varGamma ^{-1}({\varvec{x}}-{\varvec{z}})-\nabla F({\varvec{x}}){},{}{\varvec{x}}^\star -{\varvec{z}}{}\rangle , \end{aligned}$$

    whereas from strong convexity of \(F\) we have

    $$\begin{aligned} F({\varvec{x}}^\star ) \ge F({\varvec{x}}) + \langle {}\nabla F({\varvec{x}}){},{}{\varvec{x}}^\star -{\varvec{x}}{}\rangle + \tfrac{1}{2}\Vert {\varvec{x}}-{\varvec{x}}^\star \Vert ^2_{\mu _F}. \end{aligned}$$

    By combining these inequalities and (2.2b), we have

    $$\begin{aligned} \varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}})-\varPhi _\star&\le \tfrac{1}{2}\Vert {\varvec{z}}-{\varvec{x}}\Vert ^2_{\varGamma ^{-1}} - \tfrac{1}{2}\Vert {\varvec{x}}^\star -{\varvec{x}}\Vert ^2_{\mu _F} + \langle {}\varGamma ^{-1}({\varvec{z}}-{\varvec{x}}){},{}{\varvec{x}}^\star -{\varvec{z}}{}\rangle \\&= \tfrac{1}{2}\Vert {\varvec{z}}-{\varvec{x}}\Vert _{\varGamma ^{-1}-\mu _F}^2 + \langle {}(\varGamma ^{-1}-\mu _F)({\varvec{z}}-{\varvec{x}}){},{}{\varvec{x}}^\star -{\varvec{z}}{}\rangle - \tfrac{1}{2}\Vert {\varvec{x}}^\star -{\varvec{z}}\Vert _{\mu _F}^2. \end{aligned}$$

    Next, by using the inequality \( \langle {}{\varvec{a}}{},{}{\varvec{b}}{}\rangle \le \tfrac{1}{2}\Vert {\varvec{a}}\Vert _{\mu _F}^2 + \tfrac{1}{2}\Vert {\varvec{b}}\Vert ^2_{\mu _F^{-1}} \) to cancel out the last term, we obtain

    $$\begin{aligned} \varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}})-\varPhi _\star&\le \tfrac{1}{2}\Vert {\varvec{z}}-{\varvec{x}}\Vert _{\varGamma ^{-1}-\mu _F}^2 + \tfrac{1}{2}\Vert (\varGamma ^{-1}-\mu _F)({\varvec{x}}-{\varvec{z}})\Vert _{\mu _F^{-1}}^2\\&= \tfrac{1}{2}\Vert {\varvec{z}}-{\varvec{x}}\Vert _{\varGamma ^{-2}\mu _F^{-1}(\mathrm{I}-\varGamma \mu _F)}^2, \end{aligned}$$

    where the last identity uses the fact that the matrices are diagonal. \(\square \)
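
To complement the proof of assertion A.2(v), the following short sketch (Python/NumPy, with illustrative quadratic data chosen here and not taken from the paper) numerically checks the predicted contraction factor \(1-\delta \) of \({\text {T}}_\varGamma ^{\textsc {fb}}\) in the metric \(\Vert \cdot \Vert _{\varGamma ^{-1}}\).

```python
import numpy as np

# Illustrative check (made-up data; not the paper's code) of Lemma A.2(v): with
# mu_i-strongly convex quadratic blocks and convex G, the map T_Gamma^fb is a
# (1 - delta)-contraction in ||.||_{Gamma^{-1}}, delta = (1/N) min_i gamma_i*mu_i.

rng = np.random.default_rng(0)
N = 3
mu = np.array([0.5, 1.0, 2.0])          # here mu_{f_i} = L_{f_i}
b = rng.standard_normal(N)
lam = 0.2
gamma = 0.9 * N / mu                    # gamma_i in (0, N / L_{f_i})
delta = np.min(gamma * mu) / N          # = 0.9 for this choice

grad_F = lambda x: mu * (x - b) / N     # F(x) = (1/N) sum_i 0.5*mu_i*(x_i - b_i)^2
prox_G = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam * gamma, 0.0)  # G = lam*||.||_1
T = lambda x: prox_G(x - gamma * grad_F(x))
nrm = lambda v: np.sqrt(np.sum(v ** 2 / gamma))          # ||.||_{Gamma^{-1}}

for _ in range(5):
    x, y = rng.standard_normal(N), rng.standard_normal(N)
    # each ratio stays below the predicted contraction factor 1 - delta = 0.1
    print(nrm(T(x) - T(y)) / nrm(x - y), "<=", 1 - delta)
```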

The next result recaps an important property, instrumental for establishing global convergence and asymptotic linear rates for the BC Algorithm 1, that the FBE inherits from the cost function \(\varPhi \). The result follows as a special case of [64, Th. 5.2] after observing that

$$\begin{aligned} \varPhi _\varGamma ^{\textsc {fb}}({\varvec{x}}) = \inf _{{\varvec{w}}}\left\{ \varPhi ({\varvec{w}}) + D_H({\varvec{w}},{\varvec{x}}) \right\} , \end{aligned}$$

where \( D_H({\varvec{w}},{\varvec{x}}) = H({\varvec{w}})-H({\varvec{x}})-\langle {}\nabla H({\varvec{x}}){},{}{\varvec{w}}-{\varvec{x}}{}\rangle \) is the Bregman distance with kernel \(H=\tfrac{1}{2}\Vert \cdot \Vert _{\varGamma ^{-1}}^2-F\).
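
To verify the identity above, note that with \(H=\tfrac{1}{2}\Vert \cdot \Vert _{\varGamma ^{-1}}^2-F\) a direct computation gives

$$\begin{aligned} D_H({\varvec{w}},{\varvec{x}}) = \tfrac{1}{2}\Vert {\varvec{w}}-{\varvec{x}}\Vert _{\varGamma ^{-1}}^2 - F({\varvec{w}}) + F({\varvec{x}}) + \langle {}\nabla F({\varvec{x}}){},{}{\varvec{w}}-{\varvec{x}}{}\rangle , \end{aligned}$$

so that \(\varPhi ({\varvec{w}})+D_H({\varvec{w}},{\varvec{x}})={\mathcal {M}}_{\varGamma }({\varvec{w}},{\varvec{x}})\); taking the infimum over \({\varvec{w}}\) recovers the claimed expression for \(\varPhi _\varGamma ^{\textsc {fb}}\).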

Lemma A.3

([64, Th. 5.2]) Suppose that Assumption I holds and for \(\gamma _i\in (0,\nicefrac {N}{L_{f_i}})\), \(i\in [N]\), let \(\varGamma ={{\,\mathrm{blkdiag}\,}}(\gamma _1\mathrm{I}_{n_1},\dots ,\gamma _N\mathrm{I}_{n_N})\). If \(\varPhi \) has the KL property with exponent \(\theta \in (0,1)\) (as is the case when \(f_i\) and \(G\) are semialgebraic), then so does \(\varPhi _\varGamma ^{\textsc {fb}}\) with exponent \( \max \left\{ \nicefrac 12,\theta \right\} \).

Cite this article

Latafat, P., Themelis, A. & Patrinos, P. Block-coordinate and incremental aggregated proximal gradient methods for nonsmooth nonconvex problems. Math. Program. 193, 195–224 (2022). https://doi.org/10.1007/s10107-020-01599-7
