Proximal-like incremental aggregated gradient method with Bregman distance in weakly convex optimization problems

Jia, Zehui; Huang, Jieru; Cai, Xingju

doi:10.1007/s10898-021-01044-9


Published: 29 May 2021

Volume 80, pages 841–864, (2021)

Journal of Global Optimization


Abstract

We focus on a special nonconvex and nonsmooth composite function: the sum of smooth weakly convex component functions and a proper lower semicontinuous weakly convex function. The proximal-like incremental aggregated gradient (PLIAG) method proposed in Zhang et al. (Math Oper Res 46(1): 61–81, 2021) has been proved to be convergent and highly efficient for convex minimization problems. This algorithm not only avoids evaluating the exact full gradient, which can be expensive in big-data models, but also weakens the stringent global Lipschitz gradient continuity assumption on the smooth part of the problem. In the nonconvex case, however, there is little analysis of the convergence of the PLIAG method. In this paper, we prove that every limit point of the sequence generated by the PLIAG method is a critical point of the weakly convex problem. Under the further assumption that the objective function satisfies the Kurdyka–Łojasiewicz (KL) property, we prove that the generated sequence converges globally to a critical point of the problem. Additionally, we give the convergence rate when the Łojasiewicz exponent is known.
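The PLIAG update itself is defined in the body of the paper, which this excerpt does not include. The following is a minimal sketch, under our own assumptions, of the simplest special case: kernel $\omega =\frac{1}{2}\Vert \cdot \Vert ^2$ (so $D_\omega (x,y)=\frac{1}{2}\Vert x-y\Vert ^2$ and the Bregman proximal step becomes an ordinary proximal step), $h=\lambda \Vert \cdot \Vert _1$, least-squares components (convex, hence trivially weakly convex), and a cyclic refresh of one stored gradient per iteration, so every delay is at most $N$. In this case the scheme reduces to the proximal incremental aggregated gradient (PIAG) method; the function names and data are hypothetical.

```python
def soft_threshold(v, t):
    """Proximal map of t * ||.||_1, applied coordinate-wise."""
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def piag_lasso(A, b, lam, alpha, iters):
    """Incremental aggregated proximal gradient for
    sum_n 0.5 * (a_n . x - b_n)^2 + lam * ||x||_1 with kernel omega = 0.5 * ||x||^2."""
    N, d = len(A), len(A[0])

    def grad_n(n, x):
        # gradient of the n-th component 0.5 * (a_n . x - b_n)^2
        r = sum(A[n][i] * x[i] for i in range(d)) - b[n]
        return [r * A[n][i] for i in range(d)]

    x = [0.0] * d
    table = [grad_n(n, x) for n in range(N)]            # stored, possibly stale gradients
    agg = [sum(g[i] for g in table) for i in range(d)]  # their running sum
    for k in range(iters):
        # Euclidean Bregman step: x_{k+1} = prox_{alpha*h}(x_k - alpha * agg)
        x = soft_threshold([x[i] - alpha * agg[i] for i in range(d)], alpha * lam)
        # refresh one component per iteration (cyclic order); the others stay delayed
        n = k % N
        g_new = grad_n(n, x)
        agg = [agg[i] + g_new[i] - table[n][i] for i in range(d)]
        table[n] = g_new
    return x


# Usage: three components in R^2; the lasso minimizer is (29/30, 59/30).
x = piag_lasso([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [1.0, 2.0, 3.0],
               lam=0.1, alpha=0.05, iters=3000)
print(x)
```

With a non-Euclidean kernel the soft-thresholding line would be replaced by the solution of the Bregman subproblem, which generally has no closed form.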


Similar content being viewed by others

Nonconvex Proximal Incremental Aggregated Gradient Method with Linear Convergence

Article 17 May 2019

Inertial proximal gradient methods with Bregman regularization for a class of nonconvex optimization problems

Article 19 August 2020

Inertial proximal incremental aggregated gradient method with linear convergence guarantees

Article 25 June 2022

References

Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116(1), 5–16 (2009)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
Aytekin, A., Feyzmahdavian, H.R., Johansson, M.: Analysis and implementation of an asynchronous optimization algorithm for the parameter server. arXiv preprint arXiv:1610.05507 (2016)
Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2017)
Beck, A., Teboulle, M.: Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans. Image Process. 18(11), 2419–2434 (2009)
Beck, A., Teboulle, M.: Gradient-based algorithms with applications to signal recovery problems. In: Palomar, D., Eldar, Y.C. (eds.) Convex Optimization in Signal Processing and Communications, pp. 139–162. Cambridge University Press, Cambridge (2009)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1), 459–494 (2014)
Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28(3), 2131–2151 (2018)
Boţ, R.I., Csetnek, E.R., László, S.C.: An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. EURO J. Comput. Optim. 4(1), 3–25 (2016)
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)
Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993)
Jia, Z.H., Wu, Z.M., Dong, X.M.: An inexact proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth optimization problems. J. Inequal. Appl. 2019(1), 125 (2019)
Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annal. de l’Institut Fourier 48(3), 769–783 (1998)
Li, H., Lin, Z.C.: Accelerated proximal gradient methods for nonconvex programming. In: Advances in Neural Information Processing Systems, pp. 379–387 (2015)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Lojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. Les Équations aux Dérivées Partielles 117, 87–89 (1963)
Mordukhovich, B.S.: Variational analysis and generalized differentiation I: Basic theory. Grundlehren der Mathematischen Wissenschaften, vol. 330. Springer, Berlin (2006)
Peng, W., Zhang, H., Zhang, X.Y.: Nonconvex proximal incremental aggregated gradient method with linear convergence. J. Optim. Theory Appl. 183(1), 230–245 (2019)
Rockafellar, R.T., Wets, R.J.-B.: Variational analysis. Fundamental Principles of Mathematical Science, vol. 317. Springer, Berlin (1998)
Vanli, N.D., Gurbuzbalaban, M., Ozdaglar, A.: Global convergence rate of proximal incremental aggregated gradient methods. SIAM J. Optim. 28(2), 1282–1300 (2018)
Vanli, N.D., Gurbuzbalaban, M., Ozdaglar, A.: A stronger convergence result on the proximal incremental aggregated gradient method. arXiv preprint arXiv:1611.08022 (2016)
Zhang, H., Dai, Y.H., Guo, L., Peng, W.: Proximal-like incremental aggregated gradient method with linear convergence under Bregman distance growth conditions. Math. Oper. Res. 46(1), 61–81 (2021)
Zhang, X.Y., Zhang, H., Peng, W.: Inertial Bregman proximal gradient algorithm for nonconvex problem with smooth adaptable property. arXiv preprint arXiv:1904.04436 (2019)

Acknowledgements

This work is supported by National Natural Science Foundation of China (Grant No. 11801279 and 11871279), Natural Science Foundation of Jiangsu Province (Grant No. BK20180782), and the Startup Foundation for Introducing Talent of NUIST (Grant No. 2017r059).

Author information

Authors and Affiliations

Department of Information and Computing Science, School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, 210044, People’s Republic of China
Zehui Jia
School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, 210044, People’s Republic of China
Jieru Huang
School of Mathematical Sciences, Jiangsu Key Lab for NSLSCS, Nanjing Normal University, Nanjing, 210023, People’s Republic of China
Xingju Cai


Corresponding author

Correspondence to Xingju Cai.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Lemma 3.1

Since the pair $( f_j,\omega )$ is $L_j$-smooth by Assumption 2(a), we obtain

$$\begin{aligned} f_{j}(x_{k+1})&\le f_{j}(x_{k-\tau _{k}^{j}})+\left\langle \nabla f_{j}(x_{k-\tau _{k}^{j}}),x_{k+1}-x_{k-\tau _{k}^{j}} \right\rangle +L_j\cdot D_{\omega }(x_{k+1},x_{k-\tau _k^j})\nonumber \\&\le f_j(x)+\frac{\beta _j}{2}\left\| x-x_{k-\tau _k^j}\right\| ^2+ \left\langle \nabla f_{j}(x_{k-\tau _{k}^{j}}),x_{k+1}-x \right\rangle +L_j\cdot D_{\omega }(x_{k+1},x_{k-\tau _k^j}), \end{aligned}$$

(31)

where the second inequality follows from the convexity of $f_j(\cdot )+\frac{\beta _j}{2}\Vert \cdot \Vert ^2$ (Assumption 2(b)). Summing (31) over all $j\in {\mathcal {J}}_k$, we obtain

$$\begin{aligned} \sum \limits _{j\in {\mathcal {J}}_{k}}f_{j}(x_{k+1})\le&\sum \limits _{j\in {\mathcal {J}}_{k}}f_{j}(x)+\frac{\gamma _1}{2}\sum \limits _{j\in {\mathcal {J}}_{k}}\left\| x-x_{k-\tau _{k}^{j}}\right\| ^{2}+ \left\langle \sum \limits _{j\in {\mathcal {J}}_{k}}\nabla f_{j}(x_{k-\tau _{k}^{j}}),x_{k+1}-x\right\rangle \nonumber \\&+\sum \limits _{j\in {\mathcal {J}}_{k}}L_{j} \cdot D_{\omega }(x_{k+1},x_{k-\tau _{k}^{j}}). \end{aligned}$$

(32)

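where $\gamma _1{:}{=}\sum _{n=1}^N\beta _n$ is defined in Assumption 2; in particular, $\beta _j\le \gamma _1$ for every $j$, which yields the factor $\frac{\gamma _1}{2}$ in (32). By the first-order optimality condition of the subproblem defining $x_{k+1}$, we have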

$$\begin{aligned} -\sum \limits _{j\in {\mathcal {J}}_{k}}\nabla f_{j}(x_{k-\tau _k^j})-\frac{1}{\alpha } \left( \nabla \omega (x_{k+1})-\nabla \omega (x_k)\right) \in \partial h(x_{k+1})+\sum \limits _{i\in {\mathcal {I}}_{k}}\nabla f_i (x_{k+1}). \end{aligned}$$

(33)
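The two inequalities in (31) can be checked numerically on a toy instance. The sketch below is only an illustration under stated assumptions: Assumption 2 itself is stated in the body of the paper, so here we take "$L$-smooth" to mean smooth adaptable in the sense of Bolte et al. (2018), i.e. $L\omega -f$ and $L\omega +f$ convex, and we pick the hypothetical one-dimensional pair $f(x)=x^4/4-x^2$, $\omega (x)=x^4/4+x^2/2$, for which $L=2$ and $f$ is $\beta$-weakly convex with $\beta =2$.

```python
import random

# Hypothetical 1-D instance: f'' = 3x^2 - 2 >= -2, so f + (BETA/2) x^2 = x^4/4 is convex,
# and L*omega - f = x^4/4 + 2x^2, L*omega + f = 3x^4/4 are convex for L = 2.
BETA, L = 2.0, 2.0


def f(x):
    return x**4 / 4 - x**2


def df(x):
    return x**3 - 2 * x


def omega(x):
    return x**4 / 4 + x**2 / 2


def domega(x):
    return x**3 + x


def bregman(y, x):
    """D_omega(y, x) = omega(y) - omega(x) - omega'(x) (y - x) >= 0."""
    return omega(y) - omega(x) - domega(x) * (y - x)


def check_31(trials=10000, seed=0):
    """Largest violation of either inequality in (31) over random triples."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x, x_old, x_new = (rng.uniform(-3, 3) for _ in range(3))
        lhs = f(x_new)
        mid = f(x_old) + df(x_old) * (x_new - x_old) + L * bregman(x_new, x_old)
        rhs = (f(x) + BETA / 2 * (x - x_old) ** 2
               + df(x_old) * (x_new - x) + L * bregman(x_new, x_old))
        worst = max(worst, lhs - mid, mid - rhs)
    return worst


print(check_31())  # zero or rounding-level if both inequalities hold
```

Here `x_old` plays the role of the delayed iterate $x_{k-\tau _k^j}$ and `x_new` that of $x_{k+1}$.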

Using (33) and the subgradient inequality for the convex function $h(x)+\sum \limits _{i\in {\mathcal {I}}_{k}}f_i (x)+\frac{\gamma _1+\gamma _2}{2}\Vert x\Vert ^2$ at $x_{k+1}$, we have

$$\begin{aligned}&h(x_{k+1})+\sum \limits _{i\in {\mathcal {I}}_{k}}f_i (x_{k+1}) \nonumber \\ \le&h(x)+\sum \limits _{i\in {\mathcal {I}}_{k}}f_{i}(x)+ \left\langle \sum \limits _{j\in {\mathcal {J}}_{k}}\nabla f_{j}(x_{k-\tau _{k}^{j}})+\frac{1}{\alpha }\left( \nabla \omega (x_{k+1}) -\nabla \omega (x_k)\right) ,x-x_{k+1}\right\rangle \nonumber \\&+ \frac{\gamma _{1}+\gamma _{2}}{2}\Vert x-x_{k+1}\Vert ^2 \nonumber \\ =&h(x)+\sum \limits _{i\in {\mathcal {I}}_{k}}f_{i}(x)+ \left\langle \sum \limits _{j\in {\mathcal {J}}_{k}}\nabla f_{j}(x_{k-\tau _{k}^{j}}),x-x_{k+1}\right\rangle + \frac{\gamma _{1}+\gamma _{2}}{2}\Vert x-x_{k+1}\Vert ^2 \nonumber \\&+\frac{1}{\alpha }\langle \nabla \omega (x_{k+1})- \nabla \omega (x_k),x-x_{k+1}\rangle \nonumber \\ =&h(x)+\sum \limits _{i\in {\mathcal {I}}_{k}}f_{i}(x)+ \left\langle \sum \limits _{j\in {\mathcal {J}}_{k}}\nabla f_{j}(x_{k-\tau _{k}^{j}}),x-x_{k+1}\right\rangle + \frac{\gamma _{1}+\gamma _{2}}{2}\Vert x-x_{k+1}\Vert ^{2} \nonumber \\&+\frac{1}{\alpha }D_{\omega }(x,x_{k})- \frac{1}{\alpha }D_{\omega }(x,x_{k+1})- \frac{1}{\alpha }D_{\omega }(x_{k+1},x_{k}), \end{aligned}$$

(34)

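where the last equality follows from the three-point identity of the Bregman distance, namely $\langle \nabla \omega (x_{k+1})-\nabla \omega (x_k),x-x_{k+1}\rangle =D_{\omega }(x,x_{k})-D_{\omega }(x,x_{k+1})-D_{\omega }(x_{k+1},x_{k})$. Adding (34) to (32), the inner products involving $\sum _{j\in {\mathcal {J}}_{k}}\nabla f_{j}(x_{k-\tau _{k}^{j}})$ cancel, and we get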

$$\begin{aligned} \Phi (x_{k+1})&\le \Phi (x)+\frac{\gamma _{1}+\gamma _{2}}{2}\Vert x-x_{k+1}\Vert ^{2} +\frac{\gamma _{1}}{2}\sum \limits _{j\in {\mathcal {J}}_{k}}\Vert x-x_{k-\tau _{k}^{j}}\Vert ^{2}\nonumber \\&\quad +\frac{1}{\alpha }D_{\omega }(x,x_{k})- \frac{1}{\alpha }D_{\omega }(x,x_{k+1})- \frac{1}{\alpha }D_{\omega }(x_{k+1},x_{k})+ \sum \limits _{j\in {\mathcal {J}}_{k}}L_{j}\cdot D_{\omega }(x_{k+1},x_{k-\tau _{k}^{j}}). \end{aligned}$$

(35)
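The three-point identity used in (34) holds for any differentiable kernel, which is easy to confirm numerically. The kernel below, $\omega (x)=\sum _i (x_i^4/4+x_i^2/2)$, is a hypothetical choice for illustration only; the identity does not depend on it.

```python
import random


def omega(x):
    """Hypothetical kernel omega(x) = sum_i (x_i^4 / 4 + x_i^2 / 2)."""
    return sum(t**4 / 4 + t**2 / 2 for t in x)


def grad_omega(x):
    return [t**3 + t for t in x]


def bregman(y, x):
    """D_omega(y, x) = omega(y) - omega(x) - <grad omega(x), y - x>."""
    g = grad_omega(x)
    return omega(y) - omega(x) - sum(gi * (yi - xi) for gi, yi, xi in zip(g, y, x))


def three_point_gap(x, x_k, x_k1):
    """<grad omega(x_k1) - grad omega(x_k), x - x_k1>
    minus [D(x, x_k) - D(x, x_k1) - D(x_k1, x_k)]; zero by the identity."""
    g1, g0 = grad_omega(x_k1), grad_omega(x_k)
    inner = sum((a - c) * (xi - zi) for a, c, xi, zi in zip(g1, g0, x, x_k1))
    return inner - (bregman(x, x_k) - bregman(x, x_k1) - bregman(x_k1, x_k))


rng = random.Random(0)
pts = [[rng.uniform(-2.0, 2.0) for _ in range(4)] for _ in range(3)]
print(three_point_gap(*pts))  # rounding-level
```

The identity follows by expanding the three Bregman distances: the $\omega$ values cancel in pairs and only the two inner products with $x-x_{k+1}$ remain.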

According to Assumption 1(d), we deduce

$$\begin{aligned} \sum _{i\in {\mathcal {J}}_{k}}L_{i}\cdot D_{\omega }(x_{k+1},x_{k-\tau _{k}^{i}})&\le \sum _{i\in {\mathcal {J}}_{k}}L_{i}\cdot \ell (\tau _k^i+1)\cdot \sum _{j=k-\tau _k^i}^k D_{\omega }(x_{j+1},x_j)\\&\le \sum _{i\in {\mathcal {J}}_{k}}L_{i}\cdot \ell (\tau +1)\cdot \sum _{j=k -\tau }^k D_{\omega }(x_{j+1},x_j)\\&=\ell (\tau +1)\cdot \sum _{i\in {\mathcal {J}}_k}L_i\sum _{j=k-\tau }^{k}D_{\omega }(x_{j+1},x_{j})\\&\le L\cdot \ell (\tau +1)\cdot \sum _{j=k-\tau }^{k}D_{\omega }(x_{j+1},x_{j}), \end{aligned}$$

where the second inequality uses $\tau _k^i\le \tau $ together with the nonnegativity of $D_\omega $, and the last inequality follows from $\sum _{i\in {\mathcal {J}}_k}L_i\le L{:}{=}\sum _{j=1}^N L_j$. Substituting the above inequality into (35), we obtain

$$\begin{aligned} \Phi (x_{k+1})\le&\Phi (x)+\frac{\gamma _{1}+\gamma _{2}}{2}\Vert x-x_{k+1}\Vert ^{2} +\frac{\gamma _{1}}{2}\sum \limits _{j\in {\mathcal {J}}_{k}}\Vert x-x_{k-\tau _{k}^{j}}\Vert ^{2}\nonumber \\&+\frac{1}{\alpha }D_{\omega }(x,x_{k})- \frac{1}{\alpha }D_{\omega }(x,x_{k+1})- \frac{1}{\alpha }D_{\omega }(x_{k+1},x_{k})+ L\cdot \ell (\tau +1)\cdot \sum \limits _{j=k-\tau }^{k}D_{\omega }(x_{j+1},x_{j}). \end{aligned}$$

(36)

Setting $x=x_k$ in (36) and using $D_\omega (x_k,x_k)=0$ yields

$$\begin{aligned} \begin{aligned} \Phi (x_{k+1}) \le&\Phi (x_k)+\frac{\gamma _{1}+\gamma _{2}}{2}\Vert x_k-x_{k+1}\Vert ^{2} +\frac{\gamma _{1}}{2}\sum \limits _{j\in {\mathcal {J}}_{k}}\Vert x_k-x_{k-\tau _{k}^{j}}\Vert ^{2}\\&-\frac{1}{\alpha }D_{\omega }(x_k,x_{k+1})- \frac{1}{\alpha }D_{\omega }(x_{k+1},x_{k})+ L\cdot \ell (\tau +1)\cdot \sum \limits _{j=k-\tau }^{k}D_{\omega }(x_{j+1},x_{j}). \end{aligned} \end{aligned}$$

(37)

The term $\sum _{j=k-\tau }^{k}D_{\omega }(x_{j+1},x_{j})$ in (37) can be rewritten as

$$\begin{aligned} \sum \limits _{j=k-\tau }^{k}D_{\omega }(x_{j+1},x_{j})=&(\tau +1)D_\omega (x_{k+1},x_k)+ \sum _{i=1}^{\tau }i D_{\omega }(x_{k-\tau +i},x_{k-\tau +i-1}) -\sum \limits _{i=1}^{\tau }i D_{\omega }(x_{k+1-\tau +i},x_{k-\tau +i}), \end{aligned}$$

(38)

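and the term $\Vert x_k-x_{k-\tau _{k}^{j}}\Vert ^{2}$ in (37) satisfies the following inequality, where the first step is the Cauchy–Schwarz inequality, the second uses $\tau _k^j\le \tau $ and the nonnegativity of the summands, and the final equality is the same rearrangement as in (38):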

$$\begin{aligned}&\left\| x_{k}-x_{k-\tau _{k}^{j}}\right\| ^{2}\le \tau _k^j\sum \limits _{m=k-\tau _k^j}^{k-1}\Vert x_{m+1}-x_{m}\Vert ^{2}\le \tau \sum \limits _{m=k-\tau }^{k}\Vert x_{m+1}-x_{m}\Vert ^{2}\nonumber \\ =&\tau (\tau +1)\Vert x_{k+1}-x_{k}\Vert ^{2} +\tau \sum \limits _{i=1}^{\tau }i\Vert x_{k-\tau +i}-x_{k-\tau +i-1}\Vert ^{2} -\tau \sum \limits _{i=1}^{\tau }i\Vert x_{k+1-\tau +i}-x_{k-\tau +i}\Vert ^{2}. \end{aligned}$$

Using the above inequality and $J_k\le N$, where $J_k$ denotes the number of indices in ${\mathcal {J}}_k$, we have

$$\begin{aligned}&\sum _{j\in {\mathcal {J}}_{k}}\left\| x_{k}-x_{k-\tau _{k}^{j}}\right\| ^{2}\nonumber \\ \le&J_k\left[ \tau (\tau +1)\Vert x_{k+1}-x_{k}\Vert ^{2} +\tau \sum \limits _{i=1}^{\tau }i\Vert x_{k-\tau +i}-x_{k-\tau +i-1}\Vert ^{2} -\tau \sum \limits _{i=1}^{\tau }i\Vert x_{k+1-\tau +i}-x_{k-\tau +i}\Vert ^{2}\right] \nonumber \\ \le&N\left[ \tau (\tau +1)\Vert x_{k+1}-x_{k}\Vert ^{2} +\tau \sum \limits _{i=1}^{\tau }i\Vert x_{k-\tau +i}-x_{k-\tau +i-1}\Vert ^{2} -\tau \sum \limits _{i=1}^{\tau }i\Vert x_{k+1-\tau +i}-x_{k-\tau +i}\Vert ^{2}\right] . \end{aligned}$$

(39)
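Both rearrangements used in (38) and (39) can be sanity-checked with a few lines of code. In the sketch below, `d[m]` stands for any nonnegative per-step quantity, either $D_\omega (x_{m+1},x_m)$ or $\Vert x_{m+1}-x_m\Vert ^2$; the helper names are hypothetical. The first pair of functions compares the two sides of (38), and the last one checks the Cauchy–Schwarz bound $\Vert x_k-x_{k-s}\Vert ^2\le s\sum _{m=k-s}^{k-1}\Vert x_{m+1}-x_m\Vert ^2$ behind (39).

```python
import random


def window_sum(d, k, tau):
    """Left side of (38): sum_{m=k-tau}^{k} d[m]."""
    return sum(d[m] for m in range(k - tau, k + 1))


def telescoped(d, k, tau):
    """Right side of (38): (tau+1) d_k + old weighted tail - new weighted tail."""
    old = sum(i * d[k - tau + i - 1] for i in range(1, tau + 1))
    new = sum(i * d[k - tau + i] for i in range(1, tau + 1))
    return (tau + 1) * d[k] + old - new


def delayed_gap_bound_holds(xs, k, s):
    """Check ||x_k - x_{k-s}||^2 <= s * sum_{m=k-s}^{k-1} ||x_{m+1} - x_m||^2."""
    def sq(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    lhs = sq(xs[k], xs[k - s])
    rhs = s * sum(sq(xs[m + 1], xs[m]) for m in range(k - s, k))
    return lhs <= rhs + 1e-12


rng = random.Random(2)
d = [rng.random() for _ in range(30)]
print(window_sum(d, 10, 3), telescoped(d, 10, 3))  # equal up to rounding
```

The weighted tails $\sum _{i=1}^{\tau }i\,d_{k-\tau +i-1}$ are what enter the potential $T_k$: each step of the recursion shifts the window by one index, which is why the "old" tail at step $k$ becomes the "new" tail at step $k-1$.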

Substituting (38) and (39) into (37) and subtracting $\upsilon ({\mathcal {P}})$ from both sides, we obtain

$$\begin{aligned}&\Phi (x_{k+1})-\upsilon ({\mathcal {P}})+L\cdot \ell (\tau +1)\cdot \sum \limits _{i=1}^{\tau }i D_{\omega }(x_{k+1-\tau +i},x_{k-\tau +i}) +\frac{\gamma _{1}\tau N}{2}\sum \limits _{i=1}^{\tau }i\Vert x_{k+1-\tau +i}-x_{k-\tau +i}\Vert ^{2}\nonumber \\ \le&\Phi (x_{k})-\upsilon ({\mathcal {P}}) +L\cdot \ell (\tau +1)\cdot \sum \limits _{i=1}^{\tau }i D_{\omega }(x_{k-\tau +i},x_{k-\tau +i-1}) +\frac{\gamma _{1}\tau N}{2}\sum \limits _{i=1}^{\tau }i\Vert x_{k-\tau +i}-x_{k-\tau +i-1}\Vert ^{2}\nonumber \\&-\frac{1}{\alpha }D_{\omega }(x_{k},x_{k+1}) -\left[ \frac{1}{\alpha }-L\cdot \ell (\tau +1)\cdot (\tau +1)\right] D_{\omega }(x_{k+1},x_{k})\nonumber \\&+\frac{\gamma _1+\gamma _2}{2}\Vert x_{k}-x_{k+1}\Vert ^{2}+{\frac{\gamma _{1}\tau N}{2}(\tau +1)}\Vert x_{k+1}-x_{k}\Vert ^{2}. \end{aligned}$$

(40)

Recalling the definitions of $T_k$, $C_1$ and $C_2$, we can rewrite (40) as

$$\begin{aligned} T_{k+1}\le T_{k}-\frac{1}{\alpha }D_{\omega }(x_{k},x_{k+1}) -C_1D_{\omega }(x_{k+1},x_{k})+C_2\Vert x_{k+1}-x_{k}\Vert ^{2}, \end{aligned}$$

which completes the proof. $\square $
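Reading (40) term by term against the final inequality, the quantities recalled from the main text must take the following form. This is inferred here from (40); the definitions in the body of the paper, which this excerpt does not include, are authoritative.

$$\begin{aligned} T_k&{:}{=}\Phi (x_k)-\upsilon ({\mathcal {P}})+L\cdot \ell (\tau +1)\cdot \sum _{i=1}^{\tau }i D_{\omega }(x_{k-\tau +i},x_{k-\tau +i-1})+\frac{\gamma _1\tau N}{2}\sum _{i=1}^{\tau }i\Vert x_{k-\tau +i}-x_{k-\tau +i-1}\Vert ^2,\\ C_1&{:}{=}\frac{1}{\alpha }-L\cdot \ell (\tau +1)\cdot (\tau +1),\qquad C_2{:}{=}\frac{\gamma _1+\gamma _2}{2}+\frac{\gamma _1\tau N(\tau +1)}{2}. \end{aligned}$$

Replacing $k$ by $k+1$ in $T_k$ reproduces the left-hand side of (40). In particular, assuming $L\cdot \ell (\tau +1)>0$, $C_1>0$ exactly when $\alpha <1/\left( L\cdot \ell (\tau +1)\cdot (\tau +1)\right) $.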


About this article

Cite this article

Jia, Z., Huang, J. & Cai, X. Proximal-like incremental aggregated gradient method with Bregman distance in weakly convex optimization problems. J Glob Optim 80, 841–864 (2021). https://doi.org/10.1007/s10898-021-01044-9


Received: 01 August 2020
Accepted: 18 May 2021
Published: 29 May 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s10898-021-01044-9
