Proximal-like incremental aggregated gradient method with Bregman distance in weakly convex optimization problems

Abstract

We focus on a special nonconvex and nonsmooth composite function, namely the sum of smooth weakly convex component functions and a proper lower semi-continuous weakly convex function. The proximal-like incremental aggregated gradient (PLIAG) method proposed in Zhang et al. (Math Oper Res 46(1): 61–81, 2021) has been shown to be convergent and highly efficient for solving convex minimization problems. This algorithm not only avoids evaluating the exact full gradient, which can be expensive in big-data models, but also weakens the stringent global Lipschitz gradient continuity assumption on the smooth part of the problem. In the nonconvex case, however, little is known about the convergence of the PLIAG method. In this paper, we prove that every limit point of the sequence generated by the PLIAG method is a critical point of the weakly convex problem. Under the further assumption that the objective function satisfies the Kurdyka–Łojasiewicz (KL) property, we prove that the generated sequence converges globally to a critical point of the problem. Additionally, we give the convergence rate when the Łojasiewicz exponent is known.
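
In schematic form, the problem class described above can be written, using the notation \(\Phi \), \(f_n\), \(h\), and \(N\) that appears in the appendix below (the precise formulation and standing assumptions are those stated in the paper and are not reproduced here), as

$$\begin{aligned} \min \limits _{x}\ \Phi (x),\qquad \Phi (x){:}{=}\sum \limits _{n=1}^{N}f_{n}(x)+h(x), \end{aligned}$$

where each \(f_n\) is a smooth weakly convex component function and \(h\) is a proper lower semi-continuous weakly convex function.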

References

  1. Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116(1), 5–16 (2009)

  2. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)

  3. Aytekin, A., Feyzmahdavian, H.R., Johansson, M.: Analysis and implementation of an asynchronous optimization algorithm for the parameter server. arXiv preprint arXiv:1610.05507 (2016)

  4. Bauschke, H.H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330–348 (2017)

  5. Beck, A., Teboulle, M.: Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Trans. Image Process. 18(11), 2419–2434 (2009)

  6. Beck, A., Teboulle, M.: Gradient-based algorithms with applications to signal recovery problems. In: Palomar, D., Eldar, Y.C. (eds.) Convex Optimization in Signal Processing and Communications, pp. 139–162. Cambridge University Press, Cambridge (2009)

  7. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1), 459–494 (2014)

  8. Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28(3), 2131–2151 (2018)

  9. Boţ, R.I., Csetnek, E.R., László, S.C.: An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. EURO J. Comput. Optim. 4(1), 3–25 (2016)

  10. Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3), 200–217 (1967)

  11. Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993)

  12. Jia, Z.H., Wu, Z.M., Dong, X.M.: An inexact proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth optimization problems. J. Inequal. Appl. 2019(1), 125 (2019)

  13. Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annal. de l’Institut Fourier 48(3), 769–783 (1998)

  14. Li, H., Lin, Z.C.: Accelerated proximal gradient methods for nonconvex programming. In: Advances in Neural Information Processing Systems, pp. 379–387 (2015)

  15. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)

  16. Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. Les Équations aux Dérivées Partielles 117, 87–89 (1963)

  17. Mordukhovich, B.S.: Variational analysis and generalized differentiation I: Basic theory. Grundlehren der Mathematischen Wissenschaften, vol. 330. Springer, Berlin (2006)

  18. Peng, W., Zhang, H., Zhang, X.Y.: Nonconvex proximal incremental aggregated gradient method with linear convergence. J. Optim. Theory Appl. 183(1), 230–245 (2019)

  19. Rockafellar, R.T., Wets, R.J.-B.: Variational analysis. Fundamental Principles of Mathematical Science, vol. 317. Springer, Berlin (1998)

  20. Vanli, N.D., Gurbuzbalaban, M., Ozdaglar, A.: Global convergence rate of proximal incremental aggregated gradient methods. SIAM J. Optim. 28(2), 1282–1300 (2018)

  21. Vanli, N.D., Gurbuzbalaban, M., Ozdaglar, A.: A stronger convergence result on the proximal incremental aggregated gradient method. arXiv preprint arXiv:1611.08022 (2016)

  22. Zhang, H., Dai, Y.H., Guo, L., Peng, W.: Proximal-like incremental aggregated gradient method with linear convergence under Bregman distance growth conditions. Math. Oper. Res. 46(1), 61–81 (2021)

  23. Zhang, X.Y., Zhang, H., Peng, W.: Inertial Bregman proximal gradient algorithm for nonconvex problem with smooth adaptable property. arXiv preprint arXiv:1904.04436 (2019)

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant Nos. 11801279 and 11871279), the Natural Science Foundation of Jiangsu Province (Grant No. BK20180782), and the Startup Foundation for Introducing Talent of NUIST (Grant No. 2017r059).

Author information

Correspondence to Xingju Cai.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Lemma 3.1

Since the pair \(( f_j,\omega )\) is \(L_j\)-smooth according to Assumption 2(a), we obtain

$$\begin{aligned} f_{j}(x_{k+1})&\le f_{j}(x_{k-\tau _{k}^{j}})+\left\langle \nabla f_{j}(x_{k-\tau _{k}^{j}}),x_{k+1}-x_{k-\tau _{k}^{j}} \right\rangle +L_j\cdot D_{\omega }(x_{k+1},x_{k-\tau _k^j})\nonumber \\&\le f_j(x)+\frac{\beta _j}{2}\left\| x-x_{k-\tau _k^j}\right\| ^2+ \left\langle \nabla f_{j}(x_{k-\tau _{k}^{j}}),x_{k+1}-x \right\rangle +L_j\cdot D_{\omega }(x_{k+1},x_{k-\tau _k^j}), \end{aligned}$$
(31)

where the second inequality follows from the convexity of \(f_j(\cdot )+\frac{\beta _j}{2}\Vert \cdot \Vert ^2\) according to Assumption 2(b). Summing (31) over all \(j\in {\mathcal {J}}_k\), we can get

$$\begin{aligned} \sum \limits _{j\in {\mathcal {J}}_{k}}f_{j}(x_{k+1})\le&\sum \limits _{j\in {\mathcal {J}}_{k}}f_{j}(x)+\frac{\gamma _1}{2}\sum \limits _{j\in {\mathcal {J}}_{k}}\left\| x-x_{k-\tau _{k}^{j}}\right\| ^{2}+ \left\langle \sum \limits _{j\in {\mathcal {J}}_{k}}\nabla f_{j}(x_{k-\tau _{k}^{j}}),x_{k+1}-x\right\rangle \nonumber \\&+\sum \limits _{j\in {\mathcal {J}}_{k}}L_{j} \cdot D_{\omega }(x_{k+1},x_{k-\tau _{k}^{j}}). \end{aligned}$$
(32)

Here \(\gamma _1{:}{=}\sum _{n=1}^N\beta _n\) is the constant defined in Assumption 2. By the optimality of \(x_{k+1}\), we have

$$\begin{aligned} -\sum \limits _{j\in {\mathcal {J}}_{k}}\nabla f_{j}(x_{k-\tau _k^j})-\frac{1}{\alpha } \left( \nabla \omega (x_{k+1})-\nabla \omega (x_k)\right) \in \partial h(x_{k+1})+\sum \limits _{i\in {\mathcal {I}}_{k}}\nabla f_i (x_{k+1}). \end{aligned}$$
(33)
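
Relation (33) is the first-order optimality condition characterizing \(x_{k+1}\). For intuition, it is consistent with \(x_{k+1}\) being obtained from a Bregman proximal subproblem of the following schematic form (the exact PLIAG update is the one specified in the paper; the expression below is an inferred sketch, stated up to additive constants in the objective):

$$\begin{aligned} x_{k+1}\in \mathop {\arg \min }\limits _{x}\left\{ h(x)+\sum \limits _{i\in {\mathcal {I}}_{k}}f_{i}(x)+\left\langle \sum \limits _{j\in {\mathcal {J}}_{k}}\nabla f_{j}(x_{k-\tau _{k}^{j}}),x\right\rangle +\frac{1}{\alpha }D_{\omega }(x,x_{k})\right\} . \end{aligned}$$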

Using the subgradient inequality for the convex function \(h(x)+\sum \limits _{i\in {\mathcal {I}}_{k}}f_i (x)+\frac{\gamma _1+\gamma _2}{2}\Vert x\Vert ^2\) at \(x_{k+1}\), we have

$$\begin{aligned}&h(x_{k+1})+\sum \limits _{i\in {\mathcal {I}}_{k}}f_i (x_{k+1}) \nonumber \\ \le&h(x)+\sum \limits _{i\in {\mathcal {I}}_{k}}f_{i}(x)+ \left\langle \sum \limits _{j\in {\mathcal {J}}_{k}}\nabla f_{j}(x_{k-\tau _{k}^{j}})+\frac{1}{\alpha }\left( \nabla \omega (x_{k+1}) -\nabla \omega (x_k)\right) ,x-x_{k+1}\right\rangle \nonumber \\&+ \frac{\gamma _{1}+\gamma _{2}}{2}\Vert x-x_{k+1}\Vert ^2 \nonumber \\ =&h(x)+\sum \limits _{i\in {\mathcal {I}}_{k}}f_{i}(x)+ \left\langle \sum \limits _{j\in {\mathcal {J}}_{k}}\nabla f_{j}(x_{k-\tau _{k}^{j}}),x-x_{k+1}\right\rangle + \frac{\gamma _{1}+\gamma _{2}}{2}\Vert x-x_{k+1}\Vert ^2 \nonumber \\&+\frac{1}{\alpha }\langle \nabla \omega (x_{k+1})- \nabla \omega (x_k),x-x_{k+1}\rangle \nonumber \\ =&h(x)+\sum \limits _{i\in {\mathcal {I}}_{k}}f_{i}(x)+ \left\langle \sum \limits _{j\in {\mathcal {J}}_{k}}\nabla f_{j}(x_{k-\tau _{k}^{j}}),x-x_{k+1}\right\rangle + \frac{\gamma _{1}+\gamma _{2}}{2}\Vert x-x_{k+1}\Vert ^{2} \nonumber \\&+\frac{1}{\alpha }D_{\omega }(x,x_{k})- \frac{1}{\alpha }D_{\omega }(x,x_{k+1})- \frac{1}{\alpha }D_{\omega }(x_{k+1},x_{k}), \end{aligned}$$
(34)

where the last equality follows from the three-point identity of the Bregman distance.
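For the reader's convenience, this is the standard identity: for any three points \(a\), \(b\), \(c\) at which \(\omega \) is differentiable,

$$\begin{aligned} \left\langle \nabla \omega (b)-\nabla \omega (c),a-b\right\rangle =D_{\omega }(a,c)-D_{\omega }(a,b)-D_{\omega }(b,c), \end{aligned}$$

applied here with \(a=x\), \(b=x_{k+1}\), and \(c=x_{k}\). Adding (34) to (32), we get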

$$\begin{aligned} \Phi (x_{k+1})&\le \Phi (x)+\frac{\gamma _{1}+\gamma _{2}}{2}\Vert x-x_{k+1}\Vert ^{2} +\frac{\gamma _{1}}{2}\sum \limits _{j\in {\mathcal {J}}_{k}}\Vert x-x_{k-\tau _{k}^{j}}\Vert ^{2}\nonumber \\&\quad +\frac{1}{\alpha }D_{\omega }(x,x_{k})- \frac{1}{\alpha }D_{\omega }(x,x_{k+1})- \frac{1}{\alpha }D_{\omega }(x_{k+1},x_{k})+ \sum \limits _{j\in {\mathcal {J}}_{k}}L_{j}\cdot D_{\omega }(x_{k+1},x_{k-\tau _{k}^{j}}). \end{aligned}$$
(35)

According to Assumption 1(d), we deduce

$$\begin{aligned} \sum _{i\in {\mathcal {J}}_{k}}L_{i}\cdot D_{\omega }(x_{k+1},x_{k-\tau _{k}^{i}})&\le \sum _{i\in {\mathcal {J}}_{k}}L_{i}\cdot \ell (\tau _k^i+1)\cdot \sum _{j=k-\tau _k^i}^k D_{\omega }(x_{j+1},x_j)\\&\le \sum _{i\in {\mathcal {J}}_{k}}L_{i}\cdot \ell (\tau +1)\cdot \sum _{j=k -\tau }^k D_{\omega }(x_{j+1},x_j)\\&=\ell (\tau +1)\cdot \sum _{i\in {\mathcal {J}}_k}L_i\sum _{j=k-\tau }^{k}D_{\omega }(x_{j+1},x_{j})\\&\le L\cdot \ell (\tau +1)\cdot \sum _{j=k-\tau }^{k}D_{\omega }(x_{j+1},x_{j}), \end{aligned}$$

where the second inequality is due to the fact that \(\tau _k^i\) is bounded above by \(\tau \), and the last inequality follows from the definition \(L{:}{=}\sum _{j=1}^N L_j\). Substituting this bound into (35) yields

$$\begin{aligned} \Phi (x_{k+1})\le&\Phi (x)+\frac{\gamma _{1}+\gamma _{2}}{2}\Vert x-x_{k+1}\Vert ^{2} +\frac{\gamma _{1}}{2}\sum \limits _{j\in {\mathcal {J}}_{k}}\Vert x-x_{k-\tau _{k}^{j}}\Vert ^{2}\nonumber \\&+\frac{1}{\alpha }D_{\omega }(x,x_{k})- \frac{1}{\alpha }D_{\omega }(x,x_{k+1})- \frac{1}{\alpha }D_{\omega }(x_{k+1},x_{k})+ L\cdot \ell (\tau +1)\cdot \sum \limits _{j=k-\tau }^{k}D_{\omega }(x_{j+1},x_{j}). \end{aligned}$$
(36)

Invoking (36) with \(x=x_k\) implies

$$\begin{aligned} \begin{aligned} \Phi (x_{k+1}) \le&\Phi (x_k)+\frac{\gamma _{1}+\gamma _{2}}{2}\Vert x_k-x_{k+1}\Vert ^{2} +\frac{\gamma _{1}}{2}\sum \limits _{j\in {\mathcal {J}}_{k}}\Vert x_k-x_{k-\tau _{k}^{j}}\Vert ^{2}\\&-\frac{1}{\alpha }D_{\omega }(x_k,x_{k+1})- \frac{1}{\alpha }D_{\omega }(x_{k+1},x_{k})+ L\cdot \ell (\tau +1)\cdot \sum \limits _{j=k-\tau }^{k}D_{\omega }(x_{j+1},x_{j}). \end{aligned} \end{aligned}$$
(37)

The term \(\sum _{j=k-\tau }^{k}D_{\omega }(x_{j+1},x_{j})\) in (37) can be rewritten as

$$\begin{aligned} \sum \limits _{j=k-\tau }^{k}D_{\omega }(x_{j+1},x_{j})=&(\tau +1)D_\omega (x_{k+1},x_k)+ \sum _{i=1}^{\tau }i D_{\omega }(x_{k-\tau +i},x_{k-\tau +i-1}) -\sum \limits _{i=1}^{\tau }i D_{\omega }(x_{k+1-\tau +i},x_{k-\tau +i}), \end{aligned}$$
(38)

and, by the Cauchy–Schwarz inequality and the bound \(\tau _k^j\le \tau \), the term \(\Vert x_k-x_{k-\tau _{k}^{j}}\Vert ^{2}\) in (37) satisfies

$$\begin{aligned}&\left\| x_{k}-x_{k-\tau _{k}^{j}}\right\| ^{2}\le \tau _k^j\sum \limits _{j=k-\tau _k^j}^{k-1}\Vert x_{j+1}-x_{j}\Vert ^{2}\le \tau \sum \limits _{j=k-\tau }^{k}\Vert x_{j+1}-x_{j}\Vert ^{2}\nonumber \\ =&\tau (\tau +1)\Vert x_{k+1}-x_{k}\Vert ^{2} +\tau \sum \limits _{i=1}^{\tau }i\Vert x_{k-\tau +i}-x_{k-\tau +i-1}\Vert ^{2} -\tau \sum \limits _{i=1}^{\tau }i\Vert x_{k+1-\tau +i}-x_{k-\tau +i}\Vert ^{2}. \end{aligned}$$

Using the above inequality, we have

$$\begin{aligned}&\sum _{j\in {\mathcal {J}}_{k}}\left\| x_{k}-x_{k-\tau _{k}^{j}}\right\| ^{2}\nonumber \\ \le&J_k\left[ \tau (\tau +1)\Vert x_{k+1}-x_{k}\Vert ^{2} +\tau \sum \limits _{i=1}^{\tau }i\Vert x_{k-\tau +i}-x_{k-\tau +i-1}\Vert ^{2} -\tau \sum \limits _{i=1}^{\tau }i\Vert x_{k+1-\tau +i}-x_{k-\tau +i}\Vert ^{2}\right] \nonumber \\ \le&N\left[ \tau (\tau +1)\Vert x_{k+1}-x_{k}\Vert ^{2} +\tau \sum \limits _{i=1}^{\tau }i\Vert x_{k-\tau +i}-x_{k-\tau +i-1}\Vert ^{2} -\tau \sum \limits _{i=1}^{\tau }i\Vert x_{k+1-\tau +i}-x_{k-\tau +i}\Vert ^{2}\right] . \end{aligned}$$
(39)

Plugging (38) and (39) into (37) and subtracting \(\upsilon ({\mathcal {P}})\) from both sides of the inequality, we get

$$\begin{aligned}&\Phi (x_{k+1})-\upsilon ({\mathcal {P}})+L\cdot \ell (\tau +1)\cdot \sum \limits _{i=1}^{\tau }i D_{\omega }(x_{k+1-\tau +i},x_{k-\tau +i}) +\frac{\gamma _{1}\tau N}{2}\sum \limits _{i=1}^{\tau }i\Vert x_{k+1-\tau +i}-x_{k-\tau +i}\Vert ^{2}\nonumber \\ \le&\Phi (x_{k})-\upsilon ({\mathcal {P}}) +L\cdot \ell (\tau +1)\cdot \sum \limits _{i=1}^{\tau }i D_{\omega }(x_{k-\tau +i},x_{k-\tau +i-1}) +\frac{\gamma _{1}\tau N}{2}\sum \limits _{i=1}^{\tau }i\Vert x_{k-\tau +i}-x_{k-\tau +i-1}\Vert ^{2}\nonumber \\&-\frac{1}{\alpha }D_{\omega }(x_{k},x_{k+1}) -\left[ \frac{1}{\alpha }-L\cdot \ell (\tau +1)\cdot (\tau +1)\right] D_{\omega }(x_{k+1},x_{k})\nonumber \\&+\frac{\gamma _1+\gamma _2}{2}\Vert x_{k}-x_{k+1}\Vert ^{2}+{\frac{\gamma _{1}\tau N}{2}(\tau +1)}\Vert x_{k+1}-x_{k}\Vert ^{2}. \end{aligned}$$
(40)

Recalling the definitions of \(T_k\), \(C_1\), and \(C_2\), we can rewrite (40) as

$$\begin{aligned} T_{k+1}\le T_{k}-\frac{1}{\alpha }D_{\omega }(x_{k},x_{k+1}) -C_1D_{\omega }(x_{k+1},x_{k})+C_2\Vert x_{k+1}-x_{k}\Vert ^{2}, \end{aligned}$$

which completes the proof. \(\square \)
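
For the reader's convenience, we note that comparing (40) with the final inequality suggests the following reading of the quantities involved (the authoritative definitions of \(T_k\), \(C_1\), and \(C_2\) are those given in the body of the paper; the expressions below are inferred from (40)):

$$\begin{aligned} T_{k}&{:}{=}\Phi (x_{k})-\upsilon ({\mathcal {P}})+L\cdot \ell (\tau +1)\cdot \sum \limits _{i=1}^{\tau }i D_{\omega }(x_{k-\tau +i},x_{k-\tau +i-1})+\frac{\gamma _{1}\tau N}{2}\sum \limits _{i=1}^{\tau }i\Vert x_{k-\tau +i}-x_{k-\tau +i-1}\Vert ^{2},\\ C_{1}&{:}{=}\frac{1}{\alpha }-L\cdot \ell (\tau +1)\cdot (\tau +1),\qquad C_{2}{:}{=}\frac{\gamma _{1}+\gamma _{2}}{2}+\frac{\gamma _{1}\tau N(\tau +1)}{2}. \end{aligned}$$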

Cite this article

Jia, Z., Huang, J. & Cai, X. Proximal-like incremental aggregated gradient method with Bregman distance in weakly convex optimization problems. J Glob Optim 80, 841–864 (2021). https://doi.org/10.1007/s10898-021-01044-9
