
A Bregman-Style Improved ADMM and its Linearized Version in the Nonconvex Setting: Convergence and Rate Analyses

Journal of the Operations Research Society of China

Abstract

This work explores a family of two-block nonconvex optimization problems subject to linear constraints. We first introduce a simple yet versatile Bregman-style improved alternating direction method of multipliers (ADMM) built on the classical ADMM iteration framework and the Bregman distance. We then exploit the smoothness of one of the component functions to develop a linearized version of the method. Compared with the traditional ADMM, both proposed methods integrate a convex combination strategy into the multiplier update step. For each method, we establish convergence of the entire iteration sequence to a unique critical point of the augmented Lagrangian function via the Kurdyka–Łojasiewicz property, and we derive convergence rates for both the sequence of merit function values and the iteration sequence. Finally, numerical results show that the proposed methods are effective and encouraging for the Lasso model.
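For orientation (the formal problem statement (1), the augmented Lagrangian (2) and the scheme (5) appear in the body of the paper, not in this excerpt), the appendix computations below are consistent with the following reconstruction, in which the multiplier stepsize \(\gamma \beta + (1- \gamma ) \alpha \) is the convex combination referred to above:

$$\begin{aligned} \min _{x,y}\ f(x)+g(y)\ \ \text {s.t.}\ \ Ax+y=b, \qquad \mathcal {L}_\beta (x,y,\lambda )=f(x)+g(y)-\langle \lambda , Ax+y-b\rangle +\frac{\beta }{2}\Vert Ax+y-b\Vert ^{2}. \end{aligned}$$

As a concrete instance, the Lasso model \(\min _x \frac{1}{2}\Vert Ax-b\Vert ^{2}+\mu \Vert x\Vert _{1}\) fits this format with \(f(x)=\mu \Vert x\Vert _{1}\), \(g(y)=\frac{1}{2}\Vert y\Vert ^{2}\) and the constraint \(Ax+y=b\) (so that \(y=b-Ax\)); presumably this is how the numerical tests instantiate problem (1).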



Author information


Contributions

All authors read and approved the final manuscript. P.-J. Liu mainly contributed to the algorithm design, convergence analysis, numerical results and drafted the manuscript; J.-B. Jian and H. Shao mainly contributed to the algorithm design; X.-Q. Wang mainly contributed to the numerical results; J.-W. Xu and X.-Y. Wu mainly contributed to the convergence analysis.

Corresponding author

Correspondence to Jin-Bao Jian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

This work was supported by the National Natural Science Foundation of China (Nos. 12171106 and 72071202), the Natural Science Foundation of Guangxi Province (No. 2020GXNSFDA238017) and Key Laboratory of Mathematics and Engineering Applications, Ministry of Education.

Appendix

Proof

(i) Firstly, from the definition of \(\mathcal {L}_\beta (\cdot )\) and (5c), one has

$$\begin{aligned} \mathcal {L}_\beta (w^{k+1})-\mathcal {L}_\beta (x^{k+1},y^{k+1},\lambda ^{k})&= (\lambda ^{k}-\lambda ^{k+1})^{\top } (Ax^{k+1}+y^{k+1}-b)\\ &= \frac{1}{ \gamma \beta + (1- \gamma ) \alpha }\Vert \lambda ^{k+1}-\lambda ^{k}\Vert ^{2}. \end{aligned}$$
(45)
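Although the update (5c) itself is stated outside this excerpt, relation (45), together with the limiting form of (5c) invoked before (53) below, pins down the multiplier step as

$$\begin{aligned} \lambda ^{k+1}=\lambda ^{k}- [\gamma \beta + (1- \gamma ) \alpha ] (Ax^{k+1}+y^{k+1}-b), \end{aligned}$$

so that \(\lambda ^{k}-\lambda ^{k+1}= [\gamma \beta + (1- \gamma ) \alpha ] (Ax^{k+1}+y^{k+1}-b)\) and the second equality in (45) follows by direct substitution.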

Secondly, the expression of \(\mathcal {L}_{\beta }(\cdot )\) in (2) gives

$$\begin{aligned} &\mathcal {L}_{\beta }(x^{k+1}, y^{k+1},\lambda ^{k})-\mathcal {L}_{\beta }(x^{k+1}, y^{k},\lambda ^{k}) \\ &\quad = g(y^{k+1})- g(y^{k}) - \langle \lambda ^{k}, y^{k+1}- y^{k}\rangle +\frac{\beta }{2} \Vert A x^{k+1}+ y^{k+1}-b\Vert ^{2} - \frac{\beta }{2} \Vert A x^{k+1}+y^{k}-b\Vert ^{2} \\ &\quad = g(y^{k+1})- g(y^{k}) - \langle \lambda ^{k} - \beta (A x^{k+1}+ y^{k+1}-b), y^{k+1}- y^{k}\rangle - \frac{\beta }{2} \Vert y^{k+1}-y^{k}\Vert ^{2}. \end{aligned}$$
(46)

On the other hand, by the \(l_{g}\)-Lipschitz continuity of \(\nabla g\) and Lemma 1, we have

$$\begin{aligned} g(y^{k+1})-g(y^{k})- \nabla g(y^{k})^{\top } (y^{k+1}-y^{k}) \leqslant \frac{l_{g}}{2} \Vert y^{k+1}-y^{k}\Vert ^{2}. \end{aligned}$$
(47)

Therefore, by combining the above relations (46) and (47), along with (39b) and (5c), we conclude that

$$\begin{aligned} &\mathcal {L}_{\beta }(x^{k+1}, y^{k+1},\lambda ^{k})-\mathcal {L}_{\beta }(x^{k+1}, y^{k},\lambda ^{k}) \\ &\quad \leqslant - \frac{\beta - l_{g}}{2} \Vert y^{k+1}-y^{k}\Vert ^{2} + \langle \nabla g(y^{k}) -\lambda ^{k} + \beta (A x^{k+1}+ y^{k+1}-b), y^{k+1}- y^{k}\rangle \\ &\quad = - \frac{\beta - l_{g}}{2} \Vert y^{k+1}-y^{k}\Vert ^{2} + \langle (\beta -\alpha ) (A x^{k+1}+ y^{k+1}-b) - \tau _k (y^{k+1}-y^{k}), y^{k+1}- y^{k}\rangle \\ &\quad = - \frac{\beta - l_{g}}{2} \Vert y^{k+1}-y^{k}\Vert ^{2} + \frac{\alpha -\beta }{ \gamma \beta + (1- \gamma ) \alpha } \langle \lambda ^{k+1}- \lambda ^{k}, y^{k+1}- y^{k}\rangle - \tau _k \Vert y^{k+1}-y^{k}\Vert ^2 \\ &\quad \leqslant - \left( \frac{ 2\tau _k+ \beta - l_{g}}{2} - \frac{|\alpha -\beta |}{2 [\gamma \beta + (1- \gamma ) \alpha ]}\right) \Vert y^{k+1}-y^{k}\Vert ^{2} + \frac{|\alpha -\beta |}{2 [\gamma \beta + (1- \gamma ) \alpha ]} \Vert \lambda ^{k+1}- \lambda ^{k}\Vert ^{2}. \end{aligned}$$
(48)

Thirdly, since \(x^{k+1}\) is a minimizer of the \(x\)-subproblem (5a), the strong convexity of \(\triangle _{\psi }\) gives

$$\begin{aligned} \mathcal {L}_\beta (x^{k+1},y^{k},\lambda ^{k})-\mathcal {L}_\beta (w^{k})\leqslant -\triangle _{\psi }(x^{k+1},x^{k}) \leqslant -\frac{\sigma _{\psi }}{2}\Vert x^{k+1}-x^{k}\Vert ^{2}. \end{aligned}$$
(49)

Summing up relations (45), (48) and (49), it follows that

$$\begin{aligned} &\mathcal {L}_\beta (w^{k+1})-\mathcal {L}_\beta (w^{k})\\ &\quad \leqslant -\frac{\sigma _{\psi }}{2}\Vert x^{k+1}-x^{k}\Vert ^{2}-\left( \frac{2\tau _k+ \beta - l_{g}}{2} - \frac{|\alpha -\beta |}{2 [\gamma \beta + (1- \gamma ) \alpha ]}\right) \Vert y^{k+1}-y^{k}\Vert ^{2} \\ &\qquad +\frac{2 + |\alpha -\beta |}{2 [\gamma \beta + (1- \gamma ) \alpha ]}\Vert \lambda ^{k+1}-\lambda ^{k}\Vert ^{2}. \end{aligned}$$
(50)

By recalling (5c) and the optimality condition (39b), we have

$$\begin{aligned} \lambda ^{k+1}=\nabla g(y^{k}) + \tau _k (y^{k+1}-y^{k}) + \gamma (\alpha - \beta ) (Ax^{k+1}+y^{k+1}-b), \end{aligned}$$
(51)
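For completeness, (51) can be verified directly. The optimality condition (39b) is used throughout this appendix in the form (inferred from the substitution made in (48)) \(\nabla g(y^{k}) + \tau _k (y^{k+1}-y^{k}) -\lambda ^{k} + \alpha (A x^{k+1}+ y^{k+1}-b)=0\); combining it with the multiplier step (5c) gives

$$\begin{aligned} \lambda ^{k+1}&=\lambda ^{k}- [\gamma \beta + (1- \gamma ) \alpha ] (Ax^{k+1}+y^{k+1}-b)\\ &=\nabla g(y^{k}) + \tau _k (y^{k+1}-y^{k}) + [\alpha - \gamma \beta - (1- \gamma ) \alpha ] (Ax^{k+1}+y^{k+1}-b)\\ &=\nabla g(y^{k}) + \tau _k (y^{k+1}-y^{k}) + \gamma (\alpha - \beta ) (Ax^{k+1}+y^{k+1}-b), \end{aligned}$$

which is exactly (51).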

Then, using (51) and Assumption 2 (i), one has

$$\begin{aligned} \Vert \lambda ^{k+1}-\lambda ^{k}\Vert &\leqslant (\tau _k + \gamma |\alpha - \beta |) \Vert y^{k+1}-y^{k}\Vert + (\tau _k+ l_{g}) \Vert y^{k}-y^{k-1}\Vert \\ &\quad + \gamma |\alpha - \beta | \Vert A(x^{k+1} - x^{k})\Vert , \end{aligned}$$
(52)

which further implies

$$\begin{aligned} \Vert \lambda ^{k+1}-\lambda ^{k}\Vert ^2 &\leqslant 3 (\tau _k + \gamma |\alpha - \beta |)^2 \Vert y^{k+1}-y^{k}\Vert ^2 + 3(\tau _k+ l_{g})^2 \Vert y^{k}-y^{k-1}\Vert ^2 \\ &\quad + 3\gamma ^2 (\alpha - \beta )^2 \lambda _{\max }(A^{\top }A) \Vert x^{k+1} - x^{k}\Vert ^2. \end{aligned}$$

Finally, from the relation above, (50) and \(0 < \tau _1 \leqslant \tau _k \leqslant \tau _2\), we obtain

$$\begin{aligned} &\mathcal {L}_\beta (w^{k+1})-\mathcal {L}_\beta (w^{k})\\ &\quad \leqslant -\left( \frac{\sigma _{\psi }}{2} - \gamma ^2 (\alpha - \beta )^2 \lambda _{\max }(A^{\top }A) \cdot \frac{6 + 3 |\alpha -\beta |}{2 [\gamma \beta + (1- \gamma ) \alpha ]} \right) \Vert x^{k+1}-x^{k}\Vert ^{2} \\ &\qquad - \left( \frac{2\tau _1+ \beta - l_{g}}{2} - \frac{|\alpha -\beta |}{2 [\gamma \beta + (1- \gamma ) \alpha ]} - (\tau _2 + \gamma |\alpha - \beta |)^2 \cdot \frac{6 + 3 |\alpha -\beta |}{2 [\gamma \beta + (1- \gamma ) \alpha ]} \right) \Vert y^{k+1}-y^{k}\Vert ^{2} \\ &\qquad +(\tau _2+ l_{g})^2 \cdot \frac{6 + 3 |\alpha -\beta |}{2 [\gamma \beta + (1- \gamma ) \alpha ]} \Vert y^{k}-y^{k-1}\Vert ^{2}, \end{aligned}$$

Letting \(\varrho =(\tau _2+ l_{g})^2 \cdot \frac{6 + 3 |\alpha -\beta |}{2 [\gamma \beta + (1- \gamma ) \alpha ]}\), we then obtain

$$\begin{aligned} &\left[ \mathcal {L}_\beta (w^{k+1})+\varrho \Vert y^{k+1}-y^{k}\Vert ^{2}\right] -\left[ \mathcal {L}_\beta (w^{k})+\varrho \Vert y^{k}-y^{k-1}\Vert ^{2}\right] \\ &\quad \leqslant -\left( \frac{\sigma _{\psi }}{2} - \gamma ^2 (\alpha - \beta )^2 \lambda _{\max }(A^{\top }A) \cdot \frac{6 + 3 |\alpha -\beta |}{2 [\gamma \beta + (1- \gamma ) \alpha ]} \right) \Vert x^{k+1}-x^{k}\Vert ^{2} \\ &\qquad - \left( \frac{ 2\tau _1+ \beta - l_{g}}{2} - \frac{|\alpha -\beta |}{2 [\gamma \beta + (1- \gamma ) \alpha ]} - (\tau _2 + \gamma |\alpha - \beta |)^2 \cdot \frac{6 + 3 |\alpha -\beta |}{2 [\gamma \beta + (1- \gamma ) \alpha ]} \right) \Vert y^{k+1}-y^{k}\Vert ^{2} \\ &\qquad +(\tau _2+ l_{g})^2 \cdot \frac{6 + 3 |\alpha -\beta |}{2 [\gamma \beta + (1- \gamma ) \alpha ]} \Vert y^{k+1}-y^{k}\Vert ^{2}. \end{aligned}$$

Hence,

$$\begin{aligned} \hat{\mathcal {L}}_\beta (\hat{w}^{k+1})\leqslant \hat{\mathcal {L}}_\beta (\hat{w}^{k})-\delta (\Vert x^{k+1}-x^{k}\Vert ^2+\Vert y^{k+1}-y^{k}\Vert ^2). \end{aligned}$$

From Assumption 2 (ii) and (iii), we have \(\delta >0\). Together with the relation above, this shows that \(\{\hat{\mathcal {L}}_\beta (\hat{w}^{k})\}\) is monotonically decreasing.
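Here \(\hat{w}^{k}\) and \(\hat{\mathcal {L}}_\beta \) denote the augmented iterate and the regularized merit function defined in the body of the paper; reading them off the bracketed terms in the display above, they take the form

$$\begin{aligned} \hat{w}^{k}=(x^{k},y^{k},\lambda ^{k},y^{k-1}),\qquad \hat{\mathcal {L}}_\beta (\hat{w}^{k})=\mathcal {L}_\beta (w^{k})+\varrho \Vert y^{k}-y^{k-1}\Vert ^{2}, \end{aligned}$$

and \(\delta \) may be taken as the smaller of the two bracketed coefficients there.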

(ii) Since the cluster point set of \(\{w^{k}\}\) is nonempty, the sequence \(\{\hat{w}^{k}\}\) has at least one cluster point. Let \(\hat{w}^{*}\) be a cluster point of \(\{\hat{w}^{k}\}\) and let \(\{\hat{w}^{k_{j}}\}\) be a subsequence converging to it, i.e., \(\lim \limits _{j \rightarrow +\infty }\hat{w}^{k_{j}}= \hat{w}^{*}\). The rest of the argument is similar to the proof of Lemma 4 (ii), and the desired conclusion holds.

Proof

  1. (i)

These assertions follow directly from the definitions of \(\varOmega \) and \(\hat{\varOmega }\).

  2. (ii)

Combining Lemma 5 (ii) with the definitions of \(w^{k}\) and \(\hat{w}^{k}\), we easily get the result.

  3. (iii)

Let \(w^{*}\in \varOmega \). Then, there exists a subsequence \(\{w^{k_{j}}\}\) of \(\{w^{k}\}\) converging to \(w^{*}\). Lemma 5 (ii) implies that \(\lim \limits _{j \rightarrow +\infty }w^{k_{j}+1} = \lim \limits _{j \rightarrow +\infty }w^{k_{j}} =w^*\). Taking the limit as \(j \rightarrow +\infty \) in (5c), we get \(\lambda ^{*}=\lambda ^{*}- [\gamma \beta + (1- \gamma ) \alpha ](Ax^*+y^*-b)\) with \(\gamma \beta + (1- \gamma ) \alpha >0\), which implies

    $$\begin{aligned} Ax^*+y^*-b=0. \end{aligned}$$
    (53)

Thus, \((x^{*}, y^{*})\) is a feasible point of (1). Since \(x^{k_{j}+1}\) is a minimizer of the \(x\)-subproblem (5a), it holds that

$$\begin{aligned} &f(x^{k_{j}+1})-\left\langle \lambda ^{k_{j}}, A x^{k_{j}+1}\right\rangle +\frac{\beta }{2}\left\| A x^{k_{j}+1}+y^{k_{j}}-b\right\| ^{2}+\triangle _{\psi }(x^{k_{j}+1},x^{k_{j}}) \\ &\quad \leqslant f(x^{*})-\left\langle \lambda ^{k_{j}}, A x^{*}\right\rangle +\frac{\beta }{2}\left\| A x^{*}+y^{k_{j}}-b\right\| ^{2}+\triangle _{\psi }(x^{*},x^{k_{j}}). \end{aligned}$$

This is equivalent to

    $$\begin{aligned} \mathcal {L}_\beta (x^{k_{j}+1},y^{k_{j}},\lambda ^{k_{j}})+\triangle _{\psi }(x^{k_{j}+1},x^{k_{j}}) \leqslant \mathcal {L}_\beta (x^{*},y^{k_{j}},\lambda ^{k_{j}})+\triangle _{\psi }(x^{*},x^{k_{j}}), \end{aligned}$$

    which implies

$$\begin{aligned} &\hat{\mathcal {L}}_\beta (x^{k_{j}+1},y^{k_{j}},\lambda ^{k_{j}},y^{k_{j}-1})- \varrho \Vert y^{k_{j}}-y^{k_{j}-1}\Vert ^2\\ &\quad \leqslant \hat{\mathcal {L}}_\beta (x^{*},y^{k_{j}},\lambda ^{k_{j}},y^{k_{j}-1})-\triangle _{\psi }(x^{k_{j}+1},x^{k_{j}})+\triangle _{\psi }(x^{*},x^{k_{j}})-\varrho \Vert y^{k_{j}}-y^{k_{j}-1}\Vert ^2\\ &\quad \leqslant \hat{\mathcal {L}}_\beta (x^{*},y^{k_{j}},\lambda ^{k_{j}},y^{k_{j}-1})+\triangle _{\psi }(x^{*},x^{k_{j}})-\varrho \Vert y^{k_{j}}-y^{k_{j}-1}\Vert ^2. \end{aligned}$$

Taking the limit along the convergent subsequence and using the continuity of \(\hat{\mathcal {L}}_\beta (\cdot )\) with respect to \((y, \lambda , \hat{y})\), we have

    $$\begin{aligned} \limsup _{j \rightarrow +\infty }\hat{\mathcal {L}}_\beta (\hat{w}^{k_{j}+1})=\limsup _{j\rightarrow +\infty }\hat{\mathcal {L}}_\beta (x^{k_{j}+1},y^{k_{j}},\lambda ^{k_{j}},y^{k_{j}-1})\leqslant \hat{\mathcal {L}}_\beta (\hat{w}^{*}). \end{aligned}$$

This, together with the lower semicontinuity of \(\hat{\mathcal {L}}_\beta (\cdot )\), yields \(\lim \limits _{j \rightarrow +\infty }\hat{\mathcal {L}}_\beta (\hat{w}^{k_{j}+1})=\hat{\mathcal {L}}_\beta (\hat{w}^{*})\). Along with the monotonicity of \(\{\hat{\mathcal {L}}_\beta (\hat{w}^{k})\}\), this further shows that the whole sequence \(\{\hat{\mathcal {L}}_\beta (\hat{w}^{k})\}\) is convergent. Therefore, in view of \(\hat{\mathcal {L}}_\beta (\hat{w}^{k}) \leqslant \hat{\mathcal {L}}_\beta (\hat{w}^{0})< +\infty \), we have

    $$\begin{aligned} +\infty > \hat{\mathcal {L}}_\beta (\hat{w}^{0}) \geqslant \lim \limits _{k\rightarrow +\infty }\hat{\mathcal {L}}_\beta (\hat{w}^{k})=\inf \limits _{k}\hat{\mathcal {L}}_\beta (\hat{w}^{k})= \hat{\mathcal {L}}_\beta (\hat{w}^{*}). \end{aligned}$$

    Hence, \(\hat{\mathcal {L}}_{\beta }(\hat{w}^{*}) \equiv \lim \limits _{k\rightarrow + \infty } \hat{\mathcal {L}}_{\beta }(\hat{w}^{k}) < +\infty \) for all \(\hat{w}^{*} \in \hat{\varOmega }\).

  4. (iv)

The proof is similar to that of Theorem 1 (iii), so the desired result holds.

Proof

From Theorem 5 (iii), it follows that \(\lim \limits _{k\rightarrow +\infty }\hat{\mathcal {L}}_\beta (\hat{w}^{k})\) \(= \hat{\mathcal {L}}_\beta (\hat{w}^{*})\) for all \(\hat{w}^{*}\in \hat{\varOmega }\). We consider two cases as follows:

  1. (A)

Suppose first that there exists an integer \(k_{0}\) such that \(\hat{\mathcal {L}}_\beta (\hat{w}^{k_{0}})= \hat{\mathcal {L}}_\beta (\hat{w}^{*})\). From Lemma 5 (i), for any \(k \geqslant k_{0}\), we have

$$\begin{aligned} \delta (\Vert x^{k+1}-x^{k}\Vert ^{2}+\Vert y^{k+1}-y^{k}\Vert ^{2})&\leqslant \hat{\mathcal {L}}_\beta (\hat{w}^{k})- \hat{\mathcal {L}}_\beta (\hat{w}^{k+1})\\ &\leqslant \hat{\mathcal {L}}_\beta (\hat{w}^{k_{0}}) - \hat{\mathcal {L}}_\beta (\hat{w}^{*})=0, \end{aligned}$$

thus \(x^{k+1}=x^{k}\) and \(y^{k+1}=y^{k}\) for all \(k \geqslant k_{0}\). Combined with (52), it follows that \(w^{k+1}=w^{k}\) for any \(k \geqslant k_{0}+1\), and the assertion holds.

  2. (B)

Now assume \(\hat{\mathcal {L}}_\beta (\hat{w}^{k})> \hat{\mathcal {L}}_\beta (\hat{w}^{*})\) for all \(k\). From \(\lim \limits _{k\rightarrow +\infty }d(\hat{w}^{k},\hat{\varOmega })=0\), we infer that for any \(\varepsilon >0\) there exists \(k_{1}>0\) such that \(d(\hat{w}^{k},\hat{\varOmega })<\varepsilon \) for all \(k>k_{1}\). Again, it follows from \(\lim \limits _{k\rightarrow +\infty }\hat{\mathcal {L}}_\beta (\hat{w}^{k})= \hat{\mathcal {L}}_\beta (\hat{w}^{*})\) that for any \(\eta >0\) there exists \(k_{2}>0\) such that \(\hat{\mathcal {L}}_\beta (\hat{w}^{k})< \hat{\mathcal {L}}_\beta (\hat{w}^{*})+\eta \) for all \(k>k_{2}\). Consequently, for \(\varepsilon ,\eta >0\), when \(k>\tilde{k}=\max \{k_{1},k_{2}\}\), we have

    $$\begin{aligned} d(\hat{w}^{k},\hat{\varOmega })<\varepsilon ,~~\hat{\mathcal {L}}_\beta (\hat{w}^{*})<\hat{\mathcal {L}}_\beta (\hat{w}^{k})< \hat{\mathcal {L}}_\beta (\hat{w}^{*})+\eta . \end{aligned}$$

From Theorem 5, we know that \(\hat{\mathcal {L}}_\beta (\cdot )\) is constant on \(\hat{\varOmega }\), and \(\hat{\varOmega }\) is a nonempty compact set. Thus, it follows from Lemma 3 that \(\hat{\mathcal {L}}_\beta (\cdot )\) satisfies the uniformized KŁ property, which further implies

    $$\begin{aligned} \varphi '(\hat{\mathcal {L}}_\beta (\hat{w}^{k})-\hat{\mathcal {L}}_\beta (\hat{w}^{*}))d(0,\partial \hat{\mathcal {L}}_\beta (\hat{w}^{k}))\geqslant 1,\ \forall \ k>\tilde{k}. \end{aligned}$$
    (54)
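For reference, the uniformized KŁ property of Lemma 3 (in its standard form, due to Bolte, Sabach and Teboulle) asserts that, since \(\hat{\mathcal {L}}_\beta (\cdot )\) is constant on the nonempty compact set \(\hat{\varOmega }\) and satisfies the KŁ property at each of its points, there exist \(\varepsilon >0\), \(\eta >0\) and a concave desingularizing function \(\varphi \) with \(\varphi (0)=0\) and \(\varphi '>0\) such that

$$\begin{aligned} \varphi '(\hat{\mathcal {L}}_\beta (\hat{w})-\hat{\mathcal {L}}_\beta (\hat{w}^{*}))\, d(0,\partial \hat{\mathcal {L}}_\beta (\hat{w}))\geqslant 1 \end{aligned}$$

whenever \(d(\hat{w},\hat{\varOmega })<\varepsilon \) and \(\hat{\mathcal {L}}_\beta (\hat{w}^{*})<\hat{\mathcal {L}}_\beta (\hat{w})< \hat{\mathcal {L}}_\beta (\hat{w}^{*})+\eta \); inequality (54) is this property evaluated at \(\hat{w}=\hat{w}^{k}\).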

Next, we provide an upper estimate of the limiting subdifferential of \(\hat{\mathcal {L}}_\beta \) at \(\hat{w}^{k}\) in terms of the iterates. Taking the subdifferential of \(\hat{\mathcal {L}}_\beta (\cdot )\) at \(\hat{w}^{k}\) with respect to \(x\), one has

$$\begin{aligned} \partial _{x}\mathcal {\hat{L}}_\beta (\hat{w}^{k})&= \partial f(x^{k})-A^{\top } \lambda ^{k}+\beta A^{\top } (Ax^{k}+y^{k}-b) \\ &\ni A^{\top } (\lambda ^{k-1}-\lambda ^{k})+\beta A^{\top } (y^{k}-y^{k-1})-[\nabla \psi (x^{k})-\nabla \psi (x^{k-1})], \end{aligned}$$
(55)

where the second relation follows from the optimality condition (39a). Similarly, we obtain for y that

$$\begin{aligned} \partial _{y}\mathcal {\hat{L}}_\beta (\hat{w}^{k})&= \nabla g(y^{k})-\lambda ^{k}+\beta (Ax^{k}+y^{k}-b) +2 \varrho (y^{k}-y^{k-1}) \\ &= \nabla g(y^{k})-\nabla g(y^{k-1}) + (2 \varrho -\tau _{k})(y^{k}-y^{k-1}) + [\beta - \gamma (\alpha - \beta )] (Ax^{k}+y^{k}-b) \\ &= \nabla g(y^{k})-\nabla g(y^{k-1}) + (2 \varrho -\tau _{k})(y^{k}-y^{k-1}) + \frac{\gamma (\alpha - \beta ) - \beta }{\gamma \beta + (1- \gamma ) \alpha } (\lambda ^{k}-\lambda ^{k-1}), \end{aligned}$$
(56)

where the second equality follows from (51) and the final equality utilizes (5c). In addition,

$$\begin{aligned} \partial _{\lambda }\mathcal {\hat{L}}_\beta (\hat{w}^{k})=-(Ax^{k}+y^{k}-b)= \frac{1}{\gamma \beta + (1- \gamma ) \alpha } (\lambda ^{k}-\lambda ^{k-1}) \end{aligned}$$
(57)

and

$$\begin{aligned} \partial _{\hat{y}}\mathcal {\hat{L}}_\beta (\hat{w}^{k})=-2 \varrho (y^{k}-y^{k-1}). \end{aligned}$$
(58)

Since \(\nabla g\) and \(\nabla \psi \) are Lipschitz continuous, combining relations (55)–(58) shows that there exists \(\zeta _{1}>0\) such that

$$\begin{aligned} d(0,\partial \mathcal {\hat{L}}_\beta (\hat{w}^{k}))\leqslant \zeta _{1}(\Vert x^{k}-x^{k-1}\Vert +\Vert y^{k}-y^{k-1}\Vert +\Vert \lambda ^{k}-\lambda ^{k-1}\Vert ). \end{aligned}$$

On the other hand, from (52), there exists \(\zeta _{2}>0\) such that

$$\begin{aligned} \Vert \lambda ^{k}-\lambda ^{k-1}\Vert \leqslant \zeta _{2} (\Vert x^{k} - x^{k-1}\Vert + \Vert y^{k}-y^{k-1}\Vert + \Vert y^{k-1}-y^{k-2}\Vert ). \end{aligned}$$

Letting \(\zeta = \zeta _{1} (1+ \zeta _{2})\) and combining the two inequalities above, we obtain

$$\begin{aligned} d(0,\partial \mathcal {\hat{L}}_\beta (\hat{w}^{k}))\leqslant \zeta (\Vert x^{k}-x^{k-1}\Vert +\Vert y^{k}-y^{k-1}\Vert +\Vert y^{k-1}-y^{k-2}\Vert ). \end{aligned}$$
(59)

Finally, we study the convergence of the entire sequence \(\{{w}^{k}\}\). Denote \(\varTheta _{p,q}=\varphi (\hat{\mathcal {L}}_\beta (\hat{w}^{p})-\hat{\mathcal {L}}_\beta (\hat{w}^{*}))-\varphi (\hat{\mathcal {L}}_\beta (\hat{w}^{q})-\hat{\mathcal {L}}_\beta (\hat{w}^{*}))\). Taking into account \(\hat{\mathcal {L}}_\beta (\hat{w}^{k})-\hat{\mathcal {L}}_\beta (\hat{w}^{k+1})=(\hat{\mathcal {L}}_\beta (\hat{w}^{k})-\hat{\mathcal {L}}_\beta (\hat{w}^{*}))-(\hat{\mathcal {L}}_\beta (\hat{w}^{k+1})-\hat{\mathcal {L}}_\beta (\hat{w}^{*}))\) and the concavity of \(\varphi \), we get

$$\begin{aligned} \varTheta _{k,k+1} \geqslant \varphi '(\hat{\mathcal {L}}_\beta (\hat{w}^{k})-\hat{\mathcal {L}}_\beta (\hat{w}^{*}))(\mathcal {\hat{L}}_\beta (\hat{w}^{k})-\mathcal {\hat{L}}_\beta (\hat{w}^{k+1})). \end{aligned}$$

From this, (54), (59) and \(\varphi '(\hat{\mathcal {L}}_\beta (\hat{w}^{k})-\hat{\mathcal {L}}_\beta (\hat{w}^{*}))>0\), it follows that

$$\begin{aligned} \hat{\mathcal {L}}_\beta (\hat{w}^{k})-\hat{\mathcal {L}}_\beta (\hat{w}^{k+1})&\leqslant \frac{1}{\varphi '(\hat{\mathcal {L}}_\beta (\hat{w}^{k})-\hat{\mathcal {L}}_\beta (\hat{w}^{*}))} \cdot \varTheta _{k,k+1}\\ &\leqslant d(0,\partial \mathcal {\hat{L}}_\beta (\hat{w}^{k})) \cdot \varTheta _{k,k+1} \\ &\leqslant \zeta (\Vert x^{k}-x^{k-1}\Vert +\Vert y^{k}-y^{k-1}\Vert +\Vert y^{k-1}-y^{k-2}\Vert ) \varTheta _{k,k+1}. \end{aligned}$$

From Lemma 5 (i) and the inequality above, one has

$$\begin{aligned} &\delta (\Vert x^{k+1}-x^{k}\Vert ^{2}+\Vert y^{k+1}-y^{k}\Vert ^{2})\\ &\quad \leqslant \zeta (\Vert x^{k}-x^{k-1}\Vert +\Vert y^{k}-y^{k-1}\Vert +\Vert y^{k-1}-y^{k-2}\Vert )\varTheta _{k,k+1},\ \forall \ k>\tilde{k}, \end{aligned}$$

and hence

$$\begin{aligned} &(2\Vert x^{k+1}-x^{k}\Vert ^{2}+2\Vert y^{k+1}-y^{k}\Vert ^{2})^{\frac{1}{2}}\\ &\quad \leqslant (\Vert x^{k}-x^{k-1}\Vert +\Vert y^{k}-y^{k-1}\Vert +\Vert y^{k-1}-y^{k-2}\Vert )^{\frac{1}{2}}\sqrt{\frac{2\zeta }{\delta }\varTheta _{k,k+1}}. \end{aligned}$$

Using the inequality \(a+b\leqslant \sqrt{2(a^{2}+b^{2})}\), the inequality above further gives

$$\begin{aligned} &\Vert x^{k+1}-x^{k}\Vert +\Vert y^{k+1}-y^{k}\Vert \\ &\quad \leqslant (\Vert x^{k}-x^{k-1}\Vert +\Vert y^{k}-y^{k-1}\Vert +\Vert y^{k-1}-y^{k-2}\Vert )^{\frac{1}{2}}\sqrt{\frac{2\zeta }{\delta }\varTheta _{k,k+1}}, \end{aligned}$$

which implies

$$\begin{aligned} &4(\Vert x^{i+1}-x^{i}\Vert +\Vert y^{i+1}-y^{i}\Vert )\\ &\quad \leqslant 2(\Vert x^{i}-x^{i-1}\Vert +\Vert y^{i}-y^{i-1}\Vert +\Vert y^{i-1}-y^{i-2}\Vert )^{\frac{1}{2}}\cdot \sqrt{\frac{8\zeta }{\delta }\varTheta _{i,i+1}}\\ &\quad \leqslant \Vert x^{i}-x^{i-1}\Vert +\Vert y^{i}-y^{i-1}\Vert +\Vert y^{i-1}-y^{i-2}\Vert +\frac{8\zeta }{\delta }\varTheta _{i,i+1},\ \forall \ i>\tilde{k}, \end{aligned}$$

where the last inequality follows from the relation \(2\sqrt{ab}\leqslant a+b\) \((a, b \geqslant 0)\). Summing the inequalities above from \(i=k\ (\geqslant \tilde{k}+1)\) to \(i= N\), we have

$$\begin{aligned} &3\sum \limits _{i=k}^{N}\Vert x^{i+1}-x^{i}\Vert +2\sum \limits _{i=k}^{N}\Vert y^{i+1}-y^{i}\Vert \\ &\quad \leqslant 2\Vert {y}^{k}-{y}^{k-1}\Vert -2\Vert y^{N+1}-y^{N}\Vert +\Vert {x}^{k}-{x}^{k-1}\Vert -\Vert x^{N+1}-x^{N}\Vert \\ &\qquad +\Vert {y}^{k-1}-{y}^{k-2}\Vert -\Vert y^{N}-y^{N-1}\Vert +\frac{8\zeta }{\delta }\varTheta _{k,N+1}. \end{aligned}$$

This, along with \(\varphi (\hat{\mathcal {L}}_\beta (\hat{w}^{N+1})-\hat{\mathcal {L}}_\beta (\hat{w}^{*}))>0\), further implies

$$\begin{aligned} &3\sum \limits _{i=k}^{N}\Vert x^{i+1}-x^{i}\Vert +2\sum \limits _{i=k}^{N}\Vert y^{i+1}-y^{i}\Vert \\ &\quad \leqslant 2\Vert {y}^{k}-{y}^{k-1}\Vert +\Vert {x}^{k}-{x}^{k-1}\Vert +\Vert {y}^{k-1}-{y}^{k-2}\Vert +\frac{8\zeta }{\delta } \varphi (\hat{\mathcal {L}}_\beta (\hat{w}^{k})-\hat{\mathcal {L}}_\beta (\hat{w}^{*})). \end{aligned}$$

Thus,

$$\begin{aligned} &3\sum \limits _{i=k}^{+\infty }\Vert x^{i+1}-x^{i}\Vert +2\sum \limits _{i=k}^{+\infty }\Vert y^{i+1}-y^{i}\Vert \\ &\quad \leqslant 2\Vert {y}^{k}-{y}^{k-1}\Vert +\Vert {x}^{k}-{x}^{k-1}\Vert +\Vert {y}^{k-1}-{y}^{k-2}\Vert +\frac{8\zeta }{\delta } \varphi (\hat{\mathcal {L}}_\beta (\hat{w}^{k})-\hat{\mathcal {L}}_\beta (\hat{w}^{*})). \end{aligned}$$
(60)

Taking \(k=\tilde{k}+1\) in the relation above, one has

$$\begin{aligned} &3\sum \limits _{i=\tilde{k}+1}^{+\infty }\Vert x^{i+1}-x^{i}\Vert +2\sum \limits _{i=\tilde{k}+1}^{+\infty }\Vert y^{i+1}-y^{i}\Vert \\ &\quad \leqslant 2\Vert {y}^{\tilde{k}+1}-{y}^{\tilde{k}}\Vert +\Vert {x}^{\tilde{k}+1}-{x}^{\tilde{k}}\Vert +\Vert {y}^{\tilde{k}}-{y}^{\tilde{k}-1}\Vert +\frac{8\zeta }{\delta } \varphi (\hat{\mathcal {L}}_\beta (\hat{w}^{\tilde{k}+1})-\hat{\mathcal {L}}_\beta (\hat{w}^{*})). \end{aligned}$$

This immediately shows that \(\sum \limits _{k=0}^{+\infty }\Vert x^{k+1}-x^{k}\Vert <+\infty \) and \(\sum \limits _{k=0}^{+\infty }\Vert y^{k+1}-y^{k}\Vert <+\infty .\) Further, from this and relation (52), we obtain \(\sum \limits _{k=0}^{+\infty }\Vert \lambda ^{k+1}-\lambda ^{k}\Vert <+\infty .\) The remaining proof is similar to that of Theorem 2 and so omitted here.
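For the reader's convenience, a sketch of that omitted step: absolute summability of the increments makes \(\{w^{k}\}\) a Cauchy sequence, since for any \(m>n\),

$$\begin{aligned} \Vert w^{m}-w^{n}\Vert \leqslant \sum \limits _{i=n}^{m-1}\Vert w^{i+1}-w^{i}\Vert \leqslant \sum \limits _{i=n}^{+\infty }\left( \Vert x^{i+1}-x^{i}\Vert +\Vert y^{i+1}-y^{i}\Vert +\Vert \lambda ^{i+1}-\lambda ^{i}\Vert \right) \rightarrow 0 \ \ (n \rightarrow +\infty ), \end{aligned}$$

so \(\{w^{k}\}\) converges to some \(w^{*}\), which is then identified as a critical point as in the cluster-point analysis above.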

Proof

From (59), for any \(k\geqslant 2\) we have

$$\begin{aligned} \frac{1}{3 \zeta ^2}\Vert \varpi ^{k}\Vert ^2 \leqslant \Vert x^{k}-x^{k-1}\Vert ^2+\Vert y^{k}-y^{k-1}\Vert ^2+\Vert y^{k-1}-y^{k-2}\Vert ^2, \end{aligned}$$
(61)

where \(\varpi ^{k}\in \partial \mathcal {\hat{L}}_\beta (\hat{w}^{k}).\) By (41), there exists a \(k_{0} \geqslant 2\) such that

$$\begin{aligned} \Vert x^{k}-x^{k-1}\Vert ^2+\Vert y^{k}-y^{k-1}\Vert ^2 +\Vert y^{k-1}-y^{k-2}\Vert ^2 \leqslant \frac{1}{\delta } (e_{k-2}-e_{k}), \ \forall \ k \geqslant k_{0}. \end{aligned}$$
(62)
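Assuming, as the notation suggests, that \(e_{k}\) denotes the merit-value gap \(\hat{\mathcal {L}}_\beta (\hat{w}^{k})-\hat{\mathcal {L}}_\beta (\hat{w}^{*})\) from (41), inequality (62) is just two consecutive applications of Lemma 5 (i):

$$\begin{aligned} \delta (\Vert x^{k}-x^{k-1}\Vert ^2+\Vert y^{k}-y^{k-1}\Vert ^2)&\leqslant e_{k-1}-e_{k},\\ \delta (\Vert x^{k-1}-x^{k-2}\Vert ^2+\Vert y^{k-1}-y^{k-2}\Vert ^2)&\leqslant e_{k-2}-e_{k-1}, \end{aligned}$$

added together, with the term \(\Vert x^{k-1}-x^{k-2}\Vert ^{2}\geqslant 0\) discarded on the left.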

Combining (61) and (62) leads to

$$\begin{aligned} \frac{1}{3 \zeta ^2}\Vert \varpi ^{k}\Vert ^2 \leqslant \frac{1}{\delta } (e_{k-2}-e_{k}). \end{aligned}$$
(63)

Let \(\varepsilon >0\) be arbitrarily small. Since \(\hat{w}^{k}\) converges to \(\hat{w}^{*}\), there exists \(k_1 \geqslant 0\) such that \(d(\hat{w}^k, \hat{w}^{*})< \varepsilon \) for any \(k>k_{1}\). Since \(\mathcal {\hat{L}}_\beta (\hat{w})\) satisfies the KŁ property at \(\hat{w}^{*}\), \(\mathcal {\hat{L}}_\beta (\hat{w}^{k})\) is monotonically decreasing and \(\mathcal {\hat{L}}_\beta (\hat{w}^{k}) \rightarrow \mathcal {\hat{L}}_\beta (\hat{w}^{*})\) as \(k \rightarrow + \infty \), there exist \(k_2 \geqslant 0\), a KŁ exponent \(\theta \in [0, 1)\) and \(c_{l}>0\) such that \((\mathcal {\hat{L}}_\beta (\hat{w}^{k}) - \mathcal {\hat{L}}_\beta (\hat{w}^{*}))^{\theta } \leqslant c_{l} \cdot d(0,\partial \hat{\mathcal {L}}_\beta (\hat{w}^{k}))\) for all \(k \geqslant k_2\). It follows that

$$\begin{aligned} e_{k}^{2 \theta } \leqslant c_{l}^{2} \Vert \varpi ^{k}\Vert ^2 \ \ \textrm{with} \ \ \varpi ^{k}\in \partial \mathcal {\hat{L}}_\beta (\hat{w}^{k}), \ \forall \ k \geqslant k_2. \end{aligned}$$

This, together with (63), yields

$$\begin{aligned} \frac{\delta }{3 c_{l}^{2} \zeta ^2} e_{k}^{2 \theta } \leqslant e_{k-2}-e_{k}. \end{aligned}$$

Denoting \(q = \frac{\delta }{3 c_{l}^{2} \zeta ^2}\), we obtain (42) with \(\hat{k} =\max \{k_1, k_2\}\).
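Thus (42), which is stated in the body of the paper, is used below in the form

$$\begin{aligned} q\, e_{k}^{2 \theta } \leqslant e_{k-2}-e_{k}, \quad \forall \ k \geqslant \hat{k}. \end{aligned}$$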

  1. (i)

Let \(\theta = 0\). If \(e_{k} >0\) for all \(k \geqslant \hat{k}\), then (42) gives \(q \leqslant e_{k-2}-e_{k}\). As \(k \rightarrow +\infty \), the right-hand side approaches zero, so \(0 <q \leqslant 0\), a contradiction. Hence \(e_{k}\) must vanish eventually; that is, there exists \(\tilde{k} \geqslant \hat{k}\) such that \(e_{k}=0\) for all \(k \geqslant \tilde{k}\).

  2. (ii)

If \(\theta \in (0, \frac{1}{2}]\), then \(2 \theta -1 \leqslant 0\). Let \(k \geqslant \hat{k}\) be fixed. Since \(\{e_{i}\}_{i\geqslant \hat{k}}\) is monotonically decreasing, we have \(e_{i} \leqslant e_{\hat{k}}\) for \(i= \hat{k}, \hat{k}+1, \cdots , k\); together with (42), this implies that

    $$\begin{aligned} q e_{\hat{k}}^{2 \theta -1}e_{k} \leqslant q e_{k}^{2 \theta -1}e_{k} \leqslant e_{k-2}-e_{k}, \ \mathrm{i.e.}, \ e_{k} \leqslant \frac{1}{1+ q e_{\hat{k}}^{2 \theta -1}} e_{k-2}. \end{aligned}$$

Iterating this inequality, we distinguish two cases:

    1. (ii-A)

      If \(k-\hat{k}\) is odd, then

      $$\begin{aligned} e_{k} \leqslant \frac{e_{k-2}}{1+ q e_{\hat{k}}^{2 \theta -1}} \leqslant \frac{e_{k-4}}{(1+ q e_{\hat{k}}^{2 \theta -1})^2} \leqslant \cdots \leqslant \frac{e_{\hat{k}-1}}{(1+ q e_{\hat{k}}^{2 \theta -1})^{\frac{k-\hat{k}+1}{2}}}. \end{aligned}$$
    2. (ii-B)

      If \(k-\hat{k}\) is even, then

      $$\begin{aligned} e_{k} \leqslant \frac{e_{k-2}}{1+ q e_{\hat{k}}^{2 \theta -1}} \leqslant \frac{e_{k-4}}{(1+ q e_{\hat{k}}^{2 \theta -1})^2} \leqslant \cdots \leqslant \frac{e_{\hat{k}}}{(1+ q e_{\hat{k}}^{2 \theta -1})^{\frac{k-\hat{k}}{2}}}. \end{aligned}$$

Let \(\tau :=\left( \frac{1}{1+ q e_{\hat{k}}^{2 \theta -1}}\right) ^{1/2} \in (0, 1)\). Hence

      $$\begin{aligned} e_{k} \leqslant \frac{\max \{e_{j}: 0 \leqslant j \leqslant \hat{k}\}}{\tau ^{\hat{k}-k}} = \frac{\max \{e_{j}: 0 \leqslant j \leqslant \hat{k}\}}{\tau ^{\hat{k}}} \tau ^{k} = O(\tau ^{k}), \ \forall \ k \geqslant \hat{k}. \end{aligned}$$
  3. (iii)

Let \(\theta \in (\frac{1}{2}, 1)\), so that \(1- 2\theta < 0\). Rearranging (42), we obtain

    $$\begin{aligned} q \leqslant (e_{k-2}-e_{k}) e_{k}^{-2\theta }, \ \forall \ k \geqslant \hat{k}. \end{aligned}$$
    (64)

Now, define \(h(s) = s^{-2 \theta }\) for \(s \in (0, +\infty )\). Clearly, \(h\) is monotonically decreasing, as \(h^{\prime } (s) = -2 \theta s^{-(1+ 2 \theta )} <0\). Since \(e_{k}\) is monotonically decreasing, this gives \(h(e_{k-2}) \leqslant h(e_{k})\) for all \(k \geqslant 2\), and hence \(h(e_{k-2}) \leqslant h(s)\) for \(s \in [e_{k}, e_{k-2}]\). We consider two cases as follows:

    1. (iii-A)

      If \(h(e_{k}) \leqslant 2\,h(e_{k-2})\) for all \(k \geqslant \hat{k},\) then, together with (64), we have

$$\begin{aligned} q \leqslant 2 (e_{k-2}-e_{k}) h(e_{k-2})&= 2 h(e_{k-2}) \int ^{e_{k-2}}_{e_{k}} 1 \,\textrm{d} s \leqslant 2 \int ^{e_{k-2}}_{e_{k}} h(s) \,\textrm{d} s \\ &= 2 \int ^{e_{k-2}}_{e_{k}} s^{-2 \theta } \,\textrm{d} s = \frac{2}{1-2 \theta } (e_{k-2}^{1-2\theta } - e_{k}^{1-2\theta }), \end{aligned}$$

where \(1-2\theta < 0\). Rearranging, we get

      $$\begin{aligned} 0 < \frac{q (2\theta -1)}{2} \leqslant e_{k}^{1-2\theta } - e_{k-2}^{1-2\theta }. \end{aligned}$$

Denoting \(\hat{\mu } = \frac{q (2\theta -1)}{2} >0\) and \(\nu = 1- 2 \theta < 0\), one has

      $$\begin{aligned} 0< \hat{\mu } \leqslant e_{k}^{\nu } - e_{k-2}^{\nu }, \ \forall \ k \geqslant \hat{k}. \end{aligned}$$
      (65)
    2. (iii-B)

Consider now the case \(h(e_{k}) \geqslant 2 h(e_{k-2})\), i.e., \(e_{k}^{-2\theta } \geqslant 2 e_{k-2}^{-2\theta }\). Rearranging this gives \(\frac{1}{2} e_{k-2}^{2\theta } \geqslant e_{k}^{2\theta }\), which by raising both sides to the power \(\frac{1}{2\theta }\) and setting \(\hat{q}= (1/2)^{\frac{1}{2\theta }}\) leads to \(\hat{q} e_{k-2} \geqslant e_{k}\). Since \(\nu = 1-2 \theta < 0\), we have \(\hat{q}^{\nu } e^{\nu }_{k-2} \leqslant e^{\nu }_{k}\), and then \((\hat{q}^{\nu } -1) e^{\nu }_{k-2} \leqslant e^{\nu }_{k} - e^{\nu }_{k-2}\). In view of \(\hat{q}^{\nu } -1>0\) and \(e_{p} \searrow 0\) as \(p \rightarrow + \infty \), there exists a \(\bar{\mu } >0\) such that \((\hat{q}^{\nu } -1) e^{\nu }_{k-2} > \bar{\mu }\) for all \(k \geqslant \hat{k}\). Therefore, we obtain

      $$\begin{aligned} 0< \bar{\mu } \leqslant e_{k}^{\nu } - e_{k-2}^{\nu }, \ \forall \ k \geqslant \hat{k}. \end{aligned}$$
      (66)

      By selecting \(\mu =\min \{\hat{\mu }, \ \bar{\mu }\} >0\) and combining (65) and (66), we conclude that

      $$\begin{aligned} 0< \mu \leqslant e_{k}^{\nu } - e_{k-2}^{\nu }, \ \forall \ k \geqslant \hat{k}. \end{aligned}$$

Summing the inequality above from \(\hat{k}\) to some \(k\ (\geqslant \hat{k})\), we have

      $$\begin{aligned} \sum _{i=\hat{k}}^{k} (e_{i}^{\nu } - e_{i-2}^{\nu }) = (e_{k}^{\nu } + e_{k-1}^{\nu }) - (e_{\hat{k}-1}^{\nu } + e_{\hat{k}-2}^{\nu }) \geqslant \mu (k-\hat{k}+1). \end{aligned}$$

      This, along with \(e_{k-1} \geqslant e_{k}\) (for all k) and \(\nu <0\), implies that

      $$\begin{aligned} \frac{\mu }{2} (k-\hat{k}+1) \leqslant e_{k}^{\nu }- e_{\hat{k}-2}^{\nu } \leqslant e_{k}^{\nu }, \ \forall \ k \geqslant \hat{k}. \end{aligned}$$

      Hence,

$$\begin{aligned} e_{k} \leqslant \left[ \frac{\mu }{2} (k-\hat{k}+1)\right] ^{1/\nu } = \left[ \frac{\mu }{2} (k-\hat{k}+1)\right] ^{1/(1- 2 \theta )}=O(k^{1/(1- 2 \theta )}). \end{aligned}$$

      So, the claim (iii) is proved, and the proof is complete.

Proof

It follows from Lemma 5 (i) that \(\{e_{k}\}_{k \geqslant 0}\) is monotonically decreasing. Then, by utilizing (62) and \(a+b+c \leqslant \sqrt{3(a^2 +b^2 +c^2)}\), we can deduce that for all \(k \geqslant 2\),

$$\begin{aligned} \Vert x^{k}-x^{k-1}\Vert +\Vert y^{k}-y^{k-1}\Vert +\Vert y^{k-1}-y^{k-2}\Vert \leqslant \sqrt{\frac{3}{\delta } (e_{k-2}-e_{k})} \leqslant \sqrt{\frac{3}{\delta } e_{k-2}}. \end{aligned}$$
(67)

On the other hand, we have

$$\begin{aligned} \sum \limits _{i=k}^{N} \Vert x^{i+1}-x^{i}\Vert \geqslant \sum \limits _{i=k}^{N}(\Vert x^{i}-x^{*}\Vert - \Vert x^{i+1}-x^{*}\Vert ) = \Vert x^{k}-x^{*}\Vert - \Vert x^{N+1}-x^{*}\Vert . \end{aligned}$$

Letting \(N \rightarrow + \infty \) in the relation above yields

$$\begin{aligned} \sum \limits _{i=k}^{+ \infty } \Vert x^{i+1}-x^{i}\Vert \geqslant \Vert x^{k}-x^{*}\Vert - \lim \limits _{N \rightarrow + \infty } \Vert x^{N+1}-x^{*}\Vert = \Vert x^{k}-x^{*}\Vert . \end{aligned}$$
(68)

Similarly,

$$\begin{aligned} \sum \limits _{i=k}^{+ \infty } \Vert y^{i+1}-y^{i}\Vert \geqslant \Vert y^{k}-y^{*}\Vert . \end{aligned}$$
(69)

Combining (60), (68) and (69) immediately shows that

$$\begin{aligned} \Vert x^{k}-x^{*}\Vert +\Vert y^{k}-y^{*}\Vert \leqslant \Vert {y}^{k}-{y}^{k-1}\Vert +\frac{1}{2}\Vert {x}^{k}-{x}^{k-1}\Vert + \frac{1}{2}\Vert {y}^{k-1}-{y}^{k-2}\Vert +\frac{4\zeta }{\delta } \varphi (e_{k}). \end{aligned}$$

Exploiting (67), the inequality above leads to

$$\begin{aligned} \Vert x^{k}-x^{*}\Vert +\Vert y^{k}-y^{*}\Vert \leqslant \frac{4\zeta }{\delta } \varphi (e_{k}) + \sqrt{\frac{3}{\delta } e_{k-2}} = O(\max \{\varphi (e_{k}), \ \sqrt{e_{k-2}}\}). \end{aligned}$$
(70)

By taking into account (6), (51) and the \(l_g\)-Lipschitz continuity of \(\nabla g\), we obtain

$$\begin{aligned} \Vert \lambda ^{k}-\lambda ^{*}\Vert &\leqslant \gamma |\alpha - \beta | \Vert A(x^{k} - x^{*})\Vert + (\tau _k + \gamma |\alpha - \beta |) \Vert y^{k}-y^{*}\Vert \\ &\quad + (\tau _k+ l_{g}) \Vert y^{k-1}-y^{*}\Vert , \end{aligned}$$

which implies that

$$\begin{aligned} \Vert \lambda ^{k}-\lambda ^{*}\Vert = O(\Vert x^{k}-x^{*}\Vert ) + O(\Vert y^{k}-y^{*}\Vert ) + O(\Vert y^{k-1}-y^{*}\Vert ) = O(\max \{\varphi (e_{k}), \ \sqrt{e_{k-2}}\}). \end{aligned}$$

Together with (70), one has

$$\begin{aligned} \Vert w^{k}-w^{*}\Vert = O(\max \{\varphi (e_{k}), \ \sqrt{e_{k-2}}\}), \ \forall \ k \geqslant \hat{k}, \end{aligned}$$

and relation (43) holds true. The remaining proof is similar to that of Theorem 4 and so omitted here.



Cite this article

Liu, PJ., Jian, JB., Shao, H. et al. A Bregman-Style Improved ADMM and its Linearized Version in the Nonconvex Setting: Convergence and Rate Analyses. J. Oper. Res. Soc. China (2024). https://doi.org/10.1007/s40305-023-00535-8

