
A New Insight on Augmented Lagrangian Method with Applications in Machine Learning

Published in: Journal of Scientific Computing

Abstract

By exploiting double-penalty terms for the primal subproblem, we develop a novel relaxed augmented Lagrangian method for solving a family of convex optimization problems subject to equality or inequality constraints. The method is then extended to solve a general multi-block separable convex optimization problem, and two related primal-dual hybrid gradient algorithms are also discussed. Sublinear and linear convergence rates are established by variational characterizations of both the saddle point of the problem and the first-order optimality conditions of the involved subproblems. A large number of experiments on the linear support vector machine problem and the robust principal component analysis problem arising from machine learning indicate that our proposed algorithms perform much better than several state-of-the-art algorithms.


Data Availability

The MATLAB codes for the experiments are available at https://github.com/pzheng4218/new-ALM-in-ML.

Notes

  1. The database can be downloaded at http://vision.ucsd.edu/~iskwak/ExtYaleDatabase/ExtYaleB.html.

References

1. Argyriou, A., Evgeniou, T., Pontil, M.: Multi-task feature learning. Adv. Neural Inform. Process. Syst. 19, 41–48 (2007)

2. Banert, S., Upadhyaya, M., Giselsson, P.: The Chambolle–Pock method converges weakly with \(\theta >1/2\) and \(\tau \sigma \Vert L\Vert ^2<4/(1+2\theta )\) (2023). arXiv:2309.03998v1

3. Bai, J., Li, J., Xu, F., Zhang, H.: Generalized symmetric ADMM for separable convex optimization. Comput. Optim. Appl. 70, 129–170 (2018)

4. Bai, J., Guo, K., Chang, X.: A family of multi-parameterized proximal point algorithms. IEEE Access 7, 164021–164028 (2019)

5. Bai, J., Li, J., Wu, Z.: Several variants of the primal-dual hybrid gradient algorithm with applications. Numer. Math. Theor. Meth. Appl. 13, 176–199 (2020)

6. Bai, J., Hager, W., Zhang, H.: An inexact accelerated stochastic ADMM for separable convex optimization. Comput. Optim. Appl. 81, 479–518 (2022)

7. Bai, J., Ma, Y., Sun, H., Zhang, M.: Iteration complexity analysis of a partial LQP-based alternating direction method of multipliers. Appl. Numer. Math. 165, 500–518 (2021)

8. Bai, J., Chang, X., Li, J., Xu, F.: Convergence revisit on generalized symmetric ADMM. Optimization 70, 149–168 (2021)

9. Brunton, S., Nathan Kutz, J.: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, Cambridge (2019)

10. Cui, J., Yan, X., Pu, X., et al.: Aero-engine fault diagnosis based on dynamic PCA and improved SVM. J. Vib. Meas. Diagn. 35, 94–99 (2015)

11. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)

12. Candes, E., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58, 1–37 (2011)

13. Facchinei, F., Pang, J.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer-Verlag, Berlin (2003)

14. Gu, G., He, B., Yuan, X.: Customized proximal point algorithms for linearly constrained convex minimization and saddle-point problems: a unified approach. Comput. Optim. Appl. 59, 135–161 (2014)

15. Hao, Y., Sun, J., Yang, G., Bai, J.: The application of support vector machines to gas turbine performance diagnosis. Chin. J. Aeronaut. 18, 15–19 (2005)

16. Hestenes, M.: Multiplier and gradient methods. J. Optim. Theory Appl. 4, 303–320 (1969)

17. He, B., Yuan, X.: On the \(O(1/n)\) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal. 50, 700–709 (2012)

18. He, B., Yuan, X., Zhang, W.: A customized proximal point algorithm for convex minimization with linear constraints. Comput. Optim. Appl. 56, 559–572 (2013)

19. He, B., Yuan, X.: A class of ADMM-based algorithms for three-block separable convex programming. Comput. Optim. Appl. 70, 791–826 (2018)

20. He, B., You, Y., Yuan, X.: On the convergence of primal-dual hybrid gradient algorithms. SIAM J. Imaging Sci. 7, 2526–2537 (2014)

21. He, B.: On the convergence properties of alternating direction method of multipliers. Numer. Math. J. Chin. Univ. (Chinese Series) 39, 81–96 (2017)

22. He, B., Yuan, X.: Balanced augmented Lagrangian method for convex programming (2021). arXiv:2108.08554v1

23. He, B., Xu, S., Yuan, J.: Indefinite linearized augmented Lagrangian method for convex programming with linear inequality constraints (2021). arXiv:2105.02425v1

24. He, H., Desai, J., Wang, K.: A primal-dual prediction-correction algorithm for saddle point optimization. J. Glob. Optim. 66, 573–583 (2016)

25. He, B., Ma, F., Xu, S., Yuan, X.: A generalized primal-dual algorithm with improved convergence condition for saddle point problems. SIAM J. Imaging Sci. 15, 1157–1183 (2022)

26. Jiang, F., Zhang, Z., He, H.: Solving saddle point problems: a landscape of primal-dual algorithm with larger stepsizes. J. Glob. Optim. 85, 821–846 (2023)

27. Jiang, F., Wu, Z., Cai, X., Zhang, H.: A first-order inexact primal-dual algorithm for a class of convex-concave saddle point problems. Numer. Algor. 88, 1109–1136 (2021)

28. Li, L.: Selected Applications of Convex Optimization, pp. 17–18. Tsinghua University Press, Beijing (2015)

29. Li, Q., Xu, Y., Zhang, N.: Two-step fixed-point proximity algorithms for multi-block separable convex problems. J. Sci. Comput. 70, 1204–1228 (2017)

30. Liu, Z., Li, J., Li, G., et al.: A new model for sparse and low rank matrix decomposition. J. Appl. Anal. Comput. 7, 600–616 (2017)

31. Ma, F., Ni, M.: A class of customized proximal point algorithms for linearly constrained convex optimization. Comput. Appl. Math. 37, 896–911 (2018)

32. Osher, S., Heaton, H., Fung, S.: A Hamilton–Jacobi-based proximal operator. Proc. Natl. Acad. Sci. USA 120, e2220469120 (2023)

33. Powell, M.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization, pp. 283–298. Academic Press, New York (1969)

34. Papadimitriou, C., Raghavan, P., Tamaki, H., Vempala, S.: Latent semantic indexing, a probabilistic analysis. J. Comput. Syst. Sci. 61, 217–235 (2000)

35. Robinson, S.: Some continuity properties of polyhedral multifunctions. Math. Program. Stud. 14, 206–241 (1981)

36. Shen, Y., Zuo, Y., Yu, A.: A partially proximal S-ADMM for separable convex optimization with linear constraints. Appl. Numer. Math. 160, 65–83 (2021)

37. Tao, M., Yuan, X.: Recovering low-rank and sparse components of matrices from incomplete and noisy observations. SIAM J. Optim. 21, 57–81 (2011)

38. Xu, S.: A dual-primal balanced augmented Lagrangian method for linearly constrained convex programming. J. Appl. Math. Comput. 69, 1015–1035 (2023)

39. Zhu, Y., Wu, J., Yu, G.: A fast proximal point algorithm for \(l_1\)-minimization problem in compressed sensing. Appl. Math. Comput. 270, 777–784 (2015)

40. Zhang, X.: Bregman Divergence and Mirror Descent. Lecture Notes (2013). http://users.cecs.anu.edu.au/~xzhang/teaching/bregman.pdf

41. Zhu, M., Chan, T.F.: An Efficient Primal-Dual Hybrid Gradient Algorithm for Total Variation Image Restoration. CAM Report 08-34, UCLA, Los Angeles (2008)

Acknowledgements

The authors would like to thank the editor and anonymous referees for their valuable comments and suggestions, which have significantly improved the quality of the paper.

Funding

This work was supported by Guangdong Basic and Applied Basic Research Foundation (2023A1515012405), Shaanxi Fundamental Science Research Project for Mathematics and Physics (22JSQ001), National Key Laboratory of Aircraft Configuration Design (D5150240011) and National Natural Science Foundation of China (Grants 12071398 and 52372397).

Author information

Corresponding author

Correspondence to Zheng Peng.

Ethics declarations

Conflict of interest

The authors declare that they have no commercial or associative interests that represent a conflict of interest in connection with this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Discussions on Two New PDHG

In this appendix, we discuss two new PDHG-type algorithms without a relaxation step for solving the following convex-concave saddle-point problem

$$\begin{aligned} \min \limits _{\mathbf{{x}}\in \mathcal{{X}}}\max \limits _{\mathbf{{y}}\in \mathcal{{Y}}} \Phi (\mathbf{{x,y}}):= \theta _1(\mathbf{{x}})-\mathbf{{y}}^\mathsf{{T}}A\mathbf{{x}}-\theta _2(\mathbf{{y}}), \end{aligned}$$
(41)

or, equivalently, the composite problem \( \min \limits _{\mathbf{{x}}\in \mathcal{{X}}} \big \{\theta _1(\mathbf{{x}}) + \theta _2^*(-A\mathbf{{x}})\big \}, \) where \(\mathcal{{X}}\subseteq \mathcal{{R}}^n, \mathcal{{Y}}\subseteq \mathcal{{R}}^m\) are closed convex sets, both \(\theta _1: {\mathcal{{X}}}\rightarrow \mathcal{{R}}\) and \( \theta _2: {\mathcal{{Y}}}\rightarrow \mathcal{{R}}\) are convex but possibly nonsmooth functions, \(\theta _2^*\) is the conjugate function of \(\theta _2\), and \(A\in \mathcal{{R}}^{m\times n}\) is a given matrix. Many practical problems can be reformulated as special cases of (41); see, e.g., [27, Section 5]. Note that problem (41) reduces to the dual of (1) by letting \(\theta _2=-\lambda ^\mathsf{{T}}b\), \(\mathbf{{y}}=\lambda \) and \(\mathcal{{Y}}=\Lambda \). Hence, the convergence results below also hold for the previous P-rALM. Throughout the forthcoming discussions, the solution set of (41) is assumed to be nonempty.
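As one hedged illustration (ours, not taken from the paper), a TV-regularized least-squares model of the type treated by the original PDHG in [41] fits the template (41); here \(K\), \(D\), \(b\) and \(\mu >0\) denote hypothetical problem data:

$$\begin{aligned} \min \limits _{\mathbf{{x}}\in \mathcal{{R}}^n} \frac{1}{2}\Vert K\mathbf{{x}}-b\Vert ^2+\mu \Vert D\mathbf{{x}}\Vert _1 \quad \Longleftrightarrow \quad \min \limits _{\mathbf{{x}}\in \mathcal{{R}}^n}\max \limits _{\mathbf{{y}}\in \mathcal{{Y}}} \frac{1}{2}\Vert K\mathbf{{x}}-b\Vert ^2-\mathbf{{y}}^\mathsf{{T}}D\mathbf{{x}}, \qquad \mathcal{{Y}}=\{\mathbf{{y}}:\Vert \mathbf{{y}}\Vert _\infty \le \mu \}, \end{aligned}$$

where \(\theta _1(\mathbf{{x}})=\frac{1}{2}\Vert K\mathbf{{x}}-b\Vert ^2\), \(A=D\), \(\mathcal{{X}}=\mathcal{{R}}^n\) and \(\theta _2\equiv 0\) on \(\mathcal{{Y}}\), so that the inner maximum equals \(\mu \Vert D\mathbf{{x}}\Vert _1=\theta _2^*(-D\mathbf{{x}})\).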

The original PDHG was proposed in [41] to solve total variation (TV) image restoration problems. Extending it to problem (41), we obtain the following scheme:

$$\begin{aligned}\left\{ \begin{array}{l} \mathbf{{x}}^{k+1}=\arg \min \limits _{\mathbf{{x}}\in \mathcal{{X}}} \Phi (\mathbf{{x}},\mathbf{{y}}^k)+\frac{r}{2}\Vert \mathbf{{x}}-\mathbf{{x}}^k\Vert ^2, \\ \mathbf{{y}}^{k+1}=\arg \max \limits _{\mathbf{{y}}\in \mathcal{{Y}}} \Phi (\mathbf{{x}}^{k+1},\mathbf{{y}})-\frac{s}{2}\Vert \mathbf{{y}}-\mathbf{{y}}^k\Vert ^2, \end{array}\right. \end{aligned}$$

where \(r, s\) are positive scalars. He et al. [20] pointed out that convergence of the above PDHG can be ensured if \(\theta _1\) is strongly convex and \(rs>\rho (A^\mathsf{{T}}A)\). To weaken these convergence conditions (e.g., so that the function \(\theta _1\) is only required to be convex and the parameters \(r, s\) do not depend on \(\rho (A^\mathsf{{T}}A)\)), we develop a novel PDHG (N-PDHG1) as follows.
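For concreteness, the following minimal sketch (ours; all data, dimensions and parameter values are illustrative) runs this classical PDHG scheme on a toy instance with quadratic \(\theta _1,\theta _2\) and \(\mathcal{{X}}=\mathcal{{R}}^n\), \(\mathcal{{Y}}=\mathcal{{R}}^m\), so that both subproblems have closed-form solutions, and with \(r, s\) chosen so that \(rs>\rho (A^\mathsf{{T}}A)\):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 30
A = rng.standard_normal((m, n))
c = rng.standard_normal(n)

# Toy instance of (41): theta1(x) = 0.5*||x - c||^2 (strongly convex),
# theta2(y) = 0.5*||y||^2, with X = R^n and Y = R^m.
rho = np.linalg.norm(A, 2) ** 2          # rho(A^T A) = ||A||^2
r = s = 1.1 * np.sqrt(rho)               # so that r*s > rho(A^T A), cf. [20]

x, y = np.zeros(n), np.zeros(m)
for k in range(2000):
    # x-step: argmin_x 0.5*||x - c||^2 - y_k^T A x + (r/2)*||x - x_k||^2
    x = (c + A.T @ y + r * x) / (1.0 + r)
    # y-step: argmax_y -y^T A x_{k+1} - 0.5*||y||^2 - (s/2)*||y - y_k||^2
    y = (s * y - A @ x) / (1.0 + s)

# Saddle-point optimality residuals: (x - c) - A^T y = 0 and A x + y = 0.
print(np.linalg.norm(x - c - A.T @ y), np.linalg.norm(A @ x + y))
```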

(Algorithm N-PDHG1: displayed as a boxed figure in the original article.)
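Since the algorithm box above is rendered as an image in the published article, the following sketch is our reconstruction of the N-PDHG1 iteration from the first-order optimality conditions (44) and (46) given later, applied to a toy quadratic instance; the block matrix H is taken as implied by (45) and (47), and all parameter choices are illustrative. The sketch also checks the monotone decay of \(\Vert \mathbf{{u}}^k-\mathbf{{u}}^*\Vert ^2_H\) predicted by the contraction inequality (49).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 30
A = rng.standard_normal((m, n))
c = rng.standard_normal(n)

# Toy instance of (41): theta1(x) = 0.5*||x - c||^2, theta2(y) = 0.5*||y||^2,
# X = R^n, Y = R^m; closed-form saddle point x* = (I + A^T A)^{-1} c, y* = -A x*.
x_star = np.linalg.solve(np.eye(n) + A.T @ A, c)
y_star = -A @ x_star

r = 0.5                                  # hypothetical proximal parameter, any r > 0
Q = 0.1 * np.eye(n)                      # hypothetical choice of Q > 0
P = r * A.T @ A + Q                      # proximal matrix of the x-subproblem
H = np.block([[P, A.T], [A, np.eye(m) / r]])   # block matrix implied by (45) and (47)

x, y = np.zeros(n), np.zeros(m)
dist = []
for k in range(1000):
    # x-step: argmin_x theta1(x) - y_k^T A x + 0.5*||x - x_k||_{rA^TA+Q}^2, cf. (44)
    x_new = np.linalg.solve(np.eye(n) + P, c + A.T @ y + P @ x)
    # y-step: argmax_y Phi(2x_{k+1} - x_k, y) - 1/(2r)*||y - y_k||^2, cf. (46)
    y = (y - r * A @ (2 * x_new - x)) / (1.0 + r)
    x = x_new
    e = np.concatenate([x - x_star, y - y_star])
    dist.append(e @ H @ e)               # ||u^k - u*||_H^2

# (49) predicts a monotonically non-increasing sequence.
print(all(a >= b - 1e-9 for a, b in zip(dist, dist[1:])), dist[-1])
```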

Another related algorithm, called N-PDHG2, modifies only the final subproblem of PDHG; its framework is described in the next box. A similar quadratic term was adopted in [24] to solve a special case of (41). We can observe that N-PDHG1 has certain connections with P-rALM, since the first-order optimality conditions of their involved subproblems can be reformulated as similar variational inequalities with the same block matrix H; see (7) and (42) below. In fact, their \(\mathbf{{x}}\)-subproblems share the same proximal term. Another observation is that N-PDHG2 is obtained from N-PDHG1 by modifying the involved proximal parameters, so that one of its subproblems can be solved directly by a proximity operator.

(Algorithm N-PDHG2: displayed as a boxed figure in the original article.)

1.1 Sublinear Convergence Under the General Convexity Assumption

Due to the close similarity between the two algorithms mentioned above, in this subsection we only analyze the convergence properties of N-PDHG1 under general convexity assumptions; the convergence of N-PDHG2 is commented on at the end. For convenience, we denote \(\mathcal{{U}}:=\mathcal{{X}}\times \mathcal{{Y}}\) and

$$\begin{aligned} \theta (\mathbf{{u}})=\theta _1(\mathbf{{x}})+\theta _2(\mathbf{{y}}),\quad \mathbf{{u}}=\left( \begin{array}{c} \mathbf{{x}} \\ \mathbf{{y}} \end{array}\right) ,\quad \mathbf{{u}}^k=\left( \begin{array}{c} \mathbf{{x}}^k \\ \mathbf{{y}}^k \end{array}\right) \quad \text {and} \quad M=\left[ \begin{array}{ccccccc} \mathbf{{0}} &{}&{} -A^\mathsf{{T}}\\ A &{}&{} \mathbf{{0}} \end{array}\right] . \end{aligned}$$

Lemma 5.1

The sequence \(\{ \mathbf{{u}}^k \}\) generated by N-PDHG1 satisfies

$$\begin{aligned} \mathbf{{u}}^{k+1}\in \mathcal{{U}},~ \theta (\mathbf{{u}})- \theta (\mathbf{{u}}^{k+1}) +\big \langle \mathbf{{u}}-\mathbf{{u}}^{k+1}, M\mathbf{{u}}\big \rangle \ge \big \langle \mathbf{{u}}-\mathbf{{u}}^{k+1}, H(\mathbf{{u}}^k-\mathbf{{u}}^{k+1}) \big \rangle \end{aligned}$$
(42)

for any \(\mathbf{{u}}\in \mathcal{{U}}\), where H is given by (8). Moreover, we have

$$\begin{aligned}{} & {} \theta (\mathbf{{u}})- \theta (\mathbf{{u}}^{k+1}) +\big \langle \mathbf{{u}}-\mathbf{{u}}^{k+1}, M\mathbf{{u}}\big \rangle \nonumber \\{} & {} \ge \frac{1}{2}\Big ( \big \Vert \mathbf{{u}}-\mathbf{{u}}^{k+1}\big \Vert ^2_H-\big \Vert \mathbf{{u}}-\mathbf{{u}}^k\big \Vert ^2_H\Big )+\frac{1}{2} \big \Vert \mathbf{{u}}^k-\mathbf{{u}}^{k+1}\big \Vert ^2_H. \end{aligned}$$
(43)

Proof. According to the first-order optimality condition of the \(\mathbf{{x}}\)-subproblem in N-PDHG1, we have \(\mathbf{{x}}^{k+1}\in \mathcal{{X}}\) and

$$\begin{aligned} \theta _1(\mathbf{{x}})- \theta _1(\mathbf{{x}}^{k+1}) + \left\langle \mathbf{{x}}-\mathbf{{x}}^{k+1}, -A^\mathsf{{T}}\mathbf{{y}}^k+\big (rA^\mathsf{{T}}A+Q\big ) (\mathbf{{x}}^{k+1}-\mathbf{{x}}^k)\right\rangle \ge 0, ~ \forall \mathbf{{x}} \in \mathcal{{X}},\qquad \end{aligned}$$
(44)

that is,

$$\begin{aligned}{} & {} \theta _1(\mathbf{{x}})- \theta _1(\mathbf{{x}}^{k+1}) + \left\langle \mathbf{{x}}-\mathbf{{x}}^{k+1}, -A^\mathsf{{T}}\mathbf{{y}}^{k+1} \right\rangle \nonumber \\{} & {} \quad \ge \Big \langle \mathbf{{x}}-\mathbf{{x}}^{k+1}, \big (rA^\mathsf{{T}}A+Q\big ) (\mathbf{{x}}^k-\mathbf{{x}}^{k+1})+ A^\mathsf{{T}}(\mathbf{{y}}^k-\mathbf{{y}}^{k+1}) \Big \rangle . \end{aligned}$$
(45)

Similarly, we have \(\mathbf{{y}}^{k+1}\in \mathcal{{Y}}\) and

$$\begin{aligned} \theta _2(\mathbf{{y}})- \theta _2(\mathbf{{y}}^{k+1}) + \Big \langle \mathbf{{y}}-\mathbf{{y}}^{k+1}, A (2\mathbf{{x}}^{k+1}-\mathbf{{x}}^k)+\frac{1}{r} (\mathbf{{y}}^{k+1}-\mathbf{{y}}^k)\Big \rangle \ge 0, ~ \forall \mathbf{{y}} \in \mathcal{{Y}}, \end{aligned}$$
(46)

that is,

$$\begin{aligned} \theta _2(\mathbf{{y}})- \theta _2(\mathbf{{y}}^{k+1}) + \Big \langle \mathbf{{y}}-\mathbf{{y}}^{k+1}{,} A^\mathsf{{T}}\mathbf{{x}}^{k+1} \Big \rangle \ge \Big \langle \mathbf{{y}}-\mathbf{{y}}^{k+1}, A (\mathbf{{x}}^k-\mathbf{{x}}^{k+1})+ \frac{1}{r} (\mathbf{{y}}^k-\mathbf{{y}}^{k+1}) \Big \rangle .\nonumber \\ \end{aligned}$$
(47)

Combine the inequalities (45)–(47) and the structure of H given by (8) to have

$$\begin{aligned} \theta (\mathbf{{u}})- \theta (\mathbf{{u}}^{k+1}) +\Big \langle \mathbf{{u}}-\mathbf{{u}}^{k+1}, M\mathbf{{u}}^{k+1}\Big \rangle \ge \Big \langle \mathbf{{u}}-\mathbf{{u}}^{k+1}, H(\mathbf{{u}}^k-\mathbf{{u}}^{k+1}) \Big \rangle , \end{aligned}$$

which, together with the property \(\left\langle \mathbf{{u}}-\mathbf{{u}}^{k+1}, M( \mathbf{{u}}-\mathbf{{u}}^{k+1}) \right\rangle =0\), confirms (42). Then, the inequality (43) is obtained by applying (42) and the identity in (21). \(\blacksquare \)
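The last step of the proof uses the fact that the matrix M defined above is skew-symmetric (\(M^\mathsf{{T}}=-M\)), so every quadratic form \(\langle \mathbf{{w}},M\mathbf{{w}}\rangle \) vanishes. A short numerical sanity check (ours, with a random A standing in for the problem data):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 6
A = rng.standard_normal((m, n))
# M = [[0, -A^T], [A, 0]] satisfies M^T = -M, hence <w, M w> = 0 for every w.
M = np.block([[np.zeros((n, n)), -A.T], [A, np.zeros((m, m))]])
w = rng.standard_normal(n + m)
print(np.allclose(M.T, -M), abs(w @ M @ w) < 1e-12)
```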

Now, we discuss the global convergence and sublinear convergence rate of N-PDHG1. Let \(\mathbf{{u}}^*=(\mathbf{{x}}^*;\mathbf{{y}}^*)\in \mathcal{{U}}\) be a solution point of problem (41). Then it holds that

$$\begin{aligned} {\Phi (\mathbf{{x}}^*,\mathbf{{y}})\le \Phi (\mathbf{{x}}^*,\mathbf{{y}}^*)\le \Phi (\mathbf{{x}},\mathbf{{y}}^*), \quad \forall \mathbf{{x}}\in \mathcal{{X}}, \mathbf{{y}}\in \mathcal{{Y}}}, \end{aligned}$$

namely,

$$\begin{aligned} \left\{ \begin{array}{lllll} \mathbf{{x}}^*\in \mathcal{{X}}, &{}\theta _1(\mathbf{{x}})- \theta _1(\mathbf{{x}}^*) &{}+ &{}\langle \mathbf{{x}}-\mathbf{{x}}^*, -A^\mathsf{{T}}\mathbf{{y}}^*\rangle \ge 0, &{}\forall \mathbf{{x}} \in \mathcal{{X}},\\ \mathbf{{y}}^*\in \mathcal{{Y}}, &{}\theta _2(\mathbf{{y}})- \theta _2(\mathbf{{y}}^*) &{}+ &{}\langle \mathbf{{y}}-\mathbf{{y}}^*, A \mathbf{{x}}^*\rangle \ge 0, &{}\forall \mathbf{{y}} \in \mathcal{{Y}}. \end{array}\right. \end{aligned}$$

So, finding a solution point of (41) amounts to finding \(\mathbf{{u}}^*\in \mathcal{{U}}\) such that

$$\begin{aligned} \mathbf{{u}}^*\in \mathcal{{U}},~~~ \theta (\mathbf{{u}})- \theta (\mathbf{{u}}^*) +\left\langle \mathbf{{u}}-\mathbf{{u}}^*, M\mathbf{{u}}^*\right\rangle \ge 0, \quad \forall \mathbf{{u}}\in \mathcal{{U}}. \end{aligned}$$
(48)

Setting \(\mathbf{{u}}:=\mathbf{{u}}^*\) in (43) together with (48) gives

$$\begin{aligned} \big \Vert \mathbf{{u}}^*-\mathbf{{u}}^{k+1}\big \Vert ^2_H\le \big \Vert \mathbf{{u}}^*-\mathbf{{u}}^k\big \Vert ^2_H - \big \Vert \mathbf{{u}}^k-\mathbf{{u}}^{k+1}\big \Vert ^2_H, \end{aligned}$$
(49)

that is, the sequence generated by N-PDHG1 is contractive with respect to the H-norm, and thus N-PDHG1 converges globally. The last inequality, together with the analysis of P-rALM, indicates that N-PDHG1 equipped with a relaxation step also converges, and the sublinear convergence rate of N-PDHG1 can be established similarly to that of P-rALM. Note that the convergence of N-PDHG1 does not need the strong convexity of \(\theta _1\) and allows more flexibility for choosing the proximal parameter r.

Finally, it is not difficult to verify from the first-order optimality conditions of the subproblems in N-PDHG2 that

$$\begin{aligned} \mathbf{{u}}^{k+1}\in \mathcal{{U}},~~ \theta (\mathbf{{u}})- \theta (\mathbf{{u}}^{k+1}) +\Big \langle \mathbf{{u}}-\mathbf{{u}}^{k+1}, M\mathbf{{u}}\Big \rangle \ge \Big \langle \mathbf{{u}}-\mathbf{{u}}^{k+1}, \widetilde{H}(\mathbf{{u}}^k-\mathbf{{u}}^{k+1}) \Big \rangle \end{aligned}$$

for any \(\mathbf{{u}}\in \mathcal{{U}}\), where

$$\begin{aligned} \widetilde{H}=\left[ \begin{array}{ccccccc} r\mathbf{{I}} &{}&{} A^\mathsf{{T}}\\ A &{}&{} \frac{1}{r}AA^\mathsf{{T}}+Q \end{array}\right] \end{aligned}$$

and \(\widetilde{H}\) is positive definite for any \(r>0\) and \(Q\succ \mathbf{{0}}\). So, N-PDHG2 also converges globally with a sublinear convergence rate. This matrix \(\widetilde{H}\) is the one discussed in Sect. 1, and it reduces to that in [22] when \(Q=\delta \mathbf{{I}}\) for any \(\delta >0\).
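A quick numerical sanity check of this positive definiteness claim (ours; the random data and the choice \(Q=\delta \mathbf{{I}}\) are purely illustrative); analytically, the claim follows because the Schur complement of the block \(r\mathbf{{I}}\) in \(\widetilde{H}\) is exactly Q:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 8
A = rng.standard_normal((m, n))
r, delta = 0.7, 0.3                      # any r > 0 and Q = delta*I with delta > 0 (illustrative)
Q = delta * np.eye(m)

# H_tilde = [[r*I, A^T], [A, (1/r)*A*A^T + Q]]; its Schur complement w.r.t. r*I equals Q.
H_tilde = np.block([[r * np.eye(n), A.T], [A, A @ A.T / r + Q]])
print(np.all(np.linalg.eigvalsh(H_tilde) > 0))                  # True
```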

1.2 Linear Convergence Under the Strong Convexity Assumption

The linear convergence rate of N-PDHG1 will be investigated in this subsection under the following assumptions:

  1. (a1)

    The matrix A has full row rank and \(\mathcal{{X}}=\mathcal{{R}}^n;\)

  2. (a2)

    The function \(\theta _1\) is strongly convex with modulus \(\nu >0\) and \(\nabla \theta _1\) is Lipschitz continuous with constant \(L_{\theta _1}>0\).

From (a1), the second part of (a2) and the first-order optimality condition of the \(\mathbf{{x}}\)-subproblem in N-PDHG1, we have

$$\begin{aligned} -\nabla \theta _1(\mathbf{{x}}^{k+1})=-A^\mathsf{{T}}\mathbf{{y}}^k+\big (rA^\mathsf{{T}}A+Q\big ) (\mathbf{{x}}^{k+1}-\mathbf{{x}}^k). \end{aligned}$$
(50)

Combining this equation with the first part of (a2), it holds that

$$\begin{aligned}{} & {} \theta _1(\mathbf{{x}})-\theta _1(\mathbf{{x}}^{k+1})\ge \big \langle \mathbf{{x}}-\mathbf{{x}}^{k+1}, \nabla \theta _1(\mathbf{{x}}^{k+1})\big \rangle +\frac{\nu }{2}\big \Vert \mathbf{{x}}-\mathbf{{x}}^{k+1} \big \Vert ^2\nonumber \\{} & {} \Rightarrow \theta _1(\mathbf{{x}})-\theta _1(\mathbf{{x}}^{k+1}) +\big \langle \mathbf{{x}}-\mathbf{{x}}^{k+1}, -A^\mathsf{{T}}\mathbf{{y}}^{k+1}\big \rangle \ge \frac{\nu }{2}\big \Vert \mathbf{{x}}-\mathbf{{x}}^{k+1} \big \Vert ^2\nonumber \\{} & {} \qquad +\big \langle \mathbf{{x}}-\mathbf{{x}}^{k+1}, \big (rA^\mathsf{{T}}A+Q\big ) (\mathbf{{x}}^k-\mathbf{{x}}^{k+1})+ A^\mathsf{{T}}(\mathbf{{y}}^k-\mathbf{{y}}^{k+1}) \big \rangle , \end{aligned}$$

which implies that the extra term \(\frac{\nu }{2}\left\| \mathbf{{x}}-\mathbf{{x}}^{k+1} \right\| ^2\) can be added to the right-hand side of (42); repeating the derivation of (49) with \(\mathbf{{u}}:=\mathbf{{u}}^*\) then finally yields

$$\begin{aligned} \big \Vert \mathbf{{u}}^*-\mathbf{{u}}^{k+1}\big \Vert ^2_H\le \big \Vert \mathbf{{u}}^*-\mathbf{{u}}^k\big \Vert ^2_H - \big \Vert \mathbf{{u}}^k-\mathbf{{u}}^{k+1}\big \Vert ^2_H - \nu \big \Vert \mathbf{{x}}^*-\mathbf{{x}}^{k+1} \big \Vert ^2. \end{aligned}$$
(51)

Note that the equation (50) can be equivalently rewritten as

$$\begin{aligned} A^\mathsf{{T}}\mathbf{{y}}^{k+1}= \nabla \theta _1(\mathbf{{x}}^{k+1})+A^\mathsf{{T}}(\mathbf{{y}}^{k+1}-\mathbf{{y}}^k)+\big (rA^\mathsf{{T}}A+Q\big ) (\mathbf{{x}}^{k+1}-\mathbf{{x}}^k). \end{aligned}$$
(52)

Besides, the solution \((\mathbf{{x}}^*;\mathbf{{y}}^*)\) satisfies

$$\begin{aligned} \nabla \theta _1(\mathbf{{x}}^*)= A^\mathsf{{T}}\mathbf{{y}}^*. \end{aligned}$$
(53)

Combining the equations (52)–(53) with (a1)–(a2), we obtain

$$\begin{aligned}{} & {} \sigma _A\big \Vert \mathbf{{y}}^{k+1}-\mathbf{{y}}^*\big \Vert ^2 \le \big \Vert A^\mathsf{{T}}(\mathbf{{y}}^{k+1}-\mathbf{{y}}^*)\big \Vert ^2 \\{} & {} \quad = \big \Vert \nabla \theta _1(\mathbf{{x}}^{k+1})- \nabla \theta _1(\mathbf{{x}}^*) + A^\mathsf{{T}}(\mathbf{{y}}^{k+1}-\mathbf{{y}}^k)+\big (rA^\mathsf{{T}}A+Q\big ) (\mathbf{{x}}^{k+1}-\mathbf{{x}}^k)\big \Vert ^2 \\{} & {} \quad \le 3\Big \{\big \Vert \nabla \theta _1(\mathbf{{x}}^{k+1})- \nabla \theta _1(\mathbf{{x}}^*) \big \Vert ^2 +\big \Vert A^\mathsf{{T}}(\mathbf{{y}}^{k+1}-\mathbf{{y}}^k) \big \Vert ^2+\big \Vert \big (rA^\mathsf{{T}}A+Q\big ) (\mathbf{{x}}^{k+1}-\mathbf{{x}}^k)\big \Vert ^2\Big \}\\{} & {} \quad \le 3\Big \{L_{\theta _1}^2\big \Vert \mathbf{{x}}^{k+1} - \mathbf{{x}}^* \big \Vert ^2 +\Vert A \Vert ^2\big \Vert \mathbf{{y}}^{k+1}-\mathbf{{y}}^k \big \Vert ^2+\big \Vert \big (rA^\mathsf{{T}}A+Q\big ) (\mathbf{{x}}^{k+1}-\mathbf{{x}}^k)\big \Vert ^2\Big \}, \end{aligned}$$

where \(\sigma _A>0\) denotes the smallest eigenvalue of \(AA^\mathsf{{T}}\) due to (a1). So, we have

$$\begin{aligned} \big \Vert \mathbf{{u}}^*-\mathbf{{u}}^{k+1}\big \Vert ^2_H\le & {} \Vert H\Vert \Big \{\big \Vert \mathbf{{x}}^*-\mathbf{{x}}^{k+1}\big \Vert ^2+\big \Vert \mathbf{{y}}^*-\mathbf{{y}}^{k+1}\big \Vert ^2\Big \}\nonumber \\\le & {} \Vert H\Vert \Big \{(1+3L_{\theta _1}^2\sigma _A^{-1})\big \Vert \mathbf{{x}}^*-\mathbf{{x}}^{k+1}\big \Vert ^2+3\sigma _A^{-1}\Vert A \Vert ^2\big \Vert \mathbf{{y}}^{k+1}-\mathbf{{y}}^k \big \Vert ^2 \nonumber \\{} & {} \quad ~ + 3\sigma _A^{-1}\big \Vert \big (rA^\mathsf{{T}}A+Q\big ) (\mathbf{{x}}^{k+1}-\mathbf{{x}}^k)\big \Vert ^2\Big \}. \end{aligned}$$
(54)

By the structure of H and Young's inequality, it holds that

$$\begin{aligned}{} & {} \big \Vert \mathbf{{u}}^k-\mathbf{{u}}^{k+1}\big \Vert ^2_H\nonumber \\{} & {} \quad = \big \Vert \big (rA^\mathsf{{T}}A+Q\big ) (\mathbf{{x}}^{k+1}-\mathbf{{x}}^k)\big \Vert ^2+ \frac{1}{r} \big \Vert \mathbf{{y}}^{k+1}-\mathbf{{y}}^k \big \Vert ^2+2\big \langle \mathbf{{x}}^{k+1}-\mathbf{{x}}^k, A^\mathsf{{T}}(\mathbf{{y}}^{k+1}-\mathbf{{y}}^k)\big \rangle \nonumber \\{} & {} \quad \ge \big \Vert \big (rA^\mathsf{{T}}A+Q\big ) (\mathbf{{x}}^{k+1}-\mathbf{{x}}^k)\big \Vert ^2+ \frac{1}{r} \big \Vert \mathbf{{y}}^{k+1}-\mathbf{{y}}^k \big \Vert ^2\nonumber \\{} & {} \quad \quad -\Big \{ \delta _0\big \Vert \mathbf{{x}}^{k+1}-\mathbf{{x}}^k \big \Vert ^2+\frac{1}{\delta _0}\Vert A^\mathsf{{T}}A\Vert \big \Vert \mathbf{{y}}^{k+1}-\mathbf{{y}}^k \big \Vert ^2\Big \}, \nonumber \\{} & {} \quad \ge \big \Vert \big (rA^\mathsf{{T}}A+Q\big ) (\mathbf{{x}}^{k+1}-\mathbf{{x}}^k)\big \Vert ^2- \delta _0\big \Vert \mathbf{{x}}^{k+1}-\mathbf{{x}}^k \big \Vert ^2+\Big ( \frac{1}{r}-\frac{ \Vert A^\mathsf{{T}}A\Vert }{\delta _0}\Big )\big \Vert \mathbf{{y}}^{k+1}-\mathbf{{y}}^k \big \Vert ^2,\quad \nonumber \\ \end{aligned}$$
(55)

where \( \delta _0 \in ( r\Vert A^\mathsf{{T}}A\Vert , \Vert rA^\mathsf{{T}}A+Q\Vert ^2)\) exists for proper choices of r and Q. Now, let

$$\begin{aligned} \delta ^k= \min \left\{ \frac{\nu }{(1+3L_{\theta _1}^2\sigma _A^{-1})\Vert H\Vert },~ \frac{\delta _0-r\Vert A^\mathsf{{T}}A\Vert }{3r\delta _0\sigma _A^{-1}\Vert A\Vert ^2\Vert H\Vert },~ \frac{\Vert rA^\mathsf{{T}}A+Q\Vert ^2-\delta _0}{3 \sigma _A^{-1}\Vert H\Vert \left\| \left( rA^\mathsf{{T}}A+Q\right) (\mathbf{{x}}^{k+1}-\mathbf{{x}}^k)\right\| ^2} \right\} . \end{aligned}$$

Then, combining the above inequalities (51) and (54)-(55), we can deduce

$$\begin{aligned}{} & {} (1+\delta ^k)\big \Vert \mathbf{{u}}^*-\mathbf{{u}}^{k+1}\big \Vert ^2_H-\big \Vert \mathbf{{u}}^*-\mathbf{{u}}^k\big \Vert ^2_H \nonumber \\{} & {} \quad \le \delta ^k\big \Vert \mathbf{{u}}^*-\mathbf{{u}}^{k+1}\big \Vert ^2_H- \big \Vert \mathbf{{u}}^k-\mathbf{{u}}^{k+1}\big \Vert ^2_H - \nu \big \Vert \mathbf{{x}}^*-\mathbf{{x}}^{k+1} \big \Vert ^2\nonumber \\{} & {} \quad \le \Big \{\delta ^k(1+3L_{\theta _1}^2 \sigma _A^{-1})\Vert H\Vert -\nu \Big \}\big \Vert \mathbf{{x}}^*-\mathbf{{x}}^{k+1} \big \Vert ^2\nonumber \\{} & {} \quad +\Big \{3\delta ^k\sigma _A^{-1}\Vert A \Vert ^2\Vert H\Vert -\frac{1}{r}+\frac{ \Vert A^\mathsf{{T}}A\Vert }{\delta _0}\Big \}\big \Vert \mathbf{{y}}^{k+1}-\mathbf{{y}}^k \big \Vert ^2\nonumber \\{} & {} \qquad +\big (3\delta ^k\sigma _A^{-1}\Vert H\Vert -1\big )\big \Vert \big (rA^\mathsf{{T}}A+Q\big ) (\mathbf{{x}}^{k+1}-\mathbf{{x}}^k)\big \Vert ^2+\delta _0\big \Vert \mathbf{{x}}^{k+1}-\mathbf{{x}}^k \big \Vert ^2. \end{aligned}$$
(56)

From the definition of \(\delta ^k\), it holds that

$$\begin{aligned} \left\{ \begin{array}{lllll} \delta ^k(1+3L_{\theta _1}^2\sigma _A^{-1})\Vert H\Vert -\nu \le 0,\\ 3\delta ^k\sigma _A^{-1}\Vert A \Vert ^2\Vert H\Vert -\frac{1}{r}+\frac{\Vert A^\mathsf{{T}}A\Vert }{\delta _0}\le 0,\\ \left( 3\delta ^k\sigma _A^{-1}\Vert H\Vert -1\right) \left\| \left( rA^\mathsf{{T}}A+Q\right) (\mathbf{{x}}^{k+1}-\mathbf{{x}}^k)\right\| ^2+\delta _0\left\| \mathbf{{x}}^{k+1}-\mathbf{{x}}^k \right\| ^2\le 0, \end{array}\right. \end{aligned}$$

which finally ensures the following Q-linear convergence rate:

$$\begin{aligned} \big \Vert \mathbf{{u}}^*-\mathbf{{u}}^{k+1}\big \Vert ^2_H\le \frac{1}{ 1+\delta ^k }\big \Vert \mathbf{{u}}^*-\mathbf{{u}}^k\big \Vert ^2_H. \end{aligned}$$

The above analysis also indicates that our proposed P-rALM for solving problem (1) converges Q-linearly under similar assumptions, namely that \(\theta \) is strongly convex, its gradient \(\nabla \theta \) is Lipschitz continuous, the matrix A has full row rank and \(\mathcal{{X}}=\mathcal{{R}}^n\).

1.3 Linear Convergence Under an Error Bound Condition

In this subsection, we use \(\partial f(x)\) to denote the sub-differential of a convex function f at x. A multifunction f is said to be piecewise linear if its graph \(Gr(f):=\{(x,y)\mid y\in f(x)\}\) is a union of finitely many polyhedra. The projection operator \(\mathcal{{P}}_{\mathcal{{C}}}\) onto a closed convex set \(\mathcal{{C}}\) is nonexpansive, i.e.,

$$\begin{aligned} \Vert \mathcal{{P}}_{\mathcal{{C}}}(x)-\mathcal{{P}}_{\mathcal{{C}}}(z)\Vert \le \Vert x-z\Vert ,\quad \forall x,z\in \mathcal{{R}}^n. \end{aligned}$$
(57)

Given \(H\succ \mathbf{{0}}\), we define \({\text {dist}}_{H}(x, \mathcal{{C}}):=\min \limits _{z\in \mathcal{{C}}}\Vert x-z\Vert _H\). When \(H=\mathbf{{I}}\), we simply denote it by \({\text {dist}}(x, \mathcal{{C}})\). For any \(\mathbf{{u}}\in \mathcal{{U}}\) and \(\alpha >0\), we define

$$\begin{aligned} e_{\mathcal{{U}}}(\mathbf{{u}},\alpha ):=\left( \begin{array}{c} e_{\mathcal{{X}}}(\mathbf{{u}},\alpha ):= \mathbf{{x}}-\mathcal{{P}}_{\mathcal{{X}}}\left[ \mathbf{{x}} - \alpha (\xi _{\mathbf{{x}}} -A^\mathsf{{T}}\mathbf{{y}}{)}\right] \\ e_{\mathcal{{Y}}}(\mathbf{{u}},\alpha ):= \mathbf{{y}}-\mathcal{{P}}_{\mathcal{{Y}}}\left[ \mathbf{{y}} - \alpha (\xi _{\mathbf{{y}}} +A \mathbf{{x}}{)}\right] \end{array}\right) , \end{aligned}$$
(58)

where \(\xi _{\mathbf{{x}}}\in \partial \theta _1(\mathbf{{x}}),\xi _{\mathbf{{y}}}\in \partial \theta _2(\mathbf{{y}}).\) Note that a point

$$\begin{aligned} \mathbf{{u}}^*\in \mathcal{{U}}^* = \big \{\hat{\mathbf{{u}}}\in \mathcal{{U}}\mid {\text {dist}}\left( \textbf{0},e_{\mathcal{{U}}}(\hat{\mathbf{{u}}},\alpha )\right) =0\big \} \end{aligned}$$

is a solution of (41) if and only if \( e_{\mathcal{{U}}}(\mathbf{{u}}^*,\alpha )=\mathbf{{0}}\). In contrast to the assumptions (a1)–(a2), we next investigate the linear convergence rate of N-PDHG1 under an error bound condition in terms of the mapping \(e_{\mathcal{{U}}}(\mathbf{{u}},1)\):

  1. (a3)

    Assume that there exists a constant \(\zeta >0\) such that

    $$\begin{aligned} {\text {dist}}\left( \mathbf{{u}},\mathcal{{U}}^*\right) \le \zeta {\text {dist}}\left( \textbf{0},e_{\mathcal{{U}}}(\mathbf{{u}},1)\right) , ~~\forall \mathbf{{u}}\in \mathcal{{U}}. \end{aligned}$$
    (59)

The condition (59) is generally weaker than the strong convexity assumption and hence can be satisfied by some problems whose objective functions are not strongly convex. Note that if the sub-differentials \(\partial \theta _1(\mathbf{{x}})\) and \(\partial \theta _2(\mathbf{{y}})\) are piecewise linear multifunctions and the constraint sets \(\mathcal{{X}}, \mathcal{{Y}}\) are polyhedral, then both \(\mathcal{{P}}_{\mathcal{{X}}}\) and \(\mathcal{{P}}_{\mathcal{{Y}}}\) are piecewise linear multifunctions by [13, Prop. 4.1.4], and hence \(e_{\mathcal{{U}}}(\mathbf{{u}},\alpha )\) is also a piecewise linear multifunction. It then follows from Robinson's continuity property [35] for polyhedral multifunctions that the assumption (a3) holds automatically. For convenience of the subsequent analysis, we denote

$$\begin{aligned} \mathcal{{Q}} =\left[ \begin{array}{ccccccc} (rA^\mathsf{{T}}A+Q)^\mathsf{{T}}(rA^\mathsf{{T}}A+Q)+ A^\mathsf{{T}}A &{}&{} \mathbf{{0}} \\ \mathbf{{0}} &{}&{} \frac{1}{r}\mathbf{{I}}+AA^\mathsf{{T}}\end{array}\right] . \end{aligned}$$
(60)

It is easy to check that \(\mathcal{{Q}}\) is symmetric positive definite, since both of its diagonal blocks are positive definite whenever \(r>0\) and \(Q\succ \mathbf{{0}}\). By equivalent expressions for the first-order optimality conditions (44) and (46) together with the structure of \(\mathcal{{Q}}\), we have the following estimate of the distance between \(\mathbf{{0}}\) and \(e_{\mathcal{{U}}}(\mathbf{{u}}^{k+1},1)\), whose proof is similar to that in [8, Sec. 2.2].

Lemma 5.2

Let \(\mathcal{{Q}}\) be given in (60). Then, the iterates generated by N-PDHG1 satisfy

$$\begin{aligned} {\text {dist}}^2\big (\textbf{0},e_{\mathcal{{U}}}(\mathbf{{u}}^{k+1},1)\big )\le 2 \big \Vert \mathbf{{u}}^k-\mathbf{{u}}^{k+1}\big \Vert _{\mathcal{{Q}}}^2. \end{aligned}$$
(61)

Proof. The first-order optimality condition in (44) implies

$$\begin{aligned} \mathbf{{x}}^{k+1}=\mathcal{{P}}_{\mathcal{{X}}}\left\{ \mathbf{{x}}^{k+1}-\Big [\xi _{\mathbf{{x}}}^{k+1} -A^\mathsf{{T}}\mathbf{{y}}^k+\big (rA^\mathsf{{T}}A+Q\big ) (\mathbf{{x}}^{k+1}-\mathbf{{x}}^k)\Big ]\right\} . \end{aligned}$$

Combine it with the definition of \({\text {dist}}(\cdot ,\cdot )\) and the property in (57) to obtain

$$\begin{aligned}{} & {} {\text {dist}}^2\left( \textbf{0},e_{\mathcal{{X}}}(\mathbf{{u}}^{k+1},1)\right) = {\text {dist}}^2\left( \mathbf{{x}}^{k+1}, \mathcal{{P}}_{\mathcal{{X}}}\left\{ \mathbf{{x}}^{k+1}-\big [\xi _{\mathbf{{x}}}^{k+1} -A^\mathsf{{T}}\mathbf{{y}}^{k+1}\big ]\right\} \right) \nonumber \\{} & {} \quad \le \Big \Vert A^\mathsf{{T}}(\mathbf{{y}}^k-\mathbf{{y}}^{k+1}) + \big (rA^\mathsf{{T}}A+Q\big ) (\mathbf{{x}}^k-\mathbf{{x}}^{k+1})\Big \Vert ^2 \nonumber \\{} & {} \quad \le 2\Big ( \big \Vert A^\mathsf{{T}}(\mathbf{{y}}^k-\mathbf{{y}}^{k+1})\big \Vert ^2+ \big \Vert \big (rA^\mathsf{{T}}A+Q\big ) (\mathbf{{x}}^k-\mathbf{{x}}^{k+1})\big \Vert ^2\Big )=2 \big \Vert \mathbf{{u}}^k-\mathbf{{u}}^{k+1}\big \Vert ^2_{\mathcal{{Q}}_1}, \end{aligned}$$
(62)

where \(\mathcal{{Q}}_1={\text {diag}}\left( (rA^\mathsf{{T}}A+Q)^\mathsf{{T}}(rA^\mathsf{{T}}A+Q),AA^\mathsf{{T}}\right) .\) Similarly, we have from (46) that

$$\begin{aligned} \mathbf{{y}}^{k+1}=\mathcal{{P}}_{\mathcal{{Y}}}\left\{ \mathbf{{y}}^{k+1}-\Big [\xi _{\mathbf{{y}}}^{k+1}+ A (2\mathbf{{x}}^{k+1}-\mathbf{{x}}^k)+\frac{1}{r} (\mathbf{{y}}^{k+1}-\mathbf{{y}}^k)\Big ]\right\} \end{aligned}$$

and

$$\begin{aligned}{} & {} {\text {dist}}^2\big (\textbf{0},e_{\mathcal{{Y}}}(\mathbf{{u}}^{k+1},1)\big ) = {\text {dist}}^2\Big (\mathbf{{y}}^{k+1}, \mathcal{{P}}_{\mathcal{{Y}}}\big \{\mathbf{{y}}^{k+1}-\big [\xi _{\mathbf{{y}}}^{k+1} +A \mathbf{{x}}^{k+1}\big ]\big \}\Big ) \nonumber \\{} & {} \quad \le \Big \Vert A (\mathbf{{x}}^k-\mathbf{{x}}^{k+1}) +\frac{1}{r} (\mathbf{{y}}^k-\mathbf{{y}}^{k+1})\Big \Vert ^2 \nonumber \\{} & {} \quad \le 2\Big ( \big \Vert A (\mathbf{{x}}^k-\mathbf{{x}}^{k+1})\big \Vert ^2+ \big \Vert \frac{1}{r} (\mathbf{{y}}^k-\mathbf{{y}}^{k+1})\big \Vert ^2\Big )=2 \big \Vert \mathbf{{u}}^k-\mathbf{{u}}^{k+1}\big \Vert ^2_{\mathcal{{Q}}_2}, \end{aligned}$$
(63)

where \(\mathcal{{Q}}_2={\text {diag}}\left( A^\mathsf{{T}}A,\frac{1}{r}\mathbf{{I}}\right) .\) The inequalities (62)–(63) immediately ensure (61) due to the relation \(\mathcal{{Q}}=\mathcal{{Q}}_1+\mathcal{{Q}}_2.\) \(\blacksquare \)
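As a sanity check on Lemma 5.2 (a sketch only, ours; it reuses the smooth toy instance from the earlier N-PDHG1 sketch, where the projections reduce to identities and the subgradients to gradients), the residual map vanishes at the saddle point and one N-PDHG1 step satisfies the bound (61):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 20, 30
A = rng.standard_normal((m, n))
c = rng.standard_normal(n)

# Smooth toy instance of (41): theta1(x) = 0.5*||x - c||^2, theta2(y) = 0.5*||y||^2,
# X = R^n, Y = R^m, so projections are identities and subgradients are gradients:
#   e_X(u,1) = (x - c) - A^T y,   e_Y(u,1) = y + A x.
def e_U(x, y):
    return np.concatenate([(x - c) - A.T @ y, y + A @ x])

# The residual vanishes at the saddle point x* = (I + A^T A)^{-1} c, y* = -A x*.
x_star = np.linalg.solve(np.eye(n) + A.T @ A, c)
print(np.linalg.norm(e_U(x_star, -A @ x_star)))                 # ~ 0

# One N-PDHG1 step (as sketched earlier) and the bound (61) with the matrix from (60).
r, Q = 1.5, 0.2 * np.eye(n)                                     # illustrative choices
P = r * A.T @ A + Q
x0, y0 = rng.standard_normal(n), rng.standard_normal(m)
x1 = np.linalg.solve(np.eye(n) + P, c + A.T @ y0 + P @ x0)
y1 = (y0 - r * A @ (2 * x1 - x0)) / (1.0 + r)
dx, dy = x0 - x1, y0 - y1
lhs = np.linalg.norm(e_U(x1, y1)) ** 2
rhs = 2 * (dx @ (P.T @ P + A.T @ A) @ dx + dy @ (np.eye(m) / r + A @ A.T) @ dy)
print(lhs <= rhs)                                               # True
```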

Based on Lemma 5.2 and the conclusion (49), we next provide a global linear convergence rate of N-PDHG1 with the aid of the notations \(\lambda _{\min }(H)\) and \( \lambda _{\max }(H)\), which denote the smallest and largest eigenvalues of the positive definite matrix H, respectively.

Theorem 5.1

Let \(\mathcal{{Q}}\) be given in (60) and suppose that the error bound condition (a3) holds. Then, there exists a constant \(\zeta >0\) such that the iterates generated by N-PDHG1 satisfy

$$\begin{aligned} {\text {dist}}^2_{H}(\mathbf{{u}}^{k+1}, \mathcal{{U}}^*)\le \frac{1}{1+\hat{\zeta } }{\text {dist}}^2_{H}(\mathbf{{u}}^k, \mathcal{{U}}^*), \end{aligned}$$
(64)

where the constant \( \hat{\zeta }= \frac{\lambda _{\min }(H)}{2\zeta ^2\lambda _{\max }(\mathcal{{Q}})\lambda _{\max }(H)}>0. \)

Proof. Because \( \mathcal{{U}}^*\) is a closed convex set, there exists a \(\mathbf{{u}}^*_k\in \mathcal{{U}}^*\) satisfying

$$\begin{aligned} {\text {dist}}_{H}(\mathbf{{u}}^k, \mathcal{{U}}^*) =\big \Vert \mathbf{{u}}^k-\mathbf{{u}}^*_k\big \Vert _{H}. \end{aligned}$$
(65)

By the condition (59) and Lemma 5.2, there exists a constant \(\zeta >0\) such that

$$\begin{aligned} {\text {dist}}^2\big (\mathbf{{u}}^{k+1}, \mathcal{{U}}^*\big )\le 2\zeta ^2\big \Vert \mathbf{{u}}^k-\mathbf{{u}}^{k+1}\big \Vert _{\mathcal{{Q}}}^2\le \frac{2\zeta ^2\lambda _{\max }(\mathcal{{Q}})}{\lambda _{\min }(H)}\big \Vert \mathbf{{u}}^k -\mathbf{{u}}^{k+1}\big \Vert _{H}^2. \end{aligned}$$
(66)

By the definition of \({\text {dist}}_{H}(\cdot ,\cdot )\), we have

$$\begin{aligned} \frac{1}{\lambda _{\max }(H)}{\text {dist}}^2_H\big (\mathbf{{u}}^{k+1}, \mathcal{{U}}^*\big )\le {\text {dist}}^2\big (\mathbf{{u}}^{k+1}, \mathcal{{U}}^*\big ). \end{aligned}$$
(67)

Combine (66)–(67) and (49) to have

$$\begin{aligned}{} & {} {\text {dist}}^2_{H}(\mathbf{{u}}^{k+1}, \mathcal{{U}}^*)\le \big \Vert \mathbf{{u}}^{k+1}-\mathbf{{u}}^*_k\big \Vert _H^2 \\{} & {} \quad \le \big \Vert \mathbf{{u}}^k-\mathbf{{u}}^*_k\big \Vert _H^2 - \big \Vert \mathbf{{u}}^k-\mathbf{{u}}^{k+1}\big \Vert _H^2\\{} & {} \quad \le {{\text {dist}}^2_{H}}(\mathbf{{u}}^k, \mathcal{{U}}^*)-\frac{\lambda _{\min }(H)}{2\zeta ^2\lambda _{\max }(\mathcal{{Q}})\lambda _{\max }(H)} {\text {dist}}^2_H\big (\mathbf{{u}}^{k+1}, \mathcal{{U}}^*\big ). \end{aligned}$$

Rearranging the above inequality confirms (64). \(\blacksquare \)

Corollary 5.1

Let \(\hat{\zeta }>0\) be given in Theorem 5.1 and the sequence \(\{\mathbf{{u}}^k\}\) be generated by N-PDHG1. Then, there exists a point \(\mathbf{{u}}^\infty \in \mathcal{{U}}^*\) such that

$$\begin{aligned} \big \Vert \mathbf{{u}}^k-\mathbf{{u}}^\infty \big \Vert _{H}\le C \epsilon ^k, \end{aligned}$$
(68)

where

$$\begin{aligned} C=\frac{2{\text {dist}}_{H}(\mathbf{{u}}^0, \mathcal{{U}}^*)}{1-\epsilon }>0\quad \text{ and }\quad \epsilon =\frac{1}{\sqrt{1+\hat{\zeta }}}\in (0,1). \end{aligned}$$

Proof. Let \({\mathbf{{u}}^*}\in \mathcal{{U}}^*\) be such that (65) holds, and let

$$\begin{aligned} \mathbf{{u}}^{k+1}=\mathbf{{u}}^{k}+\mathbf{{d}}^k. \end{aligned}$$
(69)

Then, it follows from (49) that \( \left\| \mathbf{{u}}^{k+1}-{\mathbf{{u}}^*}\right\| _{H}\le \left\| \mathbf{{u}}^k-{\mathbf{{u}}^*}\right\| _{H} \) which further implies

$$\begin{aligned} \big \Vert \mathbf{{d}}^k\big \Vert _H= & {} \big \Vert \mathbf{{u}}^{k+1}-\mathbf{{u}}^k\big \Vert _{H} \le \big \Vert \mathbf{{u}}^{k+1}-{\mathbf{{u}}^*}\big \Vert _{H}+ \big \Vert \mathbf{{u}}^k-{\mathbf{{u}}^*}\big \Vert _{H} \nonumber \\\le & {} 2\big \Vert \mathbf{{u}}^k-{\mathbf{{u}}^*}\big \Vert _{H} = 2{\text {dist}}_{H}(\mathbf{{u}}^k, \mathcal{{U}}^*)\nonumber \\\le & {} 2 \epsilon ^k {\text {dist}}_{H}\big (\mathbf{{u}}^0, \mathcal{{U}}^*\big ), \end{aligned}$$
(70)

where the final inequality follows from (64). Because the sequence \(\{\mathbf{{u}}^k\} \) generated by N-PDHG1 converges to a \( \mathbf{{u}}^\infty \in \mathcal{{U}}^*\), we have from (69) that \( \mathbf{{u}}^\infty =\mathbf{{u}}^k+\sum _{j=k}^{\infty }\mathbf{{d}}^j \), which by (70) indicates

$$\begin{aligned} \big \Vert \mathbf{{u}}^k-\mathbf{{u}}^\infty \big \Vert _{H}\le & {} \sum \limits _{j=k}^{\infty }\Vert \mathbf{{d}}^j\Vert _{H} \le 2{\text {dist}}_{H}(\mathbf{{u}}^0, \mathcal{{U}}^*)\sum \limits _{j=k}^{\infty }\epsilon ^{j} \\= & {} 2{\text {dist}}_{H}(\mathbf{{u}}^0, \mathcal{{U}}^*) \epsilon ^k \sum \limits _{j=0}^{\infty }\epsilon ^{j} \le \epsilon ^k \Big [2{\text {dist}}_{H}(\mathbf{{u}}^0, \mathcal{{U}}^*)\frac{1}{1-\epsilon }\Big ]. \end{aligned}$$

So, the inequality (68) holds, that is, \(\mathbf{{u}}^k\) converges to \(\mathbf{{u}}^\infty \) R-linearly. \(\blacksquare \)

Remark 5.1

Consider the following general saddle-point problem

$$\begin{aligned} \min \limits _{\mathbf{{x}}\in \mathcal{{X}}}\max \limits _{\mathbf{{y}}\in \mathcal{{Y}}} \Phi (\mathbf{{x,y}}):= f(\mathbf{{x}})+\theta _1(\mathbf{{x}})-\mathbf{{y}}^\mathsf{{T}}A\mathbf{{x}}-\theta _2(\mathbf{{y}}), \end{aligned}$$

or, equivalently, the composite problem \( \min \limits _{\mathbf{{x}}\in \mathcal{{X}}} \big \{f(\mathbf{{x}}) + \theta _1(\mathbf{{x}}) + \theta _2^*(-A\mathbf{{x}})\big \},\) where \(f: {\mathcal{{X}}}\rightarrow \mathcal{{R}}\) is a smooth convex function whose gradient is Lipschitz continuous with constant \(L_f\), and the remaining notations have the same meanings as before. For this problem, similar to Case 2 in Sect. 2.3, we can develop the following iterative scheme:

$$\begin{aligned} \left\{ \begin{array}{lllll} \mathbf{{x}}^{k+1}=\arg \min \limits _{\mathbf{{x}}\in \mathcal{{X}}} \theta _1(\mathbf{{x}}) + \big \langle \nabla f(\mathbf{{x}}^k)-A^\mathsf{{T}}\mathbf{{y}}^k, \mathbf{{x}}\big \rangle + \frac{1}{2}\left\| \mathbf{{x}}-\mathbf{{x}}^{k}\right\| ^2_{rA^\mathsf{{T}}A+ Q},\\ \mathbf{{y}}^{k+1}=\arg \max \limits _{\mathbf{{y}}\in \mathcal{{Y}}} \Phi (2\mathbf{{x}}^{k+1}-\mathbf{{x}}^k,\mathbf{{y}})-\frac{1}{2r}\Vert \mathbf{{y}}-\mathbf{{y}}^k\Vert ^2. \end{array}\right. \end{aligned}$$

Its global convergence and linear convergence rate can also be established by the above analysis.
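A minimal sketch of this scheme (ours, not the authors' code) on a toy instance with a smooth quadratic f; the data and the conservative choice \(Q=L_f\mathbf{{I}}\), made here so that the proximal matrix dominates the linearized part, are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, p = 10, 15, 20
A = rng.standard_normal((m, n))
B = rng.standard_normal((p, n))
b = rng.standard_normal(p)

# Toy data: f(x) = 0.5*||Bx - b||^2 (smooth), theta1(x) = 0.5*||x||^2,
# theta2(y) = 0.5*||y||^2, X = R^n, Y = R^m.
def grad_f(x):
    return B.T @ (B @ x - b)

L_f = np.linalg.norm(B, 2) ** 2          # Lipschitz constant of grad f
r = 1.0                                  # hypothetical proximal parameter
Q = L_f * np.eye(n)                      # conservative choice so that rA^TA + Q dominates L_f*I
P = r * A.T @ A + Q

# Closed-form saddle point of this quadratic instance.
x_star = np.linalg.solve(np.eye(n) + B.T @ B + A.T @ A, B.T @ b)
y_star = -A @ x_star

x, y = np.zeros(n), np.zeros(m)
for k in range(5000):
    # x-step: argmin_x theta1(x) + <grad f(x_k) - A^T y_k, x> + 0.5*||x - x_k||_P^2
    x_new = np.linalg.solve(np.eye(n) + P, P @ x - (grad_f(x) - A.T @ y))
    # y-step: argmax_y Phi(2x_{k+1} - x_k, y) - 1/(2r)*||y - y_k||^2
    y = (y - r * A @ (2 * x_new - x)) / (1.0 + r)
    x = x_new

print(np.linalg.norm(x - x_star), np.linalg.norm(y - y_star))   # both should be small
```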

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Bai, J., Jia, L. & Peng, Z. A New Insight on Augmented Lagrangian Method with Applications in Machine Learning. J Sci Comput 99, 53 (2024). https://doi.org/10.1007/s10915-024-02518-0

