
An \(O(s^r)\)-resolution ODE framework for understanding discrete-time algorithms and applications to the linear convergence of minimax problems


Abstract

There has been a long history of using ordinary differential equations (ODEs) to understand the dynamics of discrete-time algorithms (DTAs). Surprisingly, two fundamental questions remain unanswered: (i) it is unclear how to obtain a suitable ODE from a given DTA, and (ii) it is unclear what the connection is between the convergence of a DTA and that of its corresponding ODEs. In this paper, we propose a new machinery, an \(O(s^r)\)-resolution ODE framework, for analyzing the behavior of a generic DTA, which (partially) answers the above two questions. The framework contains three steps: 1. To obtain a suitable ODE from a given DTA, we define a hierarchy of \(O(s^r)\)-resolution ODEs of a DTA parameterized by the degree r, where s is the step-size of the DTA. We present a principled approach to construct the unique \(O(s^r)\)-resolution ODEs from a DTA; 2. To analyze the resulting ODE, we propose the \(O(s^r)\)-linear-convergence condition of a DTA with respect to an energy function, under which the \(O(s^r)\)-resolution ODE converges linearly to an optimal solution; 3. To bridge the convergence properties of a DTA and its corresponding ODEs, we define the properness of an energy function and show that linear convergence of the \(O(s^r)\)-resolution ODE with respect to a proper energy function automatically guarantees linear convergence of the DTA. To illustrate this machinery, we utilize it to study three classic algorithms for solving the unconstrained minimax problem \(\min _{x\in \mathbb {R}^n} \max _{y\in \mathbb {R}^m} L(x,y)\): gradient descent ascent (GDA), the proximal point method (PPM) and the extra-gradient method (EGM). Their O(s)-resolution ODEs explain the puzzling convergent/divergent behaviors of GDA, PPM and EGM when L(x,y) is a bilinear function, and showcase that the interaction terms help the convergence of PPM/EGM but hurt the convergence of GDA. Furthermore, their O(s)-linear-convergence conditions not only unify the known scenarios in which PPM and EGM achieve linear convergence, but also show that these two algorithms exhibit linear convergence in much broader contexts, including when solving a class of nonconvex-nonconcave minimax problems. Finally, we show how this ODE framework can help design new optimization algorithms for minimax problems, by studying the difference between the O(s)-resolution ODE of GDA and that of PPM/EGM.
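To make the behaviors described in the abstract concrete, the following minimal Python/NumPy sketch (an illustration added here, not part of the paper) runs GDA, PPM and EGM on the bilinear problem \(L(x,y)=xy\), for which \(F(z)=(y,-x)=Mz\) and each update is a fixed linear map:

```python
import numpy as np

# Bilinear L(x, y) = x*y, so F(z) = (dL/dx, -dL/dy) = (y, -x) = M z.
M = np.array([[0.0, 1.0], [-1.0, 0.0]])
I = np.eye(2)
s = 0.1  # step-size

def run(step, iters=200):
    z = np.array([1.0, 1.0])
    for _ in range(iters):
        z = step(z)
    return np.linalg.norm(z)

gda = lambda z: z - s * (M @ z)                  # z_{k+1} = z_k - s F(z_k)
ppm = lambda z: np.linalg.solve(I + s * M, z)    # z_{k+1} = z_k - s F(z_{k+1})
egm = lambda z: z - s * (M @ (z - s * (M @ z)))  # z_{k+1} = z_k - s F(z~_k)

print("GDA:", run(gda))  # norm grows like (1+s^2)^{k/2}: divergence
print("PPM:", run(ppm))  # norm shrinks like (1+s^2)^{-k/2}: convergence
print("EGM:", run(egm))  # norm shrinks like (1-s^2+s^4)^{k/2}: convergence
```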


Notes

  1. Recall that the o notation in Eq. (9) means \(\lim _{s\rightarrow 0} \frac{\Vert Z(s)-z^+\Vert }{s^{r+1}}=0\).

  2. Recall that the \(O(\cdot )\) notation in Eq. (17) means that there exists a constant C such that \(\lim _{s\rightarrow 0} \frac{\Vert Z(s)-z^+\Vert }{s^{r+2}}\le C\).

  3. This type of decaying rate is called an “exponential rate” in the ODE literature. We use the terminology “linear rate” here in order to be consistent with the notion of linear convergence in the optimization literature.

  4. Recall that the \(\Omega \) notation means that there exists a constant \(c>0\) such that \(\Vert Z(s)-z^+\Vert \ge c\Vert s^{j+1} f_j(z^*)\Vert \) as \(s\rightarrow 0\).

  5. Suppose \(\mathcal {A},\mathcal {B}\) are two linear subspaces in \(\mathbb {R}^m\), then \(\cos (\mathcal {A},\mathcal {B}):=\min _{a\in \mathcal {A}, b\in \mathcal {B}} \cos (a,b)\), and \(\sin (\mathcal {A},\mathcal {B})=\sqrt{1-\cos ^2(\mathcal {A},\mathcal {B})}\).


Acknowledgements

The author would like to express his gratitude to Robert M. Freund for reading an early version of the paper and for thoughtful discussions that helped to position the paper. The author also wishes to thank Renbo Zhao, Ben Grimmer, Miles Lubin, Oliver Hinder and David Applegate for helpful discussions. The author would like to thank the anonymous referees and the associate editor for the constructive feedback, which resulted in a significantly improved version of the manuscript.

Author information

Correspondence to Haihao Lu.



Appendix

1.1 O(s)-Linear-convergence condition of \(L(x,y)= f(C_1 x) + x^T B y - g(C_2 y)\)

Proposition 4

Consider \(L(x,y)= f(C_1 x) + x^T B y - g(C_2 y)\). Define

$$\begin{aligned} a_1=\begin{cases} \min \left( {\mu \lambda _{\min }^+ (C_1^T C_1),\, s\lambda _{\min }^+ (B B^T)}\right) &{} \text {if } \sin \left( {\text {Range}(B), \text {Range}(C_1^T)}\right) = 0 \ , \\ \min \left( {\mu \lambda _{\min }^+ (C_1^T C_1)\sin ^2\left( {\text {Range}(B), \text {Range}(C_1^T)}\right) ,\, s\lambda _{\min }^+ (B B^T)}\right) &{} \text {otherwise} \ , \end{cases} \end{aligned}$$

and

$$\begin{aligned} a_2=\begin{cases} \min \left( {\mu \lambda _{\min }^+ (C_2^T C_2),\, s\lambda _{\min }^+ (B^T B)}\right) &{} \text {if } \sin \left( {\text {Range}(B^T), \text {Range}(C_2^T)}\right) = 0 \ , \\ \min \left( {\mu \lambda _{\min }^+ (C_2^T C_2)\sin ^2\left( {\text {Range}(B^T), \text {Range}(C_2^T)}\right) ,\, s\lambda _{\min }^+ (B^T B)}\right) &{} \text {otherwise} \ , \end{cases} \end{aligned}$$

where \(\sin (\cdot , \cdot )\) denotes the sine of the angle between two linear subspaces (see footnote 5). Then L(x,y) satisfies the O(s)-linear-convergence condition with \(\rho (s) \ge \min \{a_1,a_2\}>0\).

Proof

Suppose it holds for any \(x\in \text {Range}(C_1^T) + \text {Range}(B)\) that

$$\begin{aligned} x^T \left( {\nabla _{xx}L(x,y)+ s \nabla _{xy}L(x,y)\nabla _{xy}L(x,y)^T}\right) x \ge a_1 \Vert x\Vert ^2 \ , \end{aligned}$$
(57)

then symmetrically for any \(y\in \text {Range}(C_2^T) + \text {Range}(B^T)\) it holds that

$$\begin{aligned} y^T \left( {-\nabla _{yy}L(x,y)+ s \nabla _{xy}L(x,y)^T\nabla _{xy}L(x,y)}\right) y \ge a_2\Vert y\Vert ^2 \ , \end{aligned}$$

which proves (42) with \(\rho (s)=\min \{a_1,a_2\}>0\) by noticing \(\mathbb {F}\subseteq \left( {\text {Range}(C_1^T) + \text {Range}(B)}\right) \times \left( {\text {Range}(C_2^T) + \text {Range}(B^T)}\right) \). Now let us prove (57). First, notice that \(\nabla _{xx}L(x,y)\succeq \mu C_1^T C_1\) and \(\nabla _{xy}L(x,y)=B\), thus we just need to show

$$\begin{aligned} x^T \left( {\mu C_1^T C_1+ s BB^T}\right) x \ge a_1 \Vert x\Vert ^2 \ . \end{aligned}$$
(58)

If \(\sin \left( {\text {Range}(B), \text {Range}(C_1^T)}\right) =0\), then either \(x\in \text {Range}(C_1^T)\), in which case \(x^T \left( {\mu C_1^T C_1+ s BB^T}\right) x \ge \mu \lambda _{\min }^+ (C_1^T C_1)\Vert x\Vert ^2\), or \(x\in \text {Range}(B)\), in which case \(x^T \left( {\mu C_1^T C_1+ s BB^T}\right) x \ge s \lambda _{\min }^+ (BB^T)\Vert x\Vert ^2\). In either case (58) holds.

If \(\sin \left( {\text {Range}(B), \text {Range}(C_1^T)}\right) \not =0\), suppose \(x=x_1+x_2\) where \(x_1\in \text {Range}(B)\) and \(x_2\in \text {Range}(C_1^T)\). It is obvious that (58) holds if \(x_2 = 0\). Now define \(P_{B^T}(x)=B(B^TB)^{+}B^Tx\) as the projection operator onto \(\text {Range}(B)\), and let \(P_{B^T}^T(x)=x-P_{B^T}(x)\) be the projection operator onto the orthogonal complement of \(\text {Range}(B)\); then it holds that

$$\begin{aligned} \begin{aligned}&x^T \left( {\mu C_1^T C_1+ s BB^T}\right) x \\&\quad =\, (x_1+P_{B^T}(x_2)+P_{B^T}^T(x_2))^T \left( {\mu C_1^T C_1+ s BB^T}\right) (x_1+P_{B^T}(x_2)+P_{B^T}^T(x_2)) \\&\quad = \, (x_1+P_{B^T}(x_2))^T \left( {\mu C_1^T C_1+ s BB^T}\right) (x_1+P_{B^T}(x_2)) + \mu (P_{B^T}^T(x_2))^T C_1^T C_1 P_{B^T}^T(x_2) \\&\quad \ge \, (x_1+P_{B^T}(x_2))^T \left( {s BB^T}\right) (x_1+P_{B^T}(x_2)) + \mu (P_{C_1}(P_{B^T}^T(x_2)))^T C_1^T C_1 P_{C_1}(P_{B^T}^T(x_2)) \\&\quad \ge \, s\lambda _{\min }^+(BB^T)\Vert x_1+P_{B^T}(x_2)\Vert ^2 + \mu \lambda _{\min }^+(C_1^T C_1) \Vert (P_{C_1}(P_{B^T}^T(x_2)))\Vert ^2 \\&\quad \ge \, a_1\Vert x_1+P_{B^T}(x_2)\Vert ^2 + \mu \lambda _{\min }^+(C_1^T C_1) \sin ^2\left( {\text {Range}(B), \text {Range}(C_1^T)}\right) \Vert P_{B^T}^T(x_2)\Vert ^2 \\&\quad \ge \, a_1\Vert x_1+P_{B^T}(x_2)\Vert ^2 + a_1 \Vert P_{B^T}^T(x_2)\Vert ^2 \\&\quad = \, a_1 \Vert x\Vert ^2, \end{aligned} \end{aligned}$$

where the second equality uses \(B^T P_{B^T}^T(x_2)=0\), the first inequality is from \((x_1+P_{B^T}(x_2))^T \left( {\mu C_1^T C_1}\right) (x_1+P_{B^T}(x_2))\ge 0\) and \(C_1 P_{C_1}^T(P_{B^T}^T(x_2))=0\), the second inequality is because \(x_1+P_{B^T}(x_2)\in \text {Range}(B)\) and \(P_{C_1}(P_{B^T}^T(x_2))\in \text {Range}(C_1^T)\), the third inequality uses the definition of \(a_1\) and the definition of the angle between two subspaces, the fourth inequality is due to the definition of \(a_1\), and the last equality is from \(x_1+P_{B^T}(x_2)\in \text {Range}(B)\) and \(P_{B^T}^T(x_2)\perp \text {Range}(B)\). This finishes the proof. \(\square \)
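As a sanity check on the mechanism behind Proposition 4, the following Python/NumPy sketch (added for illustration; the sizes, seed and constants are arbitrary choices, not from the paper) shows that \(\mu C_1^TC_1\) and \(sBB^T\) can each be singular while their sum is uniformly positive definite on \(\text {Range}(C_1^T)+\text {Range}(B)\), which is what the bound (58) expresses:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 6, 2, 3                     # hypothetical sizes: rank(C1)=2, rank(B)=3
C1 = rng.standard_normal((k, n))      # f(C1 x) with f mu-strongly convex
B = rng.standard_normal((n, m))
mu, s = 0.5, 0.1

S = mu * C1.T @ C1 + s * B @ B.T      # the matrix appearing in (58)
print(np.linalg.eigvalsh(S).min())    # ~0: S is singular on all of R^n

# Orthonormal basis Q of Range(C1^T) + Range(B) via an SVD rank cut.
U, sv, _ = np.linalg.svd(np.hstack([C1.T, B]), full_matrices=False)
Q = U[:, sv > 1e-10]
# Restricted to that subspace, the quadratic form is strictly positive.
print(np.linalg.eigvalsh(Q.T @ S @ Q).min())  # > 0
```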

Taylor expansion of operator \((I+sF)^{-1}\)

Here we derive the third-order Taylor expansion of the operator \((I+sF)^{-1}\) as stated in (25). Suppose \((I+sF)^{-1}(z)=g_0(z)+g_1(z)s+g_2(z)s^2+g_3(z) s^3 + o(s^3)\), then it holds that

$$\begin{aligned} \begin{aligned} z&=(I+sF)(g_0(z)+g_1(z)s+g_2(z)s^2+g_3(z) s^3) + o(s^3)\\&= g_0(z)+g_1(z)s+g_2(z)s^2+g_3(z) s^3+ sF(g_0(z)+g_1(z)s+g_2(z)s^2)+ o(s^3)\ . \end{aligned} \end{aligned}$$
(59)

By comparing the O(1) terms on both sides of (59), we have \(g_0(z)=z\). By comparing the O(s) terms on both sides of (59), we have

$$\begin{aligned} 0=g_1(z)+F(g_0(z))=g_1(z)+F(z), \end{aligned}$$

thus \(g_1(z)=-F(z)\). Notice \(F(g_0(z)+sg_1(z))=F(z-sF(z))=F(z)-s\nabla F(z) F(z)+o(s)\). By comparing the \(O(s^2)\) terms on both sides of (59), we have

$$\begin{aligned} 0=g_2(z)-\nabla F(z) F(z), \end{aligned}$$

thus \(g_2(z)=\nabla F(z) F(z)\). Notice

$$\begin{aligned} \begin{aligned}&F(g_0(z)+g_1(z)s+ g_2(z)s^2)\\&=\,F(z-sF(z)+s^2 \nabla F(z) F(z))\\&=\,F(z) +\nabla F(z)( -sF(z)+s^2\nabla F(z) F(z)) + \frac{1}{2}\nabla ^2 F(z) (sF(z), sF(z)) + o(s^2) \\&=\,F(z)-s\nabla F(z) F(z) + s^2 \left( {(\nabla F(z))^2 F(z) + \frac{1}{2}\nabla ^2 F(z) (F(z), F(z))}\right) + o(s^2) \ . \end{aligned} \end{aligned}$$

By comparing the \(O(s^3)\) terms on both sides of (59), we have

$$\begin{aligned} 0=g_3(z)+(\nabla F(z))^2 F(z) +\frac{1}{2} \nabla ^2 F(z) (F(z), F(z)) \ , \end{aligned}$$

thus \(g_3(z)=-(\nabla F(z))^2 F(z) -\frac{1}{2} \nabla ^2 F(z) (F(z), F(z))\), which yields (25).
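The expansion can be checked numerically. The Python/NumPy sketch below (added for illustration; the test operator F is an arbitrary choice, not from the paper) solves \(w+sF(w)=z\) by fixed-point iteration and compares against \(g_0+sg_1+s^2g_2+s^3g_3\); the error divided by \(s^4\) stays roughly constant as s shrinks, consistent with an \(o(s^3)\) remainder:

```python
import numpy as np

def F(z):   # an arbitrary smooth test operator
    return np.array([z[0]**2 + z[1], z[1]**2 - z[0]])

def JF(z):  # Jacobian, nabla F(z)
    return np.array([[2*z[0], 1.0], [-1.0, 2*z[1]]])

def D2F(z, v):  # nabla^2 F(z)(v, v), componentwise v^T (Hessian of F_i) v
    return np.array([2*v[0]**2, 2*v[1]**2])

def resolvent(z, s, iters=200):
    w = z.copy()
    for _ in range(iters):         # w <- z - s F(w) is a contraction for small s
        w = z - s * F(w)
    return w

def expansion(z, s):
    f, J = F(z), JF(z)
    g1 = -f                        # g_1(z) = -F(z)
    g2 = J @ f                     # g_2(z) = nabla F(z) F(z)
    g3 = -J @ (J @ f) - 0.5 * D2F(z, f)
    return z + s*g1 + s**2*g2 + s**3*g3

z = np.array([0.7, -0.4])
for s in [1e-1, 5e-2, 2.5e-2]:
    err = np.linalg.norm(resolvent(z, s) - expansion(z, s))
    print(f"s={s:.4f}  error/s^4 = {err / s**4:.3f}")
```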

Generalized block skew-symmetric matrix and its basic properties

Here is the definition of a generalized block skew-symmetric matrix:

Definition 6

We say a matrix \(M\in \mathbb {R}^{(n+m)\times (n+m)}\) is generalized block skew-symmetric if M has the structure: \(M=\left[ \begin{matrix} A &{} B\\ -B^T &{} C \end{matrix}\right] \) where \(A\in \mathbb {R}^{n\times n}, C\in \mathbb {R}^{m\times m}\) are symmetric matrices and \(B\in \mathbb {R}^{n\times m}\) is an arbitrary matrix.

Remark 9

Going back to the minimax problem, \(\nabla F(z)=\left[ \begin{matrix} \nabla _{xx}L(x,y) &{} \nabla _{xy}L(x,y)\\ -\nabla _{xy}L(x,y)^T &{} -\nabla _{yy}L(x,y) \end{matrix}\right] \) is a generalized block skew-symmetric matrix for any z.

Let \(M=\left[ \begin{matrix} A &{} B\\ -B^T &{} C \end{matrix}\right] \) be a generalized block skew-symmetric matrix. Denote \(M^i=\left[ \begin{matrix}{M^{i}_{11}} &{} {M^{i}_{12}}\\ {M^{i}_{21}} &{} {M^{i}_{22}} \end{matrix}\right] \) as the ith power of the matrix M, where \({M^{i}_{jl}}\) for \(j,l\in \{1,2\}\) is the corresponding block of \(M^i\). In particular, we define \(M^0\) to be the identity matrix. The next proposition shows that taking powers preserves generalized block skew-symmetry.

Proposition 5

Suppose M is a generalized block skew-symmetric matrix, then for any positive integer i, \(M^i\) is a generalized block skew-symmetric matrix.

Proof

We prove Proposition 5 by induction. First, notice that the claim holds for \(i=1\). Now suppose it holds for some i. Notice that

$$\begin{aligned} M^{i+1}=M M^{i} =M^{i} M \ , \end{aligned}$$
(60)

which yields the following updates by the rules of block matrix multiplication:

$$\begin{aligned} \begin{aligned} {M^{i+1}_{11}}&=A{M^{i}_{11}}+B{M^{i}_{21}}={M^{i}_{11}}A-{M^{i}_{12}}B^T, \\ {M^{i+1}_{12}}&=A{M^{i}_{12}}+B{M^{i}_{22}}={M^{i}_{11}}B+{M^{i}_{12}}C,\\ {M^{i+1}_{21}}&=-B^T{M^{i}_{11}}+C{M^{i}_{21}}={M^{i}_{21}}A-{M^{i}_{22}}B^T ,\\ {M^{i+1}_{22}}&=-B^T{M^{i}_{12}}+C{M^{i}_{22}}={M^{i}_{21}}B+{M^{i}_{22}}C. \end{aligned} \end{aligned}$$
(61)

Therefore,

$$\begin{aligned} {M^{i+1}_{11}}= & {} \frac{1}{2}\left( A{M^{i}_{11}}+B{M^{i}_{21}}+{M^{i}_{11}}A-{M^{i}_{12}}B^T\right) \\= & {} \frac{1}{2}\left( \left( {A{M^{i}_{11}}+B{M^{i}_{21}}}\right) + \left( {A{M^{i}_{11}}+B{M^{i}_{21}}}\right) ^T\right) \end{aligned}$$

is symmetric. Similarly, \({M^{i+1}_{22}}\) is symmetric. Meanwhile, it holds that

$$\begin{aligned} {M^{i+1}_{12}}=A{M^{i}_{12}}+B{M^{i}_{22}}=-\left( {M^{i}_{21}}A-{M^{i}_{22}}B^T\right) ^T = -\left( {{M^{i+1}_{21}}}\right) ^T \ , \end{aligned}$$

which finishes the proof by induction. \(\square \)
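The following short Python/NumPy check (added for illustration; the blocks are arbitrary random symmetric matrices) confirms Proposition 5 numerically for the first few powers:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
A = rng.standard_normal((n, n)); A = (A + A.T) / 2  # symmetric
C = rng.standard_normal((m, m)); C = (C + C.T) / 2  # symmetric
B = rng.standard_normal((n, m))
M = np.block([[A, B], [-B.T, C]])

Mi = np.eye(n + m)
for i in range(1, 8):
    Mi = Mi @ M
    M11, M12, M21, M22 = Mi[:n, :n], Mi[:n, n:], Mi[n:, :n], Mi[n:, n:]
    ok = (np.allclose(M11, M11.T) and np.allclose(M22, M22.T)
          and np.allclose(M12, -M21.T))
    print(f"M^{i} generalized block skew-symmetric: {ok}")
```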

The next proposition provides upper and lower bounds on \({M^{i}_{11}}\) and \({M^{i}_{22}}\):

Proposition 6

Suppose M is a generalized block skew-symmetric matrix with \(A\succeq 0\) and \(C\succeq 0\) (as holds in the convex-concave setting), and \(\Vert M\Vert \le \gamma \), then it holds for \(i\ge 3\) that

$$\begin{aligned} -(i-1)\gamma ^{i-2}(\gamma A+ BB^T) \preceq {M^{i}_{11}}\preceq (i-1)\gamma ^{i-2}(\gamma A+ BB^T) \ , \end{aligned}$$
(62)

and

$$\begin{aligned} -(i-1)\gamma ^{i-2}(\gamma C+ B^TB) \preceq {M^{i}_{22}}\preceq (i-1)\gamma ^{i-2}(\gamma C+ B^TB) \ . \end{aligned}$$
(63)

Furthermore, it holds for any integer \(i\ge 3\) and \(c\in \mathbb {R}^{m+n}\) that

$$\begin{aligned} \left| c^T M^i c\right| \le (i-1)\gamma ^{i-2} c^T \left[ \begin{matrix} \gamma A+ BB^T &{} 0\\ 0 &{} \gamma C+ B^T B \end{matrix}\right] c \ . \end{aligned}$$

The following two facts will be needed for the proof of Proposition 6.

Fact 1

Suppose \(S_1\) and \(S_2\) are symmetric matrices, then

$$\begin{aligned} -(S_1^2 + S_2^2)\preceq S_1S_2+S_2S_1 \preceq S_1^2 + S_2^2 \ . \end{aligned}$$

Proof

It is easy to check that

$$\begin{aligned} S_1^2 + S_2^2 - (S_1S_2+S_2S_1) = (S_1-S_2)^T(S_1-S_2) \succeq 0 , \end{aligned}$$

and

$$\begin{aligned} S_1^2 + S_2^2 + S_1S_2+S_2S_1 = (S_1+S_2)^T(S_1+S_2) \succeq 0 , \end{aligned}$$

which finishes the proof by rearranging the above two matrix inequalities. \(\square \)

Fact 2

Suppose M is a generalized block skew-symmetric matrix, then

$$\begin{aligned} {M^{i}_{11}}=A{M^{i-2}_{11}}A-B{M^{i-2}_{22}} B^T - \left( {\sum _{j=0}^{i-3} B{M^{j}_{22}}B^T A^{i-2-j} + A^{i-2-j} B{M^{j}_{22}}B^T}\right) \ . \end{aligned}$$
(64)

Proof

By recursively using the update rule (61) and rearranging the equality, it holds that:

$$\begin{aligned} \begin{aligned} {M^{i}_{11}}&= A {M^{i-1}_{11}} + B{M^{i-1}_{21}} \\&= A({M^{i-2}_{11}}A-{M^{i-2}_{12}}B^T) + B({M^{i-2}_{21}}A-{M^{i-2}_{22}}B^T) \\&= A{M^{i-2}_{11}}A-B{M^{i-2}_{22}} B^T + \left( {B{M^{i-2}_{21}}A - A{M^{i-2}_{12}}B^T}\right) \\&= A{M^{i-2}_{11}}A-B{M^{i-2}_{22}} B^T + \left( {B{M^{i-3}_{21}}A^2 - A^2{M^{i-3}_{12}}B^T}\right) \\&\quad - \left( {B{M^{i-3}_{22}}B^T A + A B{M^{i-3}_{22}}B^T}\right) \\&= \cdots \\&= A{M^{i-2}_{11}}A-B{M^{i-2}_{22}} B^T - \left( {BB^T A^{i-2} + A^{i-2} BB^T}\right) \\&\quad - \left( {\sum _{j=1}^{i-3} B{M^{j}_{22}}B^T A^{i-2-j} + A^{i-2-j} B{M^{j}_{22}}B^T}\right) \\&= A{M^{i-2}_{11}}A-B{M^{i-2}_{22}} B^T - \left( {\sum _{j=0}^{i-3} B{M^{j}_{22}}B^T A^{i-2-j} + A^{i-2-j} B{M^{j}_{22}}B^T}\right) \ . \end{aligned} \end{aligned}$$

\(\square \)

Now let us go back to the proof of Proposition 6.

Proof of Proposition 6

Notice that A is positive semi-definite and \(\Vert M\Vert \le \gamma \), thus \(\Vert A\Vert \le \gamma \) and \(\Vert {M^{i-2}_{11}}\Vert \le \gamma ^{i-2}\), whereby \(-\gamma ^{i-2} I\preceq {M^{i-2}_{11}}\preceq \gamma ^{i-2} I\) and hence \(-\gamma ^{i-2} A\preceq A^{1/2}{M^{i-2}_{11}}A^{1/2}\preceq \gamma ^{i-2}A\). Therefore, it holds that

$$\begin{aligned} -\tfrac{1}{\gamma } A \preceq \tfrac{1}{\gamma ^i} A{M^{i-2}_{11}}A = \tfrac{1}{\gamma ^i} A^{1/2}\left( A^{1/2}{M^{i-2}_{11}}A^{1/2}\right) A^{1/2} \preceq \tfrac{1}{\gamma } A\ . \end{aligned}$$
(65)

Similarly, \(-\gamma ^{i-2} I\preceq {M^{i-2}_{22}}\preceq \gamma ^{i-2} I\), thus it holds that

$$\begin{aligned} -\tfrac{1}{\gamma ^{2}} BB^T \preceq \tfrac{1}{\gamma ^i} B{M^{i-2}_{22}}B^T \preceq \tfrac{1}{\gamma ^{2}} BB^T\ . \end{aligned}$$
(66)

For any \(0\le j\le i-3\), we have from Fact 1 by choosing \(S_1=\tfrac{1}{\gamma ^{2+j}} B{M^{j}_{22}}B^T\) and \(S_2=\tfrac{1}{\gamma ^{i-j-2}} A^{i-j-2}\) that

$$\begin{aligned} \begin{aligned}&\ \tfrac{1}{\gamma ^i} B{M^{j}_{22}}B^T A^{i-2-j} + \tfrac{1}{\gamma ^i} A^{i-2-j} B{M^{j}_{22}}B^T \\&\quad \preceq \ \left( \tfrac{1}{\gamma ^{2+j}} B{M^{j}_{22}}B^T\right) ^2 + \left( \tfrac{1}{\gamma ^{i-j-2}} A^{i-j-2}\right) ^2 \\&\quad = \ \tfrac{1}{\gamma ^{2j+4}} B\left( {M^{j}_{22}}B^T B {M^{j}_{22}}\right) B^T + \tfrac{1}{\gamma ^{2i-2j-4}} A^{1/2} A^{2i-2j-5} A^{1/2} \\&\quad \preceq \ \tfrac{1}{\gamma ^{2}} B B^T + \tfrac{1}{\gamma } A\ , \end{aligned} \end{aligned}$$
(67)

where the second matrix inequality is because \(B^TB\preceq \gamma ^2 I\), \(({M^{j}_{22}})^2\preceq \gamma ^{2j} I\) and \(A\preceq \gamma I\). Similarly, it holds that

$$\begin{aligned} \tfrac{1}{\gamma ^i} B{M^{j}_{22}}B^T A^{i-2-j} + \tfrac{1}{\gamma ^i} A^{i-2-j} B{M^{j}_{22}}B^T \succeq -\tfrac{1}{\gamma ^{2}} B B^T -\tfrac{1}{\gamma } A . \end{aligned}$$
(68)

Substituting (65), (66), (67) and (68) into (64) yields

$$\begin{aligned} \begin{aligned} \tfrac{1}{\gamma ^i} {M^{i}_{11}}&= \tfrac{1}{\gamma ^i}\left( A{M^{i-2}_{11}}A-B{M^{i-2}_{22}} B^T - \left( {\sum _{j=0}^{i-3} B{M^{j}_{22}}B^T A^{i-2-j} + A^{i-2-j} B{M^{j}_{22}}B^T}\right) \right) \\&\preceq \left( { \tfrac{1}{\gamma } A + \tfrac{1}{\gamma ^2} BB^T + (i-2)(\tfrac{1}{\gamma } A+\tfrac{1}{\gamma ^{2}} BB^T)}\right) \\&= (i-1)(\tfrac{1}{\gamma } A+\tfrac{1}{\gamma ^{2}} BB^T) \ , \end{aligned} \end{aligned}$$
(69)

and

$$\begin{aligned} \begin{aligned} \tfrac{1}{\gamma ^i} {M^{i}_{11}}&= \tfrac{1}{\gamma ^i}\left( A{M^{i-2}_{11}}A-B{M^{i-2}_{22}} B^T - \left( {\sum _{j=0}^{i-3} B{M^{j}_{22}}B^T A^{i-2-j} + A^{i-2-j} B{M^{j}_{22}}B^T}\right) \right) \\&\succeq \left( {-\tfrac{1}{\gamma } A -\tfrac{1}{\gamma ^{2}} BB^T - (i-2)(\tfrac{1}{\gamma } A+\tfrac{1}{\gamma ^{2}} BB^T)}\right) \\&= -(i-1)(\tfrac{1}{\gamma } A+\tfrac{1}{\gamma ^{2}} BB^T) \ , \end{aligned} \end{aligned}$$
(70)

which together with (69) furnishes the proof of (62). The proof of (63) can be obtained symmetrically. Furthermore, it follows from Proposition 5 that \(M^i\) is generalized block skew-symmetric, thus

$$\begin{aligned} \left| c^T M^i c\right| = \left| c^T \left[ \begin{matrix} {M^{i}_{11}} &{}\quad 0\\ 0 &{}\quad {M^{i}_{22}} \end{matrix}\right] c \right| \le (i-1)\gamma ^{i-2} c^T \left[ \begin{matrix} \gamma A+ BB^T &{}\quad 0\\ 0 &{}\quad \gamma C+ B^T B \end{matrix}\right] c,\nonumber \\ \end{aligned}$$
(71)

which finishes the proof of Proposition 6. \(\square \)
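Proposition 6 can also be checked numerically. In the Python/NumPy sketch below (added for illustration; sizes and seed are arbitrary), the diagonal blocks are random positive semi-definite matrices, matching the convex-concave setting in which the proposition is later applied:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 2
G = rng.standard_normal((n, n)); A = G @ G.T        # A >= 0
H = rng.standard_normal((m, m)); C = H @ H.T        # C >= 0
B = rng.standard_normal((n, m))
M = np.block([[A, B], [-B.T, C]])
gamma = np.linalg.norm(M, 2)
P = np.block([[gamma * A + B @ B.T, np.zeros((n, m))],
              [np.zeros((m, n)), gamma * C + B.T @ B]])

for i in range(3, 9):
    Mi = np.linalg.matrix_power(M, i)
    ratio = max(abs(c @ Mi @ c) / ((i - 1) * gamma**(i - 2) * (c @ P @ c))
                for c in rng.standard_normal((1000, n + m)))
    print(f"i={i}: worst |c^T M^i c| / bound = {ratio:.3f}")  # stays <= 1
```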

Proofs in Sect. 5

1.1 Proof of Theorem 4

The following two propositions will be needed for the proof of Theorem 4.

Proposition 7

For given z and \(\hat{z}\), let \(M=\int _{0}^1 \nabla F(z+t (\hat{z}-z))dt\), then \(F(\hat{z})-F(z)=M(\hat{z}-z)\).

Proof

Let \(\phi (t)=F(z+t (\hat{z}-z))\), then \(\phi (0)=F(z)\), \(\phi (1)=F(\hat{z})\) and \(\phi '(t)=\nabla F(z+t(\hat{z}-z))(\hat{z}-z)\). From the fundamental theorem of calculus, we have

$$\begin{aligned} F(\hat{z})-F(z)=\phi (1)-\phi (0)=\int _{0}^1 \phi '(t)dt = \int _{0}^1 \nabla F(z+t(\hat{z}-z))(\hat{z}-z)dt = M(\hat{z}-z)\ . \end{aligned}$$

\(\square \)
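Proposition 7 is the integral form of the mean value theorem; the Python/NumPy snippet below (added for illustration, reusing the same arbitrary test operator as above) checks it by approximating the integral with a midpoint rule:

```python
import numpy as np

def F(z):
    return np.array([z[0]**2 + z[1], z[1]**2 - z[0]])

def JF(z):  # nabla F
    return np.array([[2*z[0], 1.0], [-1.0, 2*z[1]]])

z  = np.array([0.3, -0.2])
zh = np.array([1.1, 0.8])

# M = int_0^1 nabla F(z + t(zh - z)) dt, via the midpoint rule.
N = 20000
M = sum(JF(z + (j + 0.5) / N * (zh - z)) for j in range(N)) / N

print(F(zh) - F(z))   # the two vectors agree up to quadrature error
print(M @ (zh - z))
```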

Proposition 8

Consider PPM with iterate update (5) and step-size \(s\le \frac{1}{3\gamma }\), then for any iteration k, it holds that

$$\begin{aligned} \Vert F(z_k)+F(z_{k+1})\Vert ^2 \ge 2\Vert F(z_k)\Vert ^2+\Vert F(z_{k+1})\Vert ^2 \ . \end{aligned}$$

Proof

Let \(M=\int _{0}^1 \nabla F(z_{k}+t (z_{k+1}-z_k))dt\), then \(\Vert M\Vert \le \int _{0}^1 \Vert \nabla F(z_{k}+t (z_{k+1}-z_k))\Vert dt\le \gamma \). It follows from Proposition 7 with \(\hat{z}=z_{k+1}\) and \(z=z_k\) that

$$\begin{aligned} F(z_{k+1})-F(z_k)=M(z_{k+1}-z_k). \end{aligned}$$
(72)

Therefore, it holds that

$$\begin{aligned} \begin{aligned} \Vert F(z_k)+F(z_{k+1})\Vert ^2&= 2\Vert F(z_k)\Vert ^2+2\Vert F(z_{k+1})\Vert ^2-\Vert F(z_{k+1})-F(z_k)\Vert ^2\\&= 2\Vert F(z_k)\Vert ^2+2\Vert F(z_{k+1})\Vert ^2-\Vert M\left( {z_{k+1}-z_k}\right) \Vert ^2\\&= 2\Vert F(z_k)\Vert ^2+2\Vert F(z_{k+1})\Vert ^2-\Vert sMF(z_{k+1})\Vert ^2\\&\ge 2\Vert F(z_k)\Vert ^2+2\Vert F(z_{k+1})\Vert ^2-\Vert F(z_{k+1})\Vert ^2\\&= 2\Vert F(z_k)\Vert ^2+\Vert F(z_{k+1})\Vert ^2 \ , \end{aligned}{} \end{aligned}$$
(73)

where the second equality is from the iterate update (5) and the inequality uses \(\Vert sMF(z_{k+1})\Vert \le s\gamma \Vert F(z_{k+1})\Vert \le \Vert F(z_{k+1})\Vert \), which holds since \(s\le \frac{1}{3\gamma }\). \(\square \)
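When L is a convex-concave quadratic, \(F(z)=Mz\) is linear and the PPM update (5) amounts to solving \((I+sM)z_{k+1}=z_k\). The Python/NumPy sketch below (added for illustration, with an arbitrary random instance) checks the inequality of Proposition 8 along the iterates:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 2
G = rng.standard_normal((n, n)); A = G @ G.T
H = rng.standard_normal((m, m)); C = H @ H.T
B = rng.standard_normal((n, m))
M = np.block([[A, B], [-B.T, C]])   # F(z) = M z for a convex-concave quadratic L
gamma = np.linalg.norm(M, 2)
s = 1.0 / (3.0 * gamma)
I = np.eye(n + m)

z = rng.standard_normal(n + m)
for k in range(8):
    z_next = np.linalg.solve(I + s * M, z)  # PPM: z_{k+1} = z_k - s F(z_{k+1})
    Fk, Fk1 = M @ z, M @ z_next
    slack = (np.linalg.norm(Fk + Fk1)**2
             - 2 * np.linalg.norm(Fk)**2 - np.linalg.norm(Fk1)**2)
    print(f"k={k}: slack = {slack:.3e}")    # nonnegative, as Proposition 8 claims
    z = z_next
```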

Let us now go back to the proof of Theorem 4:

Proof of Theorem 4

Let \(M=\int _{0}^1 \nabla F(z_k+t (z_{k+1}-z_k))dt\), then it follows from Proposition 7 with \(\hat{z}=z_{k+1}\) and \(z=z_k\) that \(F(z_{k+1})-F(z_k)=M(z_{k+1}-z_k)\), thus

$$\begin{aligned} \begin{aligned} F(z_{k+1})&=\frac{1}{2}\left( F(z_k)+F(z_{k+1})\right) + \frac{1}{2}\left( {F(z_{k+1})-F(z_{k})}\right) \\&=\frac{1}{2}\left( F(z_k)+F(z_{k+1})\right) + \frac{1}{2}M\left( {z_{k+1} -z_k}\right) \\&= \frac{1}{2}\left( F(z_k)+F(z_{k+1})\right) - \frac{s}{2}M F(z_{k+1})\ , \end{aligned} \end{aligned}$$
(74)

where the last equality utilizes the iterate update (5). By rearranging (74), we obtain

$$\begin{aligned} F(z_{k+1})= \frac{1}{2}\left( {I+ \frac{s}{2} M}\right) ^{-1}\left( {F(z_k)+F(z_{k+1})}\right) \ , \end{aligned}$$

whereby

$$\begin{aligned} \begin{aligned} F(z_{k+1})-F(z_k)&= M \left( {z_{k+1}-z_k}\right) = -sM F(z_{k+1})\\&= -\frac{s}{2}M\left( {I+ \frac{s}{2} M}\right) ^{-1}\left( {F(z_k)+F(z_{k+1})}\right) \\&= -\frac{s}{2} M \left( {\sum _{i=0}^{\infty } (-1)^i \left( {\frac{s}{2}}\right) ^i M^i}\right) \left( {F(z_k)+F(z_{k+1})}\right) \ , \end{aligned}{} \end{aligned}$$
(75)

where the first equality uses (72) and the second equality is due to the update rule (5).

Going back to the proof sketch stated in Sect. 5.2, (75) shows that it holds for PPM that \(R(z_k, s)= -\frac{1}{2}M\left( {I+ \frac{s}{2} M}\right) ^{-1}\) and \(R_i(z_k)= (-1)^{i+1}(\frac{1}{2})^{i+1} M^{i+1}\). The rest of the proof is to show that the O(s)-linear-convergence condition (53) guarantees sufficient decay for the corresponding \(R_0\) and \(R_1\) terms, and that the higher-order terms do not affect the rate when the step-size is small enough.

Notice it holds that

$$\begin{aligned} \begin{aligned}&\frac{1}{2}\Vert F(z_{k+1})\Vert ^2-\frac{1}{2}\Vert F(z_{k})\Vert ^2 \\&= \frac{1}{2}\left( {F(z_k)+F(z_{k+1})}\right) ^T \left( {F(z_{k+1})-F(z_k)}\right) \\&= -\frac{s}{4} \left( {F(z_k)+F(z_{k+1})}\right) ^T M \sum _{i=0}^{\infty } (-1)^i \left( {\frac{s}{2}}\right) ^i M^i \left( {F(z_k)+F(z_{k+1})}\right) \\&= \frac{1}{2}\sum _{i=1}^{\infty } (-1)^i \left( {\frac{s}{2}}\right) ^{i} \left( {F(z_k)+F(z_{k+1})}\right) ^T M^i \left( {F(z_k)+F(z_{k+1})}\right) \ , \end{aligned}{} \end{aligned}$$
(76)

where the second equality follows from (75).

Since L(x,y) is convex-concave, M is generalized block skew-symmetric. Let us denote \(M=\left[ {\begin{matrix} A &{} B \\ -B^T &{} C \end{matrix}}\right] \) and then \(M^2=\left[ {\begin{matrix} A^2- BB^T &{} AB+BC \\ -B^T A-C B^T &{} -B^T B + C^2 \end{matrix}}\right] \). It follows from Proposition 5 that for any power i, \(M^i\) is also generalized block skew-symmetric, thus the off-diagonal terms cancel out when computing \(\left( {F(z_k)+F(z_{k+1})}\right) ^T M^i \left( {F(z_k)+F(z_{k+1})}\right) \). Therefore, it holds that

$$\begin{aligned} \begin{array}{cl} &{} \sum \limits _{i=1}^{2} (-1)^i \left( \frac{s}{2}\right) ^{i-1} \left( {F(z_k)+F(z_{k+1})}\right) ^T M^i \left( {F(z_k)+F(z_{k+1})}\right) \\ \\ &{}\qquad = -\left( {F(z_k)+F(z_{k+1})}\right) ^T \left[ \begin{matrix} A- \frac{s}{2} A^2+ \frac{s}{2} BB^T &{} 0 \\ 0 &{} C- \frac{s}{2} C^2+ \frac{s}{2} B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) . \end{array}\nonumber \\ \end{aligned}$$
(77)

Meanwhile, it follows from Proposition 6 with \(c=F(z_k)+F(z_{k+1})\) that for any \(i\ge 3\),

$$\begin{aligned} \begin{aligned}&s^{i-1} |\left( {F(z_k)+F(z_{k+1})}\right) ^T M^i \left( {F(z_k)+F(z_{k+1})}\right) | \\&\qquad \le \, (i-1) (s\gamma )^{i-2} \left( {F(z_k)+F(z_{k+1})}\right) ^T \left[ \begin{matrix} s\gamma A+ s BB^T &{} 0\\ 0 &{} s\gamma C+ s B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) \\&\qquad \le \, (i-1) (s\gamma )^{i-2} \left( {F(z_k)+F(z_{k+1})}\right) ^T \left[ \begin{matrix} A+ s BB^T &{} 0\\ 0 &{} C+ s B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) , \end{aligned}\nonumber \\ \end{aligned}$$
(78)

where the last inequality uses \(s\gamma \le 1\). Also notice that \(s \gamma \le \frac{1}{3}\), thus \(\sum _{i=3}^{\infty } \left( {\tfrac{1}{2}}\right) ^{i-1}(i-1)(s\gamma )^{i-2} = \frac{s\gamma }{4}\cdot \frac{2-\frac{s\gamma }{2}}{\left( {1-\frac{s\gamma }{2}}\right) ^2}\le \frac{1}{4}\). Therefore, it holds that

$$\begin{aligned} \begin{aligned}&\sum _{i=3}^{\infty } (-1)^i \left( \frac{s}{2}\right) ^{i-1} \left( {F(z_k)+F(z_{k+1})}\right) ^T M^i \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad \le \, \sum _{i=3}^{\infty } \left( {\tfrac{1}{2}}\right) ^{i-1} s^{i-1} |\left( {F(z_k)+F(z_{k+1})}\right) ^T M^i \left( {F(z_k)+F(z_{k+1})}\right) |\\&\quad \le \, \sum _{i=3}^{\infty } \left( {\tfrac{1}{2}}\right) ^{i-1}(i-1)(s\gamma )^{i-2} \left( {F(z_k)+F(z_{k+1})}\right) ^T \left[ \begin{matrix} A+ s BB^T &{} 0\\ 0 &{} C+ s B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad \le \, \frac{1}{4} \left( {F(z_k)+F(z_{k+1})}\right) ^T \left[ \begin{matrix} A+ s BB^T &{} 0\\ 0 &{} C+ s B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad \le \, \frac{1}{2} \left( {F(z_k)+F(z_{k+1})}\right) ^T \left[ \begin{matrix} A- \frac{s}{2} A^2+ \frac{s}{2} BB^T &{} 0 \\ 0 &{} C- \frac{s}{2} C^2+ \frac{s}{2} B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) \ , \end{aligned}{} \end{aligned}$$
(79)

where the last inequality follows from \(sA^2\preceq s\gamma A \preceq A\) by noticing A is positive semi-definite, \(\Vert A\Vert \le \Vert M\Vert \le \gamma \) and \(s\gamma \le 1\). Substituting (77) and (79) into (76) yields

$$\begin{aligned} \begin{array}{cl} &{} \frac{1}{2}\Vert F(z_{k+1})\Vert ^2-\frac{1}{2}\Vert F(z_{k})\Vert ^2 \\ \\ &{}\quad \le \, -\frac{s}{8}\left( {F(z_k)+F(z_{k+1})}\right) ^T\left[ \begin{matrix} A- \frac{s}{2} A^2+ \frac{s}{2} BB^T &{} 0 \\ 0 &{} C- \frac{s}{2} C^2+ \frac{s}{2} B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) \\ \\ &{}\quad \le \, -\frac{s\rho (s)}{8} \Vert F(z_k)+F(z_{k+1})\Vert ^2 \\ \\ &{}\quad \le \,-\frac{s\rho (s)}{4} \Vert F(z_{k})\Vert ^2 -\frac{s\rho (s)}{8} \Vert F(z_{k+1})\Vert ^2 \end{array} \end{aligned}$$
(80)

where the second inequality uses the O(s)-linear-convergence condition (53) and the last inequality is due to Proposition 8. By rearranging (80), we have

$$\begin{aligned} \Vert F(z_{k+1})\Vert ^2 \le \frac{1-\frac{s\rho (s)}{2}}{1+\frac{s\rho (s)}{4}} \Vert F(z_{k})\Vert ^2\ , \end{aligned}$$

which furnishes the proof of Theorem 4. \(\square \)
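Continuing the same quadratic setup, one can watch the geometric decay promised by Theorem 4. The sketch below (added for illustration; it does not compute \(\rho (s)\), it only exhibits the decay) prints the per-iteration ratio of \(\Vert F(z_k)\Vert ^2\) under PPM:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 2
G = rng.standard_normal((n, n)); A = G @ G.T
H = rng.standard_normal((m, m)); C = H @ H.T
B = rng.standard_normal((n, m))
M = np.block([[A, B], [-B.T, C]])
gamma = np.linalg.norm(M, 2)
s = 1.0 / (3.0 * gamma)
I = np.eye(n + m)

z = rng.standard_normal(n + m)
prev = np.linalg.norm(M @ z)**2
for k in range(12):
    z = np.linalg.solve(I + s * M, z)       # PPM step
    cur = np.linalg.norm(M @ z)**2
    print(f"k={k}: ||F(z_k)||^2 ratio = {cur / prev:.6f}")  # strictly below 1
    prev = cur
```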

1.2 Proof of Theorem 5

The following two propositions will be needed for the proof of Theorem 5.

Proposition 9

Consider EGM with step-size s. Let \(M=\int _{0}^1 \nabla F(z_{k}+t (z_{k+1}-z_k))dt\), \(M_1=\int _{0}^1 \nabla F(\tilde{z}_k+t (z_{k+1}-\tilde{z}_k))dt\), and \(M_2=\int _{0}^1 \nabla F(z_k+t (\tilde{z}_{k}-z_k))dt\). Then it holds for any k that

$$\begin{aligned} F(\tilde{z}_{k})= \frac{1}{2}\left( {I+ \frac{s}{2} M + \frac{s^3}{2}M_1M_2M}\right) ^{-1}\left( {I-\frac{s^2}{2}M_1M_2}\right) \left( {F(z_k)+F(z_{k+1})}\right) . \end{aligned}$$
(81)

Proof

By the definition of M, \(M_1\) and \(M_2\), we have \(\left\| M\right\| ,\left\| M_1\right\| , \left\| M_2\right\| \le \gamma \). Moreover, it follows from Proposition 7 that

$$\begin{aligned} F(z_{k+1})-F(z_k)&=M(z_{k+1}-z_k), \end{aligned}$$
(82)
$$\begin{aligned} F(z_{k+1})-F(\tilde{z}_k)&=M_1(z_{k+1}-\tilde{z}_k), \end{aligned}$$
(83)
$$\begin{aligned} F(\tilde{z}_{k})-F(z_k)&=M_2(\tilde{z}_{k}-z_k) \ , \end{aligned}$$
(84)

Together with the iterate update of the EGM algorithm (6), we obtain

$$\begin{aligned} F(z_{k+1})-F(z_k)= M \left( {z_{k+1}-z_k}\right) = -s MF(\tilde{z}_{k}). \end{aligned}$$
(85)

and

$$\begin{aligned} \begin{aligned} F(\tilde{z}_{k})-F(z_{k+1})&= M_1 (\tilde{z}_k-z_{k+1})= sM_1(F(\tilde{z}_{k})-F(z_{k}))\\&= sM_1M_2(\tilde{z}_k-z_k) =-s^2 M_1M_2F(z_{k})\\&= -s^2 M_1M_2\left[ \tfrac{1}{2}\left( {F(z_k)+F(z_{k+1})}\right) -\tfrac{1}{2}\left( {F(z_{k+1})-F(z_k)}\right) \right] \\&= -s^2 M_1M_2\left[ \tfrac{1}{2}\left( {F(z_k)+F(z_{k+1})}\right) +\tfrac{1}{2}s M F(\tilde{z}_{k})\right] , \end{aligned}{} \end{aligned}$$
(86)

where the second equality is from the update rule (6) and the last equality uses (85). Using (85) and (86), we can rewrite \(F(\tilde{z}_{k})\) as:

$$\begin{aligned} \begin{aligned} F(\tilde{z}_{k})&= \frac{1}{2}\left( {F(z_k)+F(z_{k+1})}\right) + \frac{1}{2}\left( {F(z_{k+1})-F(z_k)}\right) + \left( {F(\tilde{z}_{k})-F(z_{k+1})}\right) \\&= \frac{1}{2}\left( {F(z_k)+F(z_{k+1})}\right) -\frac{s}{2}M F(\tilde{z}_{k})- \frac{s^2}{2} M_1M_2 \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad - \frac{s^3}{2} M_1M_2M F(\tilde{z}_{k})\ . \end{aligned} \end{aligned}$$
(87)

We finish the proof by rearranging (87). \(\square \)

Remark 10

Going back to the proof sketch stated in Sect. 5.2, Proposition 9 shows that it holds for EGM that

$$\begin{aligned} F(z_{k+1})-F(z_k)&= M\left( {z_{k+1}-z_k}\right) = -sM F(\tilde{z}_{k})\\&=-s\frac{1}{2}M\left( {I+ \frac{s}{2} M + \frac{s^3}{2}M_1M_2M}\right) ^{-1}\left( {I-\frac{s^2}{2}M_1M_2}\right) \left( {F(z_k)+F(z_{k+1})}\right) \ , \end{aligned}$$

whereby \(R(z_k, s)=-\frac{1}{2}M\left( {I+ \frac{s}{2} M + \frac{s^3}{2}M_1M_2M}\right) ^{-1}\left( {I-\frac{s^2}{2}M_1M_2}\right) \). The rest of the proofs of Theorems 5 and 6 show that the O(s)-linear-convergence condition (53) corresponds to sufficient decay for the \(R_0\) and \(R_1\) terms, and that the higher-order terms do not affect the rate when the step-size is small enough. Moreover, the difference between the slow rate (Theorem 5) and the fast rate (Theorem 6) comes from how small the step-size needs to be in order to bound the higher-order terms.

Proposition 10

Consider EGM with step-size s. Suppose \(s\le \frac{1}{2\gamma }\), then it holds for any k that

$$\begin{aligned} \Vert F(z_k)+F(z_{k+1})\Vert ^2 \ge \frac{8}{5}\Vert F(z_k)\Vert ^2+\frac{8}{5}\Vert F(z_{k+1})\Vert ^2 \ . \end{aligned}$$

Proof

It follows from (82) and (6) that

$$\begin{aligned} \begin{aligned} \Vert F(z_k)+F(z_{k+1})\Vert ^2&= 2\Vert F(z_k)\Vert ^2+2\Vert F(z_{k+1})\Vert ^2-\Vert F(z_{k+1})-F(z_k)\Vert ^2\\&= 2\Vert F(z_k)\Vert ^2+2\Vert F(z_{k+1})\Vert ^2-\Vert M\left( {z_{k+1}-z_k}\right) \Vert ^2\\&= 2\Vert F(z_k)\Vert ^2+2\Vert F(z_{k+1})\Vert ^2-\Vert sMF(\tilde{z}_{k})\Vert ^2 \ . \end{aligned}{} \end{aligned}$$
(88)

From Proposition 9, we obtain that

$$\begin{aligned} \begin{aligned} \Vert sMF(\tilde{z}_{k})\Vert ^2&\le \tfrac{s^2}{4} \Vert M\Vert ^2\Vert (I+ \tfrac{s}{2} M + \tfrac{s^3}{2}M_1M_2M)^{-1}\Vert ^{2}\Vert I-\tfrac{s^2}{2}M_1M_2\Vert ^2\Vert F(z_k)+F(z_{k+1})\Vert ^2\ \\&\le \tfrac{\left( {s\gamma }\right) ^2}{4} (1-\tfrac{s\gamma }{2}-\tfrac{\left( {s\gamma }\right) ^3}{2})^{-2} (1+\tfrac{(s\gamma )^2}{2})^2\Vert F(z_k)+F(z_{k+1})\Vert ^2 \\&\le \frac{1}{4}\Vert F(z_k)+F(z_{k+1})\Vert ^2 \ , \end{aligned} \end{aligned}$$
(89)

where the second inequality comes from the facts:

$$\begin{aligned} \Vert (I+ \tfrac{s}{2} M + \tfrac{s^3}{2}M_1M_2M)^{-1}\Vert \le \left( {1-\Vert \tfrac{s}{2} M + \tfrac{s^3}{2}M_1M_2M\Vert }\right) ^{-1}\le \left( {1-\tfrac{s\gamma }{2}-\tfrac{\left( {s\gamma }\right) ^3}{2}}\right) ^{-1}\ , \end{aligned}$$

and

$$\begin{aligned} \Vert I-\tfrac{s^2}{2}M_1M_2\Vert \le \Vert I\Vert +\Vert \tfrac{s^2}{2}M_1M_2\Vert \le 1+\tfrac{(s\gamma )^2}{2} \ , \end{aligned}$$

and the last inequality uses the fact that \(s\gamma \le \frac{1}{2}\). Combining (88) and (89), we arrive at

$$\begin{aligned} \Vert F(z_k)+F(z_{k+1})\Vert ^2&= 2\Vert F(z_k)\Vert ^2+2\Vert F(z_{k+1})\Vert ^2-\Vert sMF(\tilde{z}_{k})\Vert ^2 \ge 2\Vert F(z_k)\Vert ^2\\&\quad +2\Vert F(z_{k+1})\Vert ^2- \frac{1}{4}\Vert F(z_k)+F(z_{k+1})\Vert ^2 \ , \end{aligned}$$

which finishes the proof by rearrangement. \(\square \)

Let us go back to the proof of Theorem 5:

Proof of Theorem 5

It follows from (82) that

$$\begin{aligned} \begin{aligned}&\frac{1}{2}\Vert F(z_{k+1})\Vert ^2-\frac{1}{2}\Vert F(z_{k})\Vert ^2 \\&= \frac{1}{2}\left( {F(z_k)+F(z_{k+1})}\right) ^T \left( {F(z_{k+1})-F(z_k)}\right) \\&= \frac{1}{2}\left( {F(z_k)+F(z_{k+1})}\right) ^T M \left( {z_{k+1}-z_k}\right) \\&= -\frac{s}{2} \left( {F(z_k)+F(z_{k+1})}\right) ^T M F(\tilde{z}_{k})\\&= -\frac{s}{4} \left( {F(z_k)+F(z_{k+1})}\right) ^T M \left( {I+ \frac{s}{2} M + \frac{s^3}{2}M_1M_2M}\right) ^{-1}\\&\left( {I-\frac{s^2}{2}M_1M_2}\right) \left( {F(z_k)+F(z_{k+1})}\right) \\&= -\frac{s}{4} \left( {F(z_k)+F(z_{k+1})}\right) ^T \left( { M - \frac{s}{2} M^2}\right) \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad -\frac{s}{4} \left( {F(z_k)+F(z_{k+1})}\right) ^T \left( {- \frac{s^3}{2}MM_1M_2M}\right) \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad -\frac{s}{4} \left( {F(z_k)+F(z_{k+1})}\right) ^T M \sum _{i=2}^{\infty } (-1)^i \left( {\frac{s}{2} M + \frac{s^3}{2}M_1M_2M}\right) ^i \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad -\frac{s}{4} \left( {F(z_k)+F(z_{k+1})}\right) ^T M \left( {I+ \frac{s}{2} M + \frac{s^3}{2}M_1M_2M}\right) ^{-1}\frac{s^2}{2}M_1M_2\left( {F(z_k)+F(z_{k+1})}\right) \ ,\\ \end{aligned}{} \end{aligned}$$
(90)

where the third equality is from the update of EGM algorithm, the fourth equality follows from Proposition 9, and the last equality is rearrangement by noticing \(\left( {I+ \frac{s}{2} M + \frac{s^3}{2}M_1M_2M}\right) ^{-1}=\sum _{i=0}^{\infty } (-1)^i \left( {\frac{s}{2} M + \frac{s^3}{2}M_1M_2M}\right) ^i\).

Now let us examine each term on the right-hand side of (90). In principle, the last three terms are at most \(O(s^3)\), and the first term is at least \(O(s^2)\), which dominates the right-hand side of (90) when s is small. Suppose \(M=\left[ \begin{matrix} A &{} B\\ -B^T &{} C \end{matrix}\right] \), then \(M^2=\left[ \begin{matrix} A^2-BB^T &{} AB+BC\\ -B^TA-CB^T &{} C^2-B^T B \end{matrix}\right] \). Notice that \(\Vert M_1\Vert , \Vert M_2\Vert , \Vert M\Vert \le \gamma \le \frac{1}{2s}\). For the first term on the right-hand side of (90), it holds that

$$\begin{aligned} \begin{array}{cl} &{}\displaystyle -\frac{s}{4}\left( {F(z_k)+F(z_{k+1})}\right) ^T \left( { M - \frac{s}{2} M^2 }\right) \left( {F(z_k)+F(z_{k+1})}\right) \\ &{}\quad \displaystyle = \, -\frac{s}{4}\left( {F(z_k)+F(z_{k+1})}\right) ^T \left[ \begin{matrix} A- \frac{s}{2} A^2+ \frac{s}{2} BB^T &{} 0 \\ 0 &{} C- \frac{s}{2} C^2+ \frac{s}{2} B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) \\ \\ &{}\quad \displaystyle \le \, -\frac{s\rho (s)}{8} \Vert F(z_k)+F(z_{k+1})\Vert ^2 , \end{array}\nonumber \\ \end{aligned}$$
(91)

where the inequality uses the condition (42). For the second term on the right-hand side of (90), it holds that

$$\begin{aligned}&\left| \frac{s}{4}\left( {F(z_k)+F(z_{k+1})}\right) ^T \frac{s^3}{2}MM_1M_2M\left( {F(z_k)+F(z_{k+1})}\right) \right| \nonumber \\&\quad \le \,\frac{s^4}{8} \gamma ^4 \Vert F(z_k)+F(z_{k+1})\Vert ^2 \le \frac{s^3}{16} \gamma ^3 \Vert F(z_k)+F(z_{k+1})\Vert ^2\ , \end{aligned}$$
(92)

where the last inequality uses \(s\gamma \le \frac{1}{2}\). For the third term on the right-hand side of (90), it holds that

$$\begin{aligned} \begin{aligned}&\ \left| \frac{s}{4}\left( {F(z_k)+F(z_{k+1})}\right) ^T M \sum _{i=2}^{\infty } (-1)^i \left( {\frac{s}{2} M + \frac{s^3}{2}M_1M_2M}\right) ^i \left( {F(z_k)+F(z_{k+1})}\right) \right| \\&\quad \le \ \frac{s}{4} \sum _{i=2}^{\infty } \left( {\frac{s}{2} \gamma + \frac{s^3}{2}\gamma ^3}\right) ^i \gamma \Vert F(z_k)+F(z_{k+1})\Vert ^2 \\&\quad \le \ \frac{s}{4}\sum _{i=2}^{\infty } (\tfrac{5}{8} s\gamma )^i\gamma \Vert F(z_k)+F(z_{k+1})\Vert ^2 \\&\quad = \ \frac{25}{256} s^3\gamma ^3 \frac{1}{1-\frac{5}{8} s\gamma } \Vert F(z_k)+F(z_{k+1})\Vert ^2 \\&\quad \le \, \frac{5}{32} s^3\gamma ^3 \Vert F(z_k)+F(z_{k+1})\Vert ^2 \ , \end{aligned} \end{aligned}$$
(93)

where the first inequality is because

$$\begin{aligned} \left\| \sum _{i=2}^{\infty } (-1)^i \left( {\frac{s}{2} M + \frac{s^3}{2}M_1M_2M}\right) ^i\right\|\le & {} \sum _{i=2}^{\infty } \left( {\frac{s}{2} \left\| M\right\| + \frac{s^3}{2}\left\| M_1M_2M\right\| }\right) ^i\\\le & {} \sum _{i=2}^{\infty } \left( {\frac{s}{2} \gamma + \frac{s^3}{2}\gamma ^3}\right) ^i\ , \end{aligned}$$

and the second and last inequalities use the fact that \(s\gamma \le \tfrac{1}{2}\). Similarly, for the last term on the right-hand side of (90), it holds that

$$\begin{aligned} \begin{aligned}&\ \left| \frac{s^3}{8} \left( {F(z_k)+F(z_{k+1})}\right) ^T M \left( {I+ \frac{s}{2} M + \frac{s^3}{2}M_1M_2M}\right) ^{-1}M_1M_2\left( {F(z_k)+F(z_{k+1})}\right) \right| \\&\quad \le \, \frac{s^3\gamma ^3}{8} \frac{1}{1-\frac{s\gamma }{2}-\frac{s^3\gamma ^3}{2}}\Vert F(z_k)+F(z_{k+1})\Vert ^2 \\&\quad \le \, \ \frac{1}{5}s^3\gamma ^3\Vert F(z_k)+F(z_{k+1})\Vert ^2 \ . \end{aligned}{} \end{aligned}$$
(94)

Substituting (91), (92), (93) and (94) into (90), we arrive at:

$$\begin{aligned} \begin{aligned}&\frac{1}{2}\Vert F(z_{k+1})\Vert ^2-\frac{1}{2}\Vert F(z_{k})\Vert ^2\\&\quad \le \, \left( {-\frac{s\rho (s)}{8} +\left( {\frac{1}{16}+\frac{5}{32}+\frac{1}{5}}\right) s^3\gamma ^3}\right) \Vert F(z_k)+F(z_{k+1})\Vert ^2 \\&\quad \le \, \left( {-\frac{s\rho (s)}{8} +\frac{1}{2} s^3\gamma ^3}\right) \Vert F(z_k)+F(z_{k+1})\Vert ^2 \\&\quad \le \, -\frac{s\rho (s)}{16} \Vert F(z_k)+F(z_{k+1})\Vert ^2 \\&\quad \le \, -\frac{s\rho (s)}{10} \Vert F(z_{k+1})\Vert ^2 -\frac{s\rho (s)}{10}\Vert F(z_{k})\Vert ^2\ , \end{aligned} \end{aligned}$$
(95)

where the third inequality uses \(\rho (s)\ge 8s^2\gamma ^3\), and the last inequality is from Proposition 10. Rearranging (95) yields

$$\begin{aligned} \Vert F(z_{k+1})\Vert ^2 \le \left( {\frac{1-\frac{s\rho (s)}{5}}{1+\frac{s\rho (s)}{5}}}\right) \Vert F(z_{k})\Vert ^2\ , \end{aligned}$$

which finishes the proof by telescoping. \(\square \)
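The same kind of numerical check applies to EGM. When \(F(z)=Mz\), one EGM iteration is the fixed linear map \(z\mapsto (I-sM+s^2M^2)z\); the sketch below (added for illustration, with an arbitrary random instance) shows the geometric decay of \(\Vert F(z_k)\Vert ^2\) consistent with Theorem 5:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 2
G = rng.standard_normal((n, n)); A = G @ G.T
H = rng.standard_normal((m, m)); C = H @ H.T
B = rng.standard_normal((n, m))
M = np.block([[A, B], [-B.T, C]])
gamma = np.linalg.norm(M, 2)
s = 1.0 / (2.0 * gamma)
T = np.eye(n + m) - s * M + s**2 * (M @ M)  # one EGM step when F(z) = M z

z = rng.standard_normal(n + m)
prev = np.linalg.norm(M @ z)**2
for k in range(12):
    z = T @ z
    cur = np.linalg.norm(M @ z)**2
    print(f"k={k}: ||F(z_k)||^2 ratio = {cur / prev:.6f}")  # settles below 1
    prev = cur
```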

1.3 Proof of Theorem 6

The next proposition will be used in the proof of Theorem 6:

Proposition 11

Consider \(Q\in \mathbb {R}^{(m+n)\times (m+n)}\) with \(\Vert Q\Vert \le \alpha < 1\). Suppose there exists a positive semi-definite matrix P such that for any \(c\in \mathbb {R}^{m+n}\) and any integer \(k\ge 3\), it holds that

$$\begin{aligned} |c^T Q^k c|\le (k-1)\alpha ^{k-2}s^2 c^T P c \end{aligned}$$
(96)

for some positive scalar s. Then for any \(j\ge 3\), we have

$$\begin{aligned} \left| c^T Q^j(I+\frac{Q}{2}+\frac{Q^3}{2})^{-1}c\right| \le s^2 h_2(2\alpha )(2\alpha )^{j-2} {c^TPc} \ , \end{aligned}$$
(97)

where \(h_2(u)=\left( {1-\frac{u}{2}-\frac{u^3}{2}}\right) ^{-1}\).

Proof

Consider the functions \(h_1(u):=(1+\frac{u}{2}+\frac{u^3}{2})^{-1}\) and \(h_2(u):=(1-\frac{u}{2}-\frac{u^3}{2})^{-1}\). The power series expansions of \(h_1(u)\) and \(h_2(u)\) are

$$\begin{aligned} h_1(u)=\left( {1+\frac{u}{2}+\frac{u^3}{2}}\right) ^{-1}= \sum _{l=0}^{\infty }(-1)^{l} \left( {\frac{u}{2}+\frac{u^3}{2}}\right) ^l=\sum _{i=0}^{\infty }a_i u^i \ , \end{aligned}$$
(98)

and

$$\begin{aligned} {} h_2(u)=\left( {1-\frac{u}{2}-\frac{u^3}{2}}\right) ^{-1}= \sum _{l=0}^{\infty } \left( {\frac{u}{2}+\frac{u^3}{2}}\right) ^l=\sum _{i=0}^{\infty }b_i u^i \ , \end{aligned}$$
(99)

where \(a_i\) and \(b_i\) are the i-th coefficients of the power series expansions of \(h_1(u)\) and \(h_2(u)\), respectively. Notice that the above two infinite sums converge in the domain \(\{u:|\frac{u}{2}+\frac{u^3}{2}|<1\}\). Furthermore, it is straightforward to see that \(|a_i|\le b_i\) for every i, because of the \((-1)^{l}\) factor in the expansion of \(h_1(u)\).

Notice that \(\Vert Q\Vert \le \alpha < 1\), thus \(\Vert \frac{Q}{2}+\frac{Q^3}{2}\Vert <1\), whereby the power series expansion of the matrix function \((I+\frac{Q}{2}+\frac{Q^3}{2})^{-1}\) converges. Therefore, it holds that

$$\begin{aligned} \begin{aligned} \left| c^T Q^j\left( {I+\frac{Q}{2}+\frac{Q^3}{2}}\right) ^{-1}c\right|&=\left| c^T \sum _{i=0}^{\infty }a_i Q^{i+j}c\right| \le \sum _{i=0}^{\infty }\left| a_i\right| \left| c^TQ^{i+j}c\right| \\&\le \, \sum _{i=0}^{\infty }\left| a_i\right| (i+j-1)\alpha ^{i+j-2}s^2{c^TPc} \ , \\ \end{aligned} \end{aligned}$$
(100)

where the last inequality is from (96). Furthermore, notice that \(j\ge 3\), thus it holds for any \(i\ge 0\) that \((i+j-1)\alpha ^{i+j-2}\le (2\alpha )^{i+j-2}\). Therefore,

$$\begin{aligned} \begin{aligned} \sum _{i=0}^{\infty }\left| a_i\right| (i+j-1)\alpha ^{i+j-2}s^2{c^TPc}&\le \sum _{i=0}^{\infty }\left| a_i\right| (2\alpha )^{i+j-2}s^2{c^TPc}\\&\le \sum _{i=0}^{\infty }b_i(2\alpha )^{i+j-2}s^2{c^TPc} = s^2 h_2(2\alpha )(2\alpha )^{j-2} {c^TPc} \ , \end{aligned} \end{aligned}$$
(101)

where the first inequality uses \((i+j-1)\alpha ^{i+j-2}\le (2\alpha )^{i+j-2}\), the second inequality uses \(|a_i|\le b_i\), and the equality is from (99). Combining (100) and (101) finishes the proof of Proposition 11. \(\square \)
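The comparison \(|a_i|\le b_i\) is easy to verify numerically, since both coefficient sequences obey the three-term recursion obtained from \(h(u)\,(1\pm \frac{u}{2}\pm \frac{u^3}{2})=1\). A short Python sketch (added for illustration):

```python
N = 30
a = [1.0] + [0.0] * N   # coefficients of h1(u) = (1 + u/2 + u^3/2)^{-1}
b = [1.0] + [0.0] * N   # coefficients of h2(u) = (1 - u/2 - u^3/2)^{-1}
for k in range(1, N + 1):
    a[k] = -(0.5 * a[k - 1] + (0.5 * a[k - 3] if k >= 3 else 0.0))
    b[k] = 0.5 * b[k - 1] + (0.5 * b[k - 3] if k >= 3 else 0.0)

print(all(abs(a[k]) <= b[k] + 1e-15 for k in range(N + 1)))  # True
print([round(v, 4) for v in a[:5]])  # [1.0, -0.5, 0.25, -0.625, 0.5625]
print([round(v, 4) for v in b[:5]])  # [1.0, 0.5, 0.25, 0.625, 0.5625]
```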

Now let us go back to EGM. By choosing \(Q=sM\), \(\alpha =s\gamma \), and \(P=\left[ \begin{matrix} \gamma A+ BB^T &{} 0\\ 0 &{} \gamma C+ B^T B \end{matrix}\right] \) in Proposition 11, we obtain:

Corollary 3

$$\begin{aligned}&\left| s^jc^TM^j\left( {I+ \frac{s}{2} M + \frac{s^3}{2}M^3}\right) ^{-1}c\right| \nonumber \\&\quad \le s^2 (1-s\gamma -4s^3\gamma ^3)^{-1}(2s\gamma )^{j-2}{c^T \left[ \begin{matrix} \gamma A+ BB^T &{} 0\\ 0 &{} \gamma C+ B^T B \end{matrix}\right] c} \ . \end{aligned}$$
(102)

Proof

Notice that \(\Vert sM\Vert \le s\gamma < 1\). Furthermore, it follows from Proposition 6 that for any c and \(k\ge 3\),

$$\begin{aligned} |c^T s^k M^k c|= s^k |c^T M^k c|\le (k-1)s^2(s\gamma )^{k-2}c^T\left[ \begin{matrix} \gamma A+ BB^T &{} 0\\ 0 &{} \gamma C+ B^T B \end{matrix}\right] c \ . \end{aligned}$$

Thus \(Q=sM\), \(\alpha =s\gamma \), and \(P=\left[ \begin{matrix} \gamma A+ BB^T &{} 0\\ 0 &{} \gamma C+ B^T B \end{matrix}\right] \) satisfy the conditions in Proposition 11, which leads to (102) by noticing \(h_2(2s\gamma )=(1-s\gamma -4s^3\gamma ^3)^{-1}\).

\(\square \)

Proof of Theorem 6

Following the notation in the proof of Theorem 5, it holds that \(M_1=M_2=M=\left[ \begin{matrix} A &{} B\\ -B^T &{} C \end{matrix}\right] \) when the minimax function L(x,y) is quadratic, and we can then write (90) as

$$\begin{aligned} \begin{aligned}&\frac{1}{2}\Vert F(z_{k+1})\Vert ^2-\frac{1}{2}\Vert F(z_{k})\Vert ^2 \\&= -\frac{s}{4} \left( {F(z_k)+F(z_{k+1})}\right) ^T \left( { M - \frac{s}{2} M^2}\right) \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad +\frac{s^4}{8} \left( {F(z_k)+F(z_{k+1})}\right) ^T M^4 \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad -\frac{s}{4} \left( {F(z_k)+F(z_{k+1})}\right) ^T M \sum _{i=2}^{\infty } (-1)^i \left( {\frac{s}{2} M + \frac{s^3}{2}M^3}\right) ^i \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad -\frac{s^3}{8} \left( {F(z_k)+F(z_{k+1})}\right) ^T M^3 \left( {I+ \frac{s}{2} M + \frac{s^3}{2}M^3}\right) ^{-1}\left( {F(z_k)+F(z_{k+1})}\right) \ ,\\ \end{aligned}{} \end{aligned}$$
(103)

by utilizing the fact that \(f(M) M=M f(M)\) if f is a function of M with a convergent power series. Let us again examine each term on the right-hand side of (103). For the first term, recall that (91) shows that

$$\begin{aligned} \begin{aligned}&-\frac{s}{4}\left( {F(z_k)+F(z_{k+1})}\right) ^T \left( { M - \frac{s}{2} M^2 }\right) \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad = -\frac{s}{4}\left( {F(z_k)+F(z_{k+1})}\right) ^T \left[ \begin{matrix} A- \frac{s}{2} A^2+ \frac{s}{2} BB^T &{} 0 \\ 0 &{} C- \frac{s}{2} C^2+ \frac{s}{2} B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) \ . \end{aligned}{} \end{aligned}$$
(104)

For the second term, it follows from Proposition 6 that

$$\begin{aligned} \begin{aligned}&\frac{s^4}{8} \left| \left( {F(z_k)+F(z_{k+1})}\right) ^T M^4 \left( {F(z_k)+F(z_{k+1})}\right) \right| \\&\quad \le \frac{3s^4}{8}\gamma ^2 {\left( {F(z_k)+F(z_{k+1})}\right) ^T \left[ \begin{matrix} \gamma A+ BB^T &{} 0\\ 0 &{} \gamma C+ B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) } \\&\quad \le \frac{3s}{8}(s\gamma )^2 {\left( {F(z_k)+F(z_{k+1})}\right) ^T \left[ \begin{matrix} A+ s BB^T &{} 0\\ 0 &{} C+ s B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) }\\&\quad \le \frac{3s}{4}(s\gamma )^2 {\left( {F(z_k)+F(z_{k+1})}\right) ^T \left[ \begin{matrix} A- \frac{s}{2} A^2+ \frac{s}{2} BB^T &{} 0 \\ 0 &{} C- \frac{s}{2} C^2+ \frac{s}{2} B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) } \ . \end{aligned} \end{aligned}$$
(105)

For the third term, it holds that

$$\begin{aligned} \begin{aligned}&~\left| \frac{s}{4} \left( {F(z_k)+F(z_{k+1})}\right) ^T M \sum _{i=2}^{\infty } (-1)^i \left( {\frac{s}{2} M + \frac{s^3}{2}M^3}\right) ^i \left( {F(z_k)+F(z_{k+1})}\right) \right| \\&\quad = ~\left| \frac{s}{4} \left( {F(z_k)+F(z_{k+1})}\right) ^T M \left( {\frac{s}{2} M + \frac{s^3}{2}M^3}\right) ^2 \sum _{i=0}^{\infty } (-1)^i \left( {\frac{s}{2} M + \frac{s^3}{2}M^3}\right) ^i \left( {F(z_k)+F(z_{k+1})}\right) \right| \\&\quad = ~\left| \frac{s}{4} \left( {F(z_k)+F(z_{k+1})}\right) ^T M \left( {\frac{s}{2} M + \frac{s^3}{2}M^3}\right) ^2 \left( {I+ \frac{s}{2} M + \frac{s^3}{2}M^3}\right) ^{-1} \left( {F(z_k)+F(z_{k+1})}\right) \right| \\&\quad =~ \left| \frac{s}{4} \left( {F(z_k)+F(z_{k+1})}\right) ^T M \left( {\frac{s^2}{4}M^2+\frac{s^4}{2}M^4+\frac{s^6}{4}M^6}\right) \left( {I+ \frac{s}{2} M + \frac{s^3}{2}M^3}\right) ^{-1} \left( {F(z_k)+F(z_{k+1})}\right) \right| \\&\quad \le ~ \tfrac{s^2}{4} \left( {\tfrac{(2s\gamma )}{4}+\tfrac{(2s\gamma )^3}{2}+\tfrac{(2s\gamma )^5}{4}}\right) \left( {1 -s\gamma - 4s^3\gamma ^3}\right) ^{-1} \\&\qquad ~\times \left( {F(z_k)+F(z_{k+1})}\right) ^T \left[ \begin{matrix} \gamma A+ BB^T &{} 0\\ 0 &{} \gamma C+ B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad \le ~ \tfrac{s}{4} \left( {\tfrac{(2s\gamma )}{4}+\tfrac{(2s\gamma )^3}{2}+\tfrac{(2s\gamma )^5}{4}}\right) \left( {1 -s\gamma - 4s^3\gamma ^3}\right) ^{-1} \\&\qquad \times \left( {F(z_k)+F(z_{k+1})}\right) ^T \left[ \begin{matrix} A+ s BB^T &{} 0\\ 0 &{} C+ s B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad \le ~ \tfrac{s}{2} \left( {\tfrac{(2s\gamma )}{4}+\tfrac{(2s\gamma )^3}{2}+\tfrac{(2s\gamma )^5}{4}}\right) \left( {1 -s\gamma - 4s^3\gamma ^3}\right) ^{-1} \\&\qquad ~\times \left( {F(z_k)+F(z_{k+1})}\right) ^T \left[ \begin{matrix} A- \frac{s}{2} A^2+ \frac{s}{2} BB^T &{} 0 \\ 0 &{} C- \frac{s}{2} C^2+ \frac{s}{2} B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) , \end{aligned} \end{aligned}$$
(106)

where the second equality is because \(\left( {I+ \frac{s}{2} M + \frac{s^3}{2}M^3}\right) ^{-1}=\sum _{i=0}^{\infty } (-1)^i \left( {\frac{s}{2} M + \frac{s^3}{2}M^3}\right) ^i\), the first inequality utilizes Corollary  3, the second inequality uses \(s\gamma \le 1\).

For the fourth term, it follows from Corollary 3 that

$$\begin{aligned} \begin{aligned}&\left| \frac{s^3}{8} \left( {F(z_k)+F(z_{k+1})}\right) ^T M^3 \left( {I+ \frac{s}{2} M + \frac{s^3}{2}M^3}\right) ^{-1}\left( {F(z_k)+F(z_{k+1})}\right) \right| \\&\quad \le ~ \frac{s^2}{8} (2s\gamma ) \left( {1 -s\gamma - 4s^3\gamma ^3}\right) ^{-1} \left( {F(z_k)+F(z_{k+1})}\right) ^T \\&\qquad \left[ \begin{matrix} \gamma A+ BB^T &{} 0\\ 0 &{} \gamma C+ B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad \le ~ \frac{s}{8} (2s\gamma ) \left( {1 -s\gamma - 4s^3\gamma ^3}\right) ^{-1} \left( {F(z_k)+F(z_{k+1})}\right) ^T \\&\qquad \left[ \begin{matrix} A+ s BB^T &{} 0\\ 0 &{} C+ s B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad \le ~ \frac{s}{4} (2s\gamma ) \left( {1 -s\gamma - 4s^3\gamma ^3}\right) ^{-1} \left( {F(z_k)+F(z_{k+1})}\right) ^T \\&\qquad \left[ \begin{matrix} A- \frac{s}{2} A^2+ \frac{s}{2} BB^T &{} 0 \\ 0 &{} C- \frac{s}{2} C^2+ \frac{s}{2} B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) \ . \end{aligned} \end{aligned}$$
(107)

Substituting (104), (105), (106), (107) into (103), and noticing that \(s\gamma \le \frac{1}{8}\), we obtain

$$\begin{aligned} \begin{aligned}&\frac{1}{2}\Vert F(z_{k+1})\Vert ^2-\frac{1}{2}\Vert F(z_{k})\Vert ^2 \\&\quad \le - \frac{s}{4} \left( 1-3(s\gamma )^2-2\left( {\tfrac{2s\gamma }{4} +\tfrac{(2s\gamma )^3}{2}+\tfrac{(2s\gamma )^5}{4}}\right) \left( {1 -s\gamma - 4s^3\gamma ^3}\right) ^{-1} - 2s\gamma \left( {1 -s\gamma - 4s^3\gamma ^3}\right) ^{-1}\right) \\&\qquad \times \left( {F(z_k)+F(z_{k+1})}\right) ^T \left[ \begin{matrix} A- \frac{s}{2} A^2+ \frac{s}{2} BB^T &{} 0 \\ 0 &{} C- \frac{s}{2} C^2+ \frac{s}{2} B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad \le -\frac{s}{8} \left( {F(z_k)+F(z_{k+1})}\right) ^T \left[ \begin{matrix} A- \frac{s}{2} A^2+ \frac{s}{2} BB^T &{} 0 \\ 0 &{} C- \frac{s}{2} C^2+ \frac{s}{2} B^T B \end{matrix}\right] \left( {F(z_k)+F(z_{k+1})}\right) \\&\quad \le -\frac{s\rho (s)}{16} \Vert F(z_k)+F(z_{k+1})\Vert ^2 \ . \end{aligned} \end{aligned}$$
(108)

It then follows from Proposition 10 that

$$\begin{aligned} \frac{1}{2}\Vert F(z_{k+1})\Vert ^2-\frac{1}{2}\Vert F(z_{k})\Vert ^2 \le -\frac{s\rho (s)}{10} \Vert F(z_{k+1})\Vert ^2 -\frac{s\rho (s)}{10}\Vert F(z_{k})\Vert ^2 , \end{aligned}$$

and after rearrangement, we arrive at

$$\begin{aligned} \Vert F(z_{k+1})\Vert ^2 \le \left( {\frac{1-\frac{s\rho (s)}{5}}{1+\frac{s\rho (s)}{5}}}\right) \Vert F(z_{k})\Vert ^2\ , \end{aligned}$$

which finishes the proof by telescoping. \(\square \)


Cite this article

Lu, H. An \(O(s^r)\)-resolution ODE framework for understanding discrete-time algorithms and applications to the linear convergence of minimax problems. Math. Program. 194, 1061–1112 (2022). https://doi.org/10.1007/s10107-021-01669-4
