1 Introduction

Consider the equality constrained quadratic program:

$$\begin{aligned} \min _{x\in {\mathbb {R}}^n}\; \frac{1}{2}x^\mathrm{T}Ax - b^\mathrm{T}x \quad \hbox {s.t.}~ Bx = c, \end{aligned}$$
(1.1)

where \(A \in {\mathbb {R}}^{n\times n}\) is symmetric and \(B \in {\mathbb {R}}^{m\times n}\) with \(m<n\). The matrix A can be indefinite, but is assumed to be positive definite in the null space of B. Without loss of generality, we assume that B has full rank m. The stationarity (first-order optimality) system of the quadratic program (1.1) is

$$\begin{aligned} Ax + B^\mathrm{T}y - b&= 0, \\ Bx - c&= 0, \end{aligned}$$

where \(x\in {\mathbb {R}}^n\) is the primal variable and \(y\in {\mathbb {R}}^m\) is the Lagrange multiplier (or dual variable). In matrix form, the \((n+m)\times (n+m)\) system is

$$\begin{aligned} \left( \begin{array}{cc} A &{} B^\mathrm{T} \\ B &{} 0 \end{array}\right) \left( \begin{array}{c} x \\ y \end{array}\right) = \left( \begin{array}{c} b \\ c \end{array}\right) , \end{aligned}$$
(1.2)

which is commonly called the augmented system or saddle point system—a problem with a wide range of applications in various areas of computational science and engineering. Numerical solutions of this problem have been extensively studied in the literature; see the survey paper [1] for a comprehensive review and a thorough list of references up to 2004.
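To make the setting concrete, here is a minimal NumPy sketch that assembles and solves system (1.2) directly for a small instance; the data below (dimensions, random matrices, seed) are hypothetical and serve only as illustration.

```python
# A minimal NumPy sketch: assemble and solve the saddle point system (1.2)
# for a small random instance (hypothetical data, for illustration only).
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 2
G = rng.standard_normal((n, n))
A = (G + G.T) / 2                        # symmetric, possibly indefinite
B = rng.standard_normal((m, n))          # full row rank with probability 1
b = rng.standard_normal(n)
c = rng.standard_normal(m)

# The (n+m) x (n+m) augmented matrix of (1.2); generically nonsingular.
K = np.block([[A, B.T],
              [B, np.zeros((m, m))]])
xy = np.linalg.solve(K, np.concatenate([b, c]))
x, y = xy[:n], xy[n:]

print(np.allclose(A @ x + B.T @ y, b))   # stationarity: Ax + B^T y = b
print(np.allclose(B @ x, c))             # feasibility:  Bx = c
```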

The augmented Lagrangian technique has been used to make the (1,1)-block of the saddle point system positive definite. In this approach, an equivalent system is solved,

$$\begin{aligned} Ax + B^\mathrm{T}y - b + \gamma B^\mathrm{T}(Bx - c)&= 0,\\ Bx - c&= 0 \end{aligned}$$

with a parameter \(\gamma > 0\), which has the matrix form

$$\begin{aligned} \left( \begin{array}{cc} A + \gamma B^\mathrm{T}B &{} B^\mathrm{T} \\ B &{} 0 \end{array}\right) \left( \begin{array}{c} x \\ y \end{array}\right) = \left( \begin{array}{c} b + \gamma B^\mathrm{T}c \\ c \end{array}\right) . \end{aligned}$$
(1.3)

The following result is a well-known fact.

Proposition 1.1

Let A be symmetric positive definite in the null space of B. If \(A \succeq 0\), then \(A + \gamma B^\mathrm{T}B \succ 0\) for \(\gamma \in (0,+\infty );\) otherwise, there exists some \({{\hat{\gamma }}} > 0\) such that

$$\begin{aligned} \gamma \in ({\hat{\gamma }},+\infty ) ~~\Longrightarrow ~~ A + \gamma B^\mathrm{T}B \succ 0. \end{aligned}$$
(1.4)
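Proposition 1.1 is easy to illustrate numerically. In the following sketch, the matrices A and B form a hypothetical toy example of ours (not from the text): A is indefinite yet positive definite on the null space of B, and \({\hat{\gamma }} = 1\) for this particular instance.

```python
# A small numerical illustration of Proposition 1.1 (hypothetical data):
# an indefinite A that is positive definite on null(B) becomes positive
# definite once gamma * B^T B is added, for gamma large enough.
import numpy as np

B = np.array([[1.0, 0.0]])               # null(B) = span{(0, 1)}
A = np.array([[-1.0, 0.0],
              [ 0.0, 1.0]])              # indefinite, but pd on null(B)

for gamma in [0.5, 1.0, 2.0]:
    H = A + gamma * B.T @ B              # = diag(gamma - 1, 1)
    print(gamma, np.linalg.eigvalsh(H).min())
# gamma = 0.5 -> min eigenvalue -0.5 (not yet positive definite)
# gamma = 2.0 -> min eigenvalue  1.0 (positive definite); gamma-hat = 1 here
```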

1.1 Notation

For matrix \(M \in {\mathbb {R}}^{n\times n}\), \(\sigma (M)\) denotes the spectrum of M and \(\rho (M)\) the spectral radius of M. For symmetric M, \(\lambda _{\max }(M)\) (\(\lambda _{\min }(M)\)) is the maximum (minimum) eigenvalue of M. By \(M \succ 0\) (\(M \succeq 0\)), we mean that M is symmetric positive definite (semi-definite). For a complex number \(z \in {\mathbb {C}}\), \(\mathfrak {R}(z)\) denotes the real part of z and \(\mathfrak {I}(z)\) the imaginary part.

2 A Class of Stationary Iterative Methods

In this section, we describe a class of stationary iterative methods for solving the saddle point problem (1.3) where the (1,1)-block has been made positive definite. For convenience, we re-parameterize the first equation and introduce another parameter into the second. The equivalent system under consideration is

$$\begin{aligned} \left( \begin{array}{cc} H(\alpha ) &{} -B^\mathrm{T} \\ \tau B &{} 0 \end{array}\right) \left( \begin{array}{c} x \\ y \end{array}\right) = \left( \begin{array}{c} \alpha b + B^\mathrm{T}c \\ \tau c \end{array}\right) , \end{aligned}$$
(2.1)

where \(\alpha > 0\), \(\tau \ne 0\) and

$$\begin{aligned} H(\alpha ) = \alpha A + B^\mathrm{T}B \succ 0. \end{aligned}$$

Comparing (1.3) to (2.1), we see that \(\alpha =1/\gamma >0\) and the multiplier y has been rescaled along with a sign change. These changes are cosmetic except that one more parameter \(\tau \) is introduced into the second equation of (2.1).

Since the equation \(Bx=c\) is equivalent to \(QBx=Qc\) for any non-singular \(Q\in {\mathbb {R}}^{m\times m}\), B and c in (2.1) can obviously be replaced by QB and Qc, respectively.

2.1 Splitting of the (1,1)-Block

In our framework, the (1,1)-block submatrix \(H(\alpha )\) in (2.1) is split into a “left part” L and a “right part” R; that is,

$$\begin{aligned} H := \alpha A + B^\mathrm{T}B = L - R. \end{aligned}$$
(2.2)

We drop the \(\alpha \)-dependence from H, as well as from L and R, since \(\alpha \) remains fixed throughout our analysis as long as \(H \succ 0\) is maintained, even though it can also be varied to improve convergence performance.

In this report, unless otherwise noted, splittings refer to those of the (1,1)-block submatrix H rather than of the entire \((2 \times 2)\)-block augmented matrix of the saddle point problem. Moreover, we will associate each splitting with a left–right pair \((L, R)\). The simplest examples of splittings include

$$\begin{aligned} L=H,\quad \;\; R=0; \end{aligned}$$

or after partitioning H into 2-by-2 blocks,

$$\begin{aligned} L = \left( \begin{array}{cc} H_{11} &{} 0 \\ 0 &{} H_{22} \end{array}\right) ,\quad \;\; R = -\left( \begin{array}{cc} 0 &{} H_{12} \\ H_{21} &{} 0 \end{array}\right) , \end{aligned}$$

which is of block Jacobi type; or

$$\begin{aligned} L = \left( \begin{array}{cc} H_{11} &{} 0 \\ H_{21} &{} H_{22} \end{array}\right) ,\quad \;\; R = -\left( \begin{array}{cc} 0 &{} H_{12} \\ 0 &{} 0 \end{array}\right) , \end{aligned}$$
(2.3)

which is of block Gauss–Seidel type. We note that when \(H \succ 0\) and \((L, R)\) is a Gauss–Seidel splitting, either element-wise or block-wise, it is known that \(\rho (L^{-1}R)<1\).

In general, one can first partition H into p-by-p blocks for any \(p \in \{1,2,\cdots ,n\}\), then perform a block splitting. In addition, splittings can be of SOR type, involving an extra relaxation parameter. To keep notation simple, however, we will not carry such a parameter in a splitting \((L, R)\), since it does not affect our analysis.
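As a concrete illustration of these splittings, the following NumPy sketch builds the block Jacobi and block Gauss–Seidel pairs \((L, R)\) for a hypothetical symmetric positive definite H and checks \(\rho (L^{-1}R)\); for the Gauss–Seidel pair the bound \(\rho (L^{-1}R) < 1\) is guaranteed, while for block Jacobi it merely happens to hold for this instance.

```python
# A sketch of block Jacobi and block Gauss-Seidel splittings H = L - R
# (2-by-2 blocks), checking rho(L^{-1}R). Data is hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n, n1 = 6, 3                              # block sizes n1 and n - n1
G = rng.standard_normal((n, n))
H = G @ G.T + n * np.eye(n)               # symmetric positive definite

def spectral_radius(M):
    return np.abs(np.linalg.eigvals(M)).max()

# Block Gauss-Seidel, as in (2.3): L keeps the lower block triangle.
L_gs = H.copy()
L_gs[:n1, n1:] = 0.0                      # zero out H_{12}
R_gs = L_gs - H                           # R = -[[0, H_{12}], [0, 0]]

# Block Jacobi: L keeps the diagonal blocks only.
L_j = np.zeros_like(H)
L_j[:n1, :n1] = H[:n1, :n1]
L_j[n1:, n1:] = H[n1:, n1:]
R_j = L_j - H

print(spectral_radius(np.linalg.solve(L_gs, R_gs)))  # < 1, guaranteed for H > 0
print(spectral_radius(np.linalg.solve(L_j, R_j)))    # < 1 for this instance
```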

2.2 A Stationary Iteration Class

We consider a class of stationary iterations consisting of all possible splittings \((L, R)\) for which the spectral radius of \(L^{-1}R\) does not exceed unity (plus an additional technical condition to be specified shortly). This class of stationary iterative methods, which we call the \(\mathtt {\{L,\!R\}}\)-class for lack of a more descriptive term, iterates as follows:

$$\begin{aligned} x^{k+1}&= L^{-1}\left( Rx^k + B^\mathrm{T}(y^k + c) + \alpha b\right) , \end{aligned}$$
(2.4a)
$$\begin{aligned} y^{k+1}&= y^k - \tau \left( Bx^{k+1}-c\right) , \end{aligned}$$
(2.4b)

where \((L, R)\) is any admissible splitting and \(\tau \) is a step length for the multiplier update.
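A minimal implementation of iteration (2.4) may help fix ideas. The following NumPy sketch is ours, with hypothetical problem data and an ad hoc step length \(\tau \); Theorem 3.1 below guarantees convergence only for all sufficiently small \(\tau > 0\).

```python
# A minimal sketch of one {L,R}-class member, iteration (2.4); the data
# construction below is hypothetical and only illustrates the interface.
import numpy as np

def lr_iterate(L, R, B, b, c, alpha, tau, iters=2000, tol=1e-10):
    """Run iteration (2.4) for a splitting H = L - R of H = alpha*A + B^T B."""
    n, m = B.shape[1], B.shape[0]
    H = L - R
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        x = np.linalg.solve(L, R @ x + B.T @ (y + c) + alpha * b)   # (2.4a)
        y = y - tau * (B @ x - c)                                   # (2.4b)
        # residual of system (2.1): H x - B^T y = alpha*b + B^T c, B x = c
        if (np.linalg.norm(H @ x - B.T @ y - alpha * b - B.T @ c)
                + np.linalg.norm(B @ x - c)) < tol:
            break
    return x, y

rng = np.random.default_rng(7)
n, m = 6, 2
B = rng.standard_normal((m, n))
G = rng.standard_normal((n, n))
H = 0.1 * (G @ G.T + np.eye(n)) + B.T @ B     # H > 0, playing alpha*A + B^T B
L, R = np.tril(H), np.tril(H) - H             # element-wise Gauss-Seidel splitting
b, c = rng.standard_normal(n), rng.standard_normal(m)
x, y = lr_iterate(L, R, B, b, c, alpha=0.1, tau=0.5)  # tau chosen ad hoc
print(np.linalg.norm(B @ x - c))              # ~ 0 if converged
```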

It is easy to see that the \(\mathtt {\{L,\!R\}}\)-class iterations (2.4) correspond to the following splitting of the \((2 \times 2)\)-block augmented matrix in system (2.1):

$$\begin{aligned} \left( \begin{array}{cc} H &{} -B^\mathrm{T} \\ \tau B &{} 0 \end{array}\right) = \left( \begin{array}{cc} L &{} 0 \\ \tau B &{} I \end{array}\right) - \left( \begin{array}{cc} R &{} B^\mathrm{T} \\ 0 &{} I \end{array}\right) . \end{aligned}$$
(2.5)

Therefore, the resulting iteration matrix is

$$\begin{aligned} M(\tau ) := \left( \begin{array}{cc} L &{} 0 \\ \tau B &{} I \end{array}\right) ^{-1}\!\! \left( \begin{array}{cc} R &{} B^\mathrm{T} \\ 0 &{} I \end{array}\right) = \left( \begin{array}{cc} L^{-1}R &{} L^{-1}B^\mathrm{T} \\ -\tau BL^{-1}R &{} I-\tau BL^{-1}B^\mathrm{T} \end{array}\right) . \end{aligned}$$
(2.6)

It is worth observing that the results of the present paper still hold if, in the right-hand side of (2.5), the identity matrix in the (2,2)-blocks is replaced by any symmetric positive definite matrix.
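The splitting (2.5) and the closed form (2.6) can be verified numerically. The sketch below, with hypothetical data, checks both identities for an element-wise Gauss–Seidel splitting.

```python
# A sketch verifying the splitting (2.5) and the iteration matrix (2.6)
# numerically for hypothetical L, R, B and tau.
import numpy as np

rng = np.random.default_rng(2)
n, m, tau = 5, 2, 0.7
G = rng.standard_normal((n, n))
H = G @ G.T + n * np.eye(n)
L = np.tril(H); R = L - H                 # element-wise Gauss-Seidel splitting
B = rng.standard_normal((m, n))

P = np.block([[L, np.zeros((n, m))], [tau * B, np.eye(m)]])
N = np.block([[R, B.T], [np.zeros((m, n)), np.eye(m)]])
K = np.block([[H, -B.T], [tau * B, np.zeros((m, m))]])
print(np.allclose(P - N, K))              # the splitting (2.5)

M = np.linalg.solve(P, N)                 # iteration matrix (2.6)
M_explicit = np.block([
    [np.linalg.solve(L, R), np.linalg.solve(L, B.T)],
    [-tau * B @ np.linalg.solve(L, R),
     np.eye(m) - tau * B @ np.linalg.solve(L, B.T)],
])
print(np.allclose(M, M_explicit))         # the block form in (2.6)
```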

From the well-known theory of stationary iterative methods for linear systems, we have

Proposition 2.1

A member of the \(\mathtt {\{L,\!R\}}\)-class converges Q-linearly from any initial point if and only if the corresponding iteration matrix \(M(\tau )\) satisfies, for the value of \(\tau \) used,

$$\begin{aligned} \rho (M(\tau )) < 1. \end{aligned}$$
(2.7)

In this paper, we establish that, under two reasonable assumptions, condition (2.7) holds for the entire \(\mathtt {\{L,\!R\}}\)-class.

2.3 Classic Methods ALM and ADMM

The trivial splitting \((L,R)=(H,0)\) gives the classic augmented Lagrangian multiplier (ALM) method [2, 3], which is also equivalent to Uzawa’s method [4] applied to (1.3). In this case,

$$\begin{aligned} M(\tau ) = \left( \begin{array}{cc} 0 &{} H^{-1}B^\mathrm{T} \\ 0 &{} I-\tau BH^{-1}B^\mathrm{T} \end{array}\right) , \end{aligned}$$

and

$$\begin{aligned} \rho (M(\tau )) = \rho \left( I-\tau BH^{-1}B^\mathrm{T}\right) \end{aligned}$$
(2.8)

leading to the well-known convergence result for the multiplier method.

Proposition 2.2

The augmented Lagrangian multiplier method applied to the quadratic program (1.1) converges Q-linearly from any initial point for \(\tau \in \left( 0,2/\lambda _{\max }(BH^{-1}B^\mathrm{T})\right) \), where \(H=\alpha A+B^\mathrm{T}B \succ 0\). Moreover, when \(A \succeq 0\), \(\tau \in (0,2)\) suffices for convergence.
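The following sketch checks Proposition 2.2 numerically on a hypothetical instance; the construction \(A = A_0 - 5B^\mathrm{T}B\) with \(A_0 \succ 0\) is our device for producing an A that is (typically) indefinite yet positive definite on the null space of B.

```python
# A numerical check of Proposition 2.2 (hypothetical data): for the trivial
# splitting (L, R) = (H, 0), rho(M(tau)) = rho(I - tau*B H^{-1} B^T), which
# is below 1 exactly for tau in (0, 2/lambda_max(B H^{-1} B^T)).
import numpy as np

rng = np.random.default_rng(3)
n, m, alpha = 6, 2, 0.1
B = rng.standard_normal((m, n))
G = rng.standard_normal((n, n))
A0 = G @ G.T + np.eye(n)                  # symmetric positive definite
A = A0 - 5.0 * (B.T @ B)                  # typically indefinite, pd on null(B)
H = alpha * A + B.T @ B                   # = 0.1*A0 + 0.5*B^T B, hence H > 0
assert np.linalg.eigvalsh(H).min() > 0

S = B @ np.linalg.solve(H, B.T)           # B H^{-1} B^T
tau_max = 2.0 / np.linalg.eigvalsh(S).max()
for tau in [0.5 * tau_max, 0.99 * tau_max, 1.5 * tau_max]:
    rho = np.abs(np.linalg.eigvals(np.eye(m) - tau * S)).max()
    print(f"tau/tau_max = {tau / tau_max:4.2f}, rho = {rho:.3f}")
```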

The classic ALM method, or Uzawa's method applied to (1.3), is the unique member of the \(\mathtt {\{L,\!R\}}\)-class that requires solving systems involving the entire (1,1)-block submatrix H (with a right-hand side that changes from iteration to iteration). All other \(\mathtt {\{L,\!R\}}\)-class members only require solving systems involving the left part L, which can be much less expensive if L is chosen to exploit problem structure.

When the splitting of H is of the \((2 \times 2)\)-block Gauss–Seidel type defined in (2.3), the associated \(\mathtt {\{L,\!R\}}\)-class member reduces to the classic alternating direction method of multipliers (ADMM) [5, 6], for which convergence has been established for general convex functions, not only quadratics. However, that general theory requires the objective to be a sum of two separable functions of two block variables, each convex in the entire space. To the best of our knowledge, no convergence results are available when the objective is non-separable, is convex only in a subspace, or involves more than two block variables (unless algorithmic modifications are introduced).

3 Convergence of the Entire Class

We present a unified convergence result for the entire \(\mathtt {\{L,\!R\}}\)-class under two assumptions:

A1. \(H := \alpha A + B^\mathrm{T}B \succ 0\), where B is of rank m.

A2. \(H=L-R\) satisfies \(\rho (L^{-1}R) \leqslant 1\) and condition (3.1).

We know that Assumption A1 holds for appropriate values of \(\alpha \) whenever \(A \in {\mathbb {R}}^{n\times n}\) is positive definite in the null space of B; see Proposition 1.1. We further require that \(L^{-1}R\) have no eigenvalue of modulus one or greater, except possibly unity itself; that is,

$$\begin{aligned} \max \left\{ |\mu |: \mu \in \sigma (L^{-1}R) {\setminus } \{1\} \right\} < 1. \end{aligned}$$
(3.1)

Now we present a unified convergence theorem for the entire \(\mathtt {\{L,\!R\}}\)-class.

Theorem 3.1

Let \(\{(x^k,y^k)\}\) be generated from any initial point by a member of the \(\mathtt {\{L,\!R\}}\)-class defined by (2.4). Under Assumptions A1–A2, there exists \(\eta > 0\) such that for all \(\tau \in (0, 2\eta )\) the sequence \(\{(x^k,y^k)\}\) converges Q-linearly to the solution of (1.1).

The proof is deferred to the next section, after we develop some technical results. We note that the convergence interval \((0, 2\eta )\) is member-dependent; it can also depend on the value of the parameter \(\alpha >0\) in \(H(\alpha ) = \alpha A + B^\mathrm{T}B \succ 0\).

It is worth noting that the theorem only requires \(L^{-1}R\), as a linear mapping in \({\mathbb {R}}^n\), to be non-expansive (plus a technical condition) rather than contractive. Convergence would not necessarily occur if one kept iterating on the primal variable x alone; however, timely updates of the multiplier y help the iterates of the pair \((x, y)\) converge together.
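The following sketch traces \(\rho (M(\tau ))\) for a Gauss–Seidel member on a hypothetical instance, illustrating Theorem 3.1 and Lemma 4.1 below: the spectral radius equals one at \(\tau = 0\) and drops below one for small \(\tau > 0\).

```python
# A sketch tracing rho(M(tau)) for a Gauss-Seidel member of the {L,R}-class,
# illustrating Theorem 3.1 on hypothetical data.
import numpy as np

rng = np.random.default_rng(4)
n, m = 6, 2
B = rng.standard_normal((m, n))
G = rng.standard_normal((n, n))
A0 = G @ G.T + np.eye(n)
A = A0 - 5.0 * (B.T @ B)           # typically indefinite, yet pd on null(B)
alpha = 0.1
H = alpha * A + B.T @ B            # = 0.1*A0 + 0.5*B^T B, hence H > 0 (A1)
L = np.tril(H); R = L - H          # Gauss-Seidel: rho(L^{-1}R) < 1 (A2)

def rho_M(tau):
    LinvR = np.linalg.solve(L, R)
    LinvBt = np.linalg.solve(L, B.T)
    M = np.block([[LinvR, LinvBt],
                  [-tau * B @ LinvR, np.eye(m) - tau * B @ LinvBt]])
    return np.abs(np.linalg.eigvals(M)).max()

print(rho_M(0.0))                  # = 1 (Lemma 4.1)
for tau in [0.05, 0.2, 0.5, 1.0]:
    print(tau, rho_M(tau))         # < 1 for tau small enough; the precise
                                   # interval (0, 2*eta) is member-dependent
```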

4 Technical Results and Proof of Convergence

We first derive some useful technical lemmas. Let \(\lambda (\tau )\) be an eigenvalue of \(M(\tau )\), i.e.,

$$\begin{aligned} \lambda (\tau ) \in \sigma (M(\tau )). \end{aligned}$$
(4.1)

The eigenvalue system corresponding to \(\lambda \) is

$$\begin{aligned} \left( \begin{array}{cc} L^{-1}R &{} L^{-1}B^\mathrm{T} \\ -\tau BL^{-1}R &{} I-\tau BL^{-1}B^\mathrm{T} \end{array}\right) \left( \begin{array}{c} u(\tau ) \\ v(\tau ) \end{array}\right) = \lambda (\tau ) \left( \begin{array}{c} u(\tau ) \\ v(\tau ) \end{array}\right) , \end{aligned}$$
(4.2)

where \((u,v) \in {\mathbb {C}}^{n}\times {\mathbb {C}}^m\) is nonzero. For simplicity, we will often suppress the \(\tau \)-dependence of the eigenpair when no confusion arises.

Lemma 4.1

If \(\rho (L^{-1}R) \leqslant 1\), then

$$\begin{aligned} \rho (M(0)) = 1. \end{aligned}$$

Under condition (3.1), an eigenvalue \(\lambda (\tau )\) of \(M(\tau )\) of maximum modulus satisfies

$$\begin{aligned} \lim _{\tau \rightarrow 0} \frac{\lambda (\tau )-1}{\lambda (\tau )}=0. \end{aligned}$$
(4.3)

Proof

From the definition of \(M(\tau )\) in (2.6),

$$\begin{aligned} M(0) = \left( \begin{array}{cc} L^{-1}R &{} L^{-1}B^\mathrm{T} \\ 0 &{} I \end{array}\right) . \end{aligned}$$

Hence, by our assumption, \(\rho (M(0)) = \max (1,\rho (L^{-1}R)) = 1\) with \(1 \in \sigma (M(0))\). For the second part, the eigenvalues of \(M(\tau )\) depend continuously on \(\tau \), and under condition (3.1) the only eigenvalue of \(M(0)\) on the unit circle is unity itself; hence \(\lambda (\tau ) \rightarrow 1\) as \(\tau \rightarrow 0\), which gives (4.3).

Lemma 4.2

Let either the matrix A be positive definite in the null space of B and \(\alpha > 0\), or the sum \(H = \alpha A + B^\mathrm{T}B\) be positive definite. Then for any \(\tau >0\), \( 1 \notin \sigma (M(\tau )) \), where \(M(\tau )\) is defined in (2.6).

Proof

We examine eigensystem (4.2). Rearranging the first equation of (4.2), we have

$$\begin{aligned} (\lambda L - R) u = B^\mathrm{T}v. \end{aligned}$$
(4.4)

Multiplying the first equation of (4.2) by \(\tau B\) and adding it to the second, we obtain after rearranging

$$\begin{aligned} (1-\lambda )v = \lambda \tau Bu. \end{aligned}$$
(4.5)

Suppose that \(\lambda =1\). Then (4.5) implies \(Bu=0\). By definition (2.2), equation (4.4) reduces to

$$\begin{aligned} (L - R) u \equiv (\alpha A + B^\mathrm{T}B)u = B^\mathrm{T}v. \end{aligned}$$

Multiplying the above equation by \(u^*\) and invoking \(Bu=0\), we arrive at \(u^*Hu = \alpha u^*Au = 0\), contradicting the assumption of the lemma.

Lemma 4.3

Let \((\lambda ,(u,v))\) be an eigenpair of \(M(\tau )\) as given in (4.2), where \(\lambda \notin \{0,1\}\) and \(Bu \ne 0\). Then

$$\begin{aligned} \lambda = 1 - {\tau }\left( {\frac{u^*Hu}{u^*B^\mathrm{T}Bu} + \frac{\lambda -1}{\lambda }\frac{u^*Ru}{u^*B^\mathrm{T}Bu}}\right) ^{-1}. \end{aligned}$$
(4.6)

Proof

It follows readily from (4.5) that

$$\begin{aligned} v = \frac{\lambda \tau }{1-\lambda } Bu. \end{aligned}$$
(4.7)

Substituting the above into (4.4) and in view of (2.2), we have

$$\begin{aligned} \left( \lambda H + (\lambda -1)R\right) u = \frac{\lambda \tau }{1-\lambda } B^\mathrm{T}Bu, \end{aligned}$$

or after a rearrangement,

$$\begin{aligned} \left( H - \frac{\tau }{1-\lambda } B^\mathrm{T}B\right) u = \frac{1-\lambda }{\lambda } Ru. \end{aligned}$$
(4.8)

Multiplying both sides of (4.8) by \(u^*\), we have

$$\begin{aligned} u^*Hu - \frac{\tau }{1-\lambda } u^*B^\mathrm{T}Bu = \frac{1-\lambda }{\lambda } u^*Ru. \end{aligned}$$

Since \(Bu \ne 0\) implies \(u^*B^\mathrm{T}Bu \ne 0\), the above equation can be rewritten as

$$\begin{aligned} \frac{\tau }{1-\lambda } = \frac{u^*Hu}{u^*B^\mathrm{T}Bu} + \frac{\lambda -1}{\lambda } \frac{u^*Ru}{u^*B^\mathrm{T}Bu}. \end{aligned}$$
(4.9)

Solving (4.9) for the \(\lambda \) on the left-hand side, while keeping those on the right fixed, we obtain the desired result; note that the denominator in (4.6) is necessarily nonzero.
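Identity (4.6) can be sanity-checked numerically: for an actual eigenpair of \(M(\tau )\) with \(\lambda \notin \{0,1\}\) and \(Bu \ne 0\), the right-hand side of (4.6) should reproduce \(\lambda \). The sketch below uses hypothetical data.

```python
# A numerical sanity check of identity (4.6): for eigenpairs of M(tau)
# with lambda not in {0, 1} and Bu != 0, evaluate the right-hand side.
import numpy as np

rng = np.random.default_rng(5)
n, m, tau = 5, 2, 0.3
B = rng.standard_normal((m, n))
G = rng.standard_normal((n, n))
H = G @ G.T + n * np.eye(n)
L = np.tril(H); R = L - H                 # Gauss-Seidel splitting

LinvR = np.linalg.solve(L, R); LinvBt = np.linalg.solve(L, B.T)
M = np.block([[LinvR, LinvBt],
              [-tau * B @ LinvR, np.eye(m) - tau * B @ LinvBt]])
lams, V = np.linalg.eig(M)
for lam, w in zip(lams, V.T):             # columns of V are eigenvectors
    u = w[:n]
    if abs(lam) > 1e-8 and abs(lam - 1) > 1e-8 and np.linalg.norm(B @ u) > 1e-8:
        s = u.conj() @ (B.T @ B) @ u      # u* B^T B u
        rhs = 1 - tau / (u.conj() @ H @ u / s
                         + (lam - 1) / lam * (u.conj() @ R @ u) / s)
        print(abs(lam - rhs))             # ~ 0 up to rounding
```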

Lemma 4.4

Let \(\tau , \kappa \in {\mathbb {R}}\) and \(z = \mathfrak {R}(z) + i\mathfrak {I}(z) \in {\mathbb {C}}\) such that \(\kappa + \mathfrak {R}(z) > 0\). Then

$$\begin{aligned} \tau \in \left( 0, \, 2(\kappa +\mathfrak {R}(z))\right) ~~\Longleftrightarrow ~~ \left| 1 - \frac{\tau }{\kappa + z}\right| < 1. \end{aligned}$$
(4.10)

Moreover, \(\tau = \kappa +\mathfrak {R}(z)\) minimizes the above modulus so that

$$\begin{aligned} \min _\tau \left| 1 - \frac{\tau }{\kappa + z}\right| = \left| 1 - \frac{\kappa +\mathfrak {R}(z)}{\kappa + z}\right| = \frac{|\mathfrak {I}(z)|}{|\kappa +z|}. \end{aligned}$$
(4.11)

Proof

By direct calculation,

$$\begin{aligned} \left| 1 - \frac{\tau }{\kappa + z}\right| ^2 = 1 - \tau \frac{2(\kappa +\mathfrak {R}(z))-\tau }{|\kappa +z|^2} = \frac{(\kappa +\mathfrak {R}(z)-\tau )^2 + \mathfrak {I}(z)^2}{(\kappa +\mathfrak {R}(z))^2 + \mathfrak {I}(z)^2}, \end{aligned}$$
(4.12)

from which both (4.10) and (4.11) follow.
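Since Lemma 4.4 is a purely scalar statement, it can be checked directly; the sketch below evaluates (4.10) and (4.11) for one hypothetical pair \((\kappa , z)\).

```python
# A quick numerical check of (4.10)-(4.11) on hypothetical scalars.
import numpy as np

kappa, z = 1.5, 0.4 + 0.9j
assert kappa + z.real > 0
f = lambda t: abs(1 - t / (kappa + z))    # the modulus in (4.10)

t_star = kappa + z.real                   # minimizer from (4.11)
print(f(t_star), abs(z.imag) / abs(kappa + z))       # equal, per (4.11)
print(f(1e-9) < 1, f(2 * t_star - 1e-6) < 1)         # True inside (0, 2(kappa+Re z))
print(f(2 * t_star + 1e-3) < 1)                      # False outside the interval
```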

Now we are ready to prove Theorem 3.1.

Proof

The proof is based on Lemmas 4.1, 4.3 and 4.4, while Lemma 4.2 is implicitly used.

Let \((\lambda (\tau ),(u(\tau ),v(\tau )))\) be an eigenpair of \(M(\tau )\) corresponding to an eigenvalue of maximum modulus. By Lemma 4.2, \(\lambda (\tau ) \ne 1\); and if \(\lambda (\tau ) = 0\), then \(\rho (M(\tau )) = 0\) and there is nothing to prove, so we may assume \(\lambda (\tau ) \notin \{0,1\}\). We need to prove that \(|\lambda (\tau )| < 1\) for some values of \(\tau > 0\). In the rest of the proof, we often suppress the dependence on \(\tau \).

We consider two cases: \(Bu=0\) and \(Bu \ne 0\). If \(Bu=0\), then (4.5) implies \(v=0\) (recall \(\lambda \ne 1\)), and (4.4) implies that \((\lambda ,u)\) is an eigenpair of \(L^{-1}R\); since \(\lambda \ne 1\), condition (3.1) in Assumption A2 gives \(|\lambda |<1\). Now we assume that \(Bu \ne 0\). By Lemmas 4.3 and 4.4, \(|\lambda (\tau )| < 1\) if and only if the following inclusion holds,

$$\begin{aligned} \tau \in \left( 0,\, 2 \Theta (\tau )\right) , \end{aligned}$$
(4.13)

where

$$\begin{aligned} \Theta (\tau ) := \frac{u^*Hu}{u^*B^\mathrm{T}Bu} + \mathfrak {R}\left( \frac{\lambda -1}{\lambda }\frac{u^*Ru}{u^*B^\mathrm{T}Bu}\right) = \frac{u^*u}{u^*B^\mathrm{T}Bu}\left( \frac{u^*Hu}{u^*u} + \mathfrak {R}\left( \frac{\lambda -1}{\lambda }\frac{u^*Ru}{u^*u}\right) \right) . \end{aligned}$$
(4.14)

Under Assumption A2, we know from (4.3) in Lemma 4.1 that \(1-1/\lambda (\tau ) \rightarrow 0\) as \(\tau \rightarrow 0\). Hence, in view of the boundedness of \({u^*Ru}/{u^*u}\), for any \(\delta \in (0,1)\) there exists \(\xi _{\delta } > 0\) such that

$$\begin{aligned} \mathfrak {R}\left( \frac{\lambda -1}{\lambda }\frac{u^*Ru}{u^*u}\right) \geqslant -\left| \frac{\lambda -1}{\lambda }\right| \, \frac{|u^*Ru|}{u^*u} \geqslant -\delta \lambda _{\min }(H),\quad \;\; \forall \, \tau \in (0, 2\xi _{\delta }). \end{aligned}$$
(4.15)

We now estimate \(\Theta (\tau )\) for \(\tau \in (0, 2\xi _{\delta })\) from (4.14) and (4.15),

$$\begin{aligned} \Theta (\tau ) \geqslant \frac{\lambda _{\min }(H) -\delta \lambda _{\min }(H)}{\lambda _{\max }(B^\mathrm{T}B)} = (1-\delta )\frac{\lambda _{\min }(H)}{\lambda _{\max }(B^\mathrm{T}B)} =: \theta _{\delta } > 0, \quad \;\; \forall \, \tau \in (0, 2\xi _{\delta }). \end{aligned}$$
(4.16)

It follows from (4.16) that inclusion (4.13) indeed holds for all \(\tau \in (0, 2\eta ),\) where

$$\begin{aligned} \eta := \min (\xi _{\delta },\theta _{\delta }). \end{aligned}$$
(4.17)

This completes the proof.

In view of the second part of Lemma 4.4, if there exists \(\tau _o>0\) such that \(\tau _o = \Theta (\tau _o)\), then the optimal rate of convergence (for a given \(\alpha >0\) and a given splitting) would be

$$\begin{aligned} \frac{|\mathfrak {I}(z(\tau _o))|}{|u(\tau _o)^*Hu(\tau _o) + z(\tau _o)|} = \left( 1 + \frac{(u(\tau _o)^*Hu(\tau _o)+\mathfrak {R}(z(\tau _o)))^2}{\mathfrak {I}(z(\tau _o))^2}\right) ^{-\frac{1}{2}} < 1, \end{aligned}$$
(4.18)

where \( z(\tau ) := \frac{\lambda (\tau )-1}{\lambda (\tau )}\,(u(\tau )^*Ru(\tau )) \), whose imaginary part must be nonzero at \(\tau =\tau _o\). Of course, such an optimal rate of convergence is generally not computable in practice.

5 Remarks

The \(\mathtt {\{L,\!R\}}\)-class defined by (2.4) is built from splittings of the (1,1)-block of the saddle point system matrix; it includes, but is not limited to, all known convergent splittings for positive definite matrices, thereby offering adaptivity to problem structure with guaranteed convergence.

Those \(\mathtt {\{L,\!R\}}\)-class members associated with block Gauss–Seidel splittings are natural extensions of the classic ADMM specialized to quadratic programs. In contrast to the existing general convergence theory for ADMM, Theorem 3.1 requires neither separability nor convexity in the entire space, imposes no restriction on the number of blocks, and guarantees a Q-linear rate of convergence. Extending these properties beyond quadratic programs should be of great interest; we will address this topic in another work.

The convergence of certain members of the \(\mathtt {\{L,\!R\}}\)-class has been studied in [7] under the assumption that L is symmetric positive definite. In [8], a special case corresponding to the SOR splitting is analyzed.