Convergence of a Class of Stationary Iterative Methods for Saddle Point Problems

A unified convergence theory is derived for a class of stationary iterative methods for solving linear equality constrained quadratic programs or saddle point problems. This class is constructed from essentially all possible splittings of the submatrix residing in the (1,1)-block of the augmented saddle point matrix that would produce non-expansive iterations. The classic augmented Lagrangian method and alternating direction method of multipliers are two special members of this class.


Introduction
Consider the equality constrained quadratic program

  min_x (1/2) x^T A x − b^T x   subject to   B x = c,   (1.1)

where A ∈ R^{n×n} is symmetric and B ∈ R^{m×n} with m < n. The matrix A can be indefinite, but is assumed to be positive definite in the null space of B. Without loss of generality, we assume that B is of full rank m. The stationarity system for the quadratic program (1.1) is

  A x + B^T y = b,   B x = c,   (1.2)

where x ∈ R^n is the primal variable and y ∈ R^m is the Lagrange multiplier (or dual variable). In matrix form, the (n+m)-by-(n+m) system (1.2) reads

  [A, B^T; B, 0] [x; y] = [b; c],

which is commonly called the augmented system or saddle point system, a problem with a wide range of applications in various areas of computational science and engineering. Numerical solutions of this problem have been extensively studied in the literature; see the survey paper [1] for a comprehensive review and a thorough list of references up to 2004. The augmented Lagrangian technique has been used to make the (1,1)-block of the saddle point system positive definite. In this approach, an equivalent system with a parameter γ > 0 is solved, which has the matrix form

  [A + γB^T B, B^T; B, 0] [x; y] = [b + γB^T c; c].   (1.3)

The following result is a well-known fact.

Proposition 1.1 Suppose that A is positive definite in the null space of B. Then there exists γ₀ ≥ 0 such that A + γB^T B is positive definite for all γ > γ₀.
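As a sanity check, the equivalence between the original saddle point system and its augmented-Lagrangian form can be verified numerically. The sketch below uses our own toy data (not from the text): A is built to be indefinite overall yet positive definite on the null space of B, as assumed above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, gamma = 5, 2, 10.0

# B with full row rank; Z has orthonormal columns spanning null(B).
B = rng.standard_normal((m, n))
Z = np.linalg.svd(B)[2][m:].T

# A acts like 2I on null(B) and like -0.5*B^T B on the row space of B:
# indefinite overall, positive definite on null(B).
A = 2.0 * Z @ Z.T - 0.5 * B.T @ B
b, c = rng.standard_normal(n), rng.standard_normal(m)

# Saddle point system and its augmented-Lagrangian form.
O = np.zeros((m, m))
K = np.block([[A, B.T], [B, O]])
K_aug = np.block([[A + gamma * B.T @ B, B.T], [B, O]])
xy = np.linalg.solve(K, np.concatenate([b, c]))
xy_aug = np.linalg.solve(K_aug, np.concatenate([b + gamma * B.T @ c, c]))

assert np.allclose(xy, xy_aug)                            # same (x, y)
assert np.all(np.linalg.eigvalsh(A + gamma * B.T @ B) > 0)  # (1,1)-block now PD
```

Note that the multiplier y is unchanged by the augmentation: once Bx = c holds, the extra term γB^T(Bx − c) vanishes.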

Notation
For a matrix M ∈ R^{n×n}, σ(M) denotes the spectrum of M and ρ(M) the spectral radius of M. For symmetric M, λ_max(M) (λ_min(M)) is the maximum (minimum) eigenvalue of M. By M ≻ 0 (M ⪰ 0), we mean that M is symmetric positive definite (semi-definite). For a complex number z ∈ C, Re(z) denotes the real part of z and Im(z) the imaginary part.

A Class of Stationary Iterative Methods
In this section, we describe a class of stationary iterative methods for solving the saddle point problem (1.3) in which the (1,1)-block has been made positive definite. For convenience, we re-parameterize the first equation and introduce another parameter into the second. The equivalent system under consideration is

  [H(α), −B^T; τB, 0] [x; y] = [f; τc],   (2.1)

where α > 0, τ ≠ 0, f = αb + B^T c, and

  H(α) = αA + B^T B.   (2.2)

Comparing (1.3) to (2.1), we see that α = 1/γ > 0 and that the multiplier y has been rescaled along with a sign change. These changes are cosmetic, except that one more parameter τ is introduced into the second equation of (2.1).
Since the equation Bx = c is equivalent to QBx = Qc for any non-singular Q ∈ R^{m×m}, B and c in (2.1) can obviously be replaced by QB and Qc, respectively.
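A small experiment illustrates the role of the parameters. Assuming, as the relation α = 1/γ suggests, that the re-parameterized (1,1)-block takes the form H(α) = αA + B^T B (our reading of (2.2)), the sketch below checks on toy data that H(α) is positive definite for a suitably small α even though A is indefinite, and that replacing (B, c) by (QB, Qc) leaves the constraint intact:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2
B = rng.standard_normal((m, n))
Z = np.linalg.svd(B)[2][m:].T           # orthonormal basis of null(B)
A = 2.0 * Z @ Z.T - 0.5 * B.T @ B       # indefinite, but PD on null(B)

def H(a):                               # assumed form of the (1,1)-block
    return a * A + B.T @ B

assert np.linalg.eigvalsh(A).min() < 0        # A itself is indefinite
assert np.linalg.eigvalsh(H(0.1)).min() > 0   # H(alpha) > 0 for small alpha
assert np.linalg.eigvalsh(H(10.0)).min() < 0  # ...but not for every alpha

# Replacing (B, c) by (QB, Qc) leaves the constraint set unchanged:
c = rng.standard_normal(m)
Q = rng.standard_normal((m, m)) + 3 * np.eye(m)  # nonsingular scaling
x = np.linalg.lstsq(B, c, rcond=None)[0]         # one feasible point, Bx = c
assert np.allclose(Q @ B @ x, Q @ c)
```

The failure of H(α) ≻ 0 for large α is consistent with Proposition 1.1: positive definiteness requires α small enough (equivalently, γ large enough).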

Splitting of the (1,1)-Block
In our framework, the (1,1)-block submatrix H(α) in (2.1) is split into a "left part" L and a "right part" R; that is,

  H = L − R.

We drop the α-dependence from H, as well as from L and R, since α will always be fixed in our analysis as long as H ≻ 0 is maintained, even though it can also be varied to improve convergence performance. In this report, unless otherwise noted, splittings refer to those of the (1,1)-block submatrix H rather than of the entire (2 × 2)-block augmented matrix of the saddle point problem. Moreover, we will associate a splitting with a left-right pair (L, R). The simplest example is the trivial splitting (L, R) = (H, 0). Alternatively, after partitioning H into 2-by-2 blocks, H = [H11, H12; H21, H22], one may take

  L = [H11, 0; 0, H22],   R = −[0, H12; H21, 0],

which is of block Jacobi type; or

  L = [H11, 0; H21, H22],   R = −[0, H12; 0, 0],   (2.3)

which is of block Gauss-Seidel type. We note that when H ≻ 0 and (L, R) is a Gauss-Seidel splitting, either element-wise or block-wise, it is known that ρ(L^{-1}R) < 1.
In general, one can first partition H into p-by-p blocks for any p ∈ {1, 2, · · · , n}, then perform a block splitting. In addition, splittings can be of SOR type involving an extra relaxation parameter. To keep notation simple, however, we will not carry such a parameter in a splitting (L, R) since it does not affect our analysis.
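These splittings are easy to realize numerically. The sketch below (our own construction, assuming the convention H = L − R so that the associated iteration matrix is L^{-1}R) forms the trivial, block Jacobi, and block Gauss-Seidel splittings of a random symmetric positive definite H, and confirms the stated fact that ρ(L^{-1}R) < 1 for the Gauss-Seidel choice:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3                        # H is n-by-n, partitioned into (k, n-k) blocks
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)        # symmetric positive definite

def rho(T):
    return np.abs(np.linalg.eigvals(T)).max()

# Splitting convention assumed here: H = L - R, iteration matrix L^{-1}R.
L_triv = H.copy()                                        # trivial: (L, R) = (H, 0)
L_jac = H.copy(); L_jac[:k, k:] = 0; L_jac[k:, :k] = 0   # block Jacobi
L_gs = H.copy(); L_gs[:k, k:] = 0                        # block Gauss-Seidel

rho_triv = rho(np.linalg.solve(L_triv, L_triv - H))      # R = 0, so this is 0
rho_gs = rho(np.linalg.solve(L_gs, L_gs - H))
rho_jac = rho(np.linalg.solve(L_jac, L_jac - H))         # often < 1 too, but H > 0
                                                         # alone does not guarantee it
assert rho_triv < 1e-12
assert rho_gs < 1.0                # guaranteed for Gauss-Seidel when H > 0
```

The Jacobi radius is reported without an assertion on purpose: unlike Gauss-Seidel, block Jacobi convergence is not implied by H ≻ 0 alone.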

A Stationary Iteration Class
We consider a class of stationary iterations consisting of all possible splittings (L, R) for which the spectral radius of L^{-1}R does not exceed unity (plus an additional technical condition to be specified soon). This class of stationary iterative methods, which we call the {L,R}-class for lack of a more descriptive term, iterates as follows:

  L x^{k+1} = R x^k + B^T y^k + f,
  y^{k+1} = y^k − τ (B x^{k+1} − c),   (2.4)

where (L, R) is any admissible splitting and τ represents a step length in the multiplier updates.
It is easy to see that the {L,R}-class iteration (2.4) corresponds to the following splitting of the (2 × 2)-block augmented matrix in system (2.1):

  [H, −B^T; τB, 0] = [L, 0; τB, I] − [R, B^T; 0, I].   (2.5)

Therefore, the resulting iteration matrix is

  M(τ) = [L, 0; τB, I]^{-1} [R, B^T; 0, I] = [L^{-1}R, L^{-1}B^T; −τB L^{-1}R, I − τB L^{-1}B^T].   (2.6)

It is worth observing that the results of the present paper still hold if, on the right-hand side of (2.5), the identity matrices in the (2,2)-blocks are replaced by any symmetric positive definite matrix. From the well-known theory of stationary iterative methods for linear systems, iteration (2.4) converges from any starting point if and only if

  ρ(M(τ)) < 1.   (2.7)

In this paper, we establish that, under two reasonable assumptions, condition (2.7) holds for the entire {L,R}-class.
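To make this concrete, the sketch below assembles the iteration matrix M(τ) for a toy problem, under our assumed realization of the class (in particular, the form H(α) = αA + B^T B for the (1,1)-block), and checks that condition (2.7) can be met by both the trivial and the Gauss-Seidel splittings for suitable step lengths τ:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, alpha = 5, 2, 0.1
B = rng.standard_normal((m, n))
Z = np.linalg.svd(B)[2][m:].T            # orthonormal basis of null(B)
A = 2.0 * Z @ Z.T - 0.5 * B.T @ B        # indefinite, PD on null(B)
H = alpha * A + B.T @ B                  # assumed (1,1)-block; PD for this alpha

def iteration_matrix(L, R, tau):
    # Iteration matrix M(tau) induced by the splitting of the augmented matrix.
    Li = np.linalg.inv(L)
    return np.block([[Li @ R, Li @ B.T],
                     [-tau * B @ Li @ R, np.eye(m) - tau * B @ Li @ B.T]])

def rho(T):
    return np.abs(np.linalg.eigvals(T)).max()

min_rho = {}
for name, (L, R) in {"ALM": (H, np.zeros((n, n))),
                     "Gauss-Seidel": (np.tril(H), np.tril(H) - H)}.items():
    assert rho(np.linalg.solve(L, R)) < 1.0   # an admissible splitting
    min_rho[name] = min(rho(iteration_matrix(L, R, t))
                        for t in (0.001, 0.01, 0.1, 0.5, 1.0))

# For each member, some step length tau makes the iteration contractive:
assert all(r < 1.0 for r in min_rho.values())
```

The scan over τ mirrors the member-dependence of the convergence interval: different splittings admit different ranges of workable step lengths.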

Classic Methods ALM and ADMM
The trivial splitting (L, R) = (H, 0) gives the classic augmented Lagrangian multiplier (ALM) method [2,3], which is also equivalent to Uzawa's method [4] applied to (1.3). In this case,

  M(τ) = [0, H^{-1}B^T; 0, I − τB H^{-1}B^T],

so that ρ(M(τ)) = ρ(I − τB H^{-1}B^T), leading to the well-known convergence result for the multiplier method.
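For illustration, a minimal ALM loop in the original variables of (1.1) (our own toy data; exact x-minimization, with the multiplier step length equal to γ, one common choice) exhibits the expected fast linear convergence:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, gamma = 5, 2, 10.0
B = rng.standard_normal((m, n))
Z = np.linalg.svd(B)[2][m:].T
A = 2.0 * Z @ Z.T - 0.5 * B.T @ B      # indefinite, PD on null(B)
b, c = rng.standard_normal(n), rng.standard_normal(m)

# Reference KKT solution of the saddle point system.
K = np.block([[A, B.T], [B, np.zeros((m, m))]])
x_star = np.linalg.solve(K, np.concatenate([b, c]))[:n]

# ALM: exact minimization of the augmented Lagrangian in x,
# followed by a multiplier ascent step of length tau = gamma.
H = A + gamma * B.T @ B                # positive definite here
y = np.zeros(m)
for _ in range(50):
    x = np.linalg.solve(H, b + gamma * B.T @ c - B.T @ y)
    y = y + gamma * (B @ x - c)

assert np.linalg.norm(B @ x - c) < 1e-10
assert np.linalg.norm(x - x_star) < 1e-8
```

The dual error here contracts by the fixed factor ρ(I − γB H^{-1}B^T) at every step, which is the Q-linear behavior referred to below.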

Proposition 2.2
The augmented Lagrangian multiplier method applied to the quadratic program (1.1) converges Q-linearly from any initial point for all step lengths τ in an interval of the form (0, 2η), where η > 0 depends on the problem data.

When the splitting of H is of the (2 × 2)-block Gauss-Seidel type defined in (2.3), the associated {L,R}-class member reduces to the classic alternating direction method of multipliers, i.e., ADMM [5,6], for which convergence has been established for general convex functions, not restricted to quadratics. Such a general theory, however, requires the objective to be a sum of two separable functions with respect to two block variables, both convex in the entire space. To the best of our knowledge, no convergence results are available when the objective is non-separable, is convex only in a subspace, or has more than two block variables (unless algorithmic modifications are introduced).

Assumptions
Our analysis rests on two assumptions. The first is that the (1,1)-block is positive definite:

  A1: H = H(α) ≻ 0.

We know that Assumption A1 holds for appropriate α values if A ∈ R^{n×n} is positive definite in the null space of B; see Proposition 1.1. We further require that L^{-1}R have no eigenvalue of unit modulus or greater, except possibly unity itself being an eigenvalue; that is,

  max { |μ| : μ ∈ σ(L^{-1}R) \ {1} } < 1.   (3.1)
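The block Gauss-Seidel member (the ADMM-like iteration for the quadratic case) can be run directly on a toy instance of (1.1). In the sketch below (our own data; the step length τ is chosen by an offline scan of the spectral radius of the error map, purely for illustration), the pair (x, y) converges to the KKT solution:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, gamma = 5, 2, 10.0
B = rng.standard_normal((m, n))
Z = np.linalg.svd(B)[2][m:].T
A = 2.0 * Z @ Z.T - 0.5 * B.T @ B        # indefinite, PD on null(B)
b, c = rng.standard_normal(n), rng.standard_normal(m)

K = np.block([[A, B.T], [B, np.zeros((m, m))]])
x_star = np.linalg.solve(K, np.concatenate([b, c]))[:n]

H = A + gamma * B.T @ B                  # positive definite (1,1)-block
f = b + gamma * B.T @ c
k = 3                                    # 2-by-2 block partition: sizes (k, n-k)
L = H.copy(); L[:k, k:] = 0.0            # block lower triangle of H
R = L - H                                # so that H = L - R

Li = np.linalg.inv(L)
def rho_M(tau):                          # spectral radius of the error map
    M = np.block([[Li @ R, -Li @ B.T],
                  [tau * B @ Li @ R, np.eye(m) - tau * B @ Li @ B.T]])
    return np.abs(np.linalg.eigvals(M)).max()

taus = np.linspace(0.05, 15.0, 60)
tau = taus[np.argmin([rho_M(t) for t in taus])]
r = rho_M(tau)
assert r < 1.0                           # a convergent step length exists

N = min(int(np.ceil(np.log(1e-12) / np.log(max(r, 1e-6)))) + 10, 100000)
x, y = np.zeros(n), np.zeros(m)
for _ in range(N):
    x = np.linalg.solve(L, R @ x + f - B.T @ y)   # block Gauss-Seidel sweep
    y = y + tau * (B @ x - c)                     # multiplier ascent step

assert np.linalg.norm(B @ x - c) < 1e-6
assert np.linalg.norm(x - x_star) < 1e-5
```

Note that this example is non-separable across the two blocks (H has nonzero off-diagonal blocks) and A is convex only on a subspace, so it lies outside the scope of the standard ADMM theory mentioned above.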

Convergence of the Entire Class
Now we present a unified convergence theorem for the entire {L,R}-class. The proof is left to the next section, after we develop some technical results.

Theorem 3.1 Let Assumption A1 and condition (3.1) hold. Then there exists η > 0 such that ρ(M(τ)) < 1 for all τ ∈ (0, 2η); consequently, the corresponding {L,R}-class iteration (2.4) converges Q-linearly to the solution of (2.1) from any initial point.

We note that the convergence interval (0, 2η) is member-dependent. It can also depend on the value of the parameter α > 0 in the definition (2.2) of H(α). It is worth noting that the theorem only requires L^{-1}R, as a linear mapping in R^n, to be non-expansive (plus a technical condition) rather than contractive. Convergence would not necessarily happen if one kept iterating on the primal variable x only; however, timely updating of the multiplier y helps the iterates of the pair (x, y) converge together.

Technical Results and Proof of Convergence
We first derive some useful technical lemmas. Let λ(τ) be an eigenvalue of M(τ), i.e.,

  λ(τ) ∈ σ(M(τ)).   (4.1)

The eigenvalue system corresponding to λ is

  [L^{-1}R, L^{-1}B^T; −τB L^{-1}R, I − τB L^{-1}B^T] [u; v] = λ [u; v],   (4.2)

where (u, v) ∈ C^n × C^m is nonzero. For simplicity, we will often skip the τ-dependence of the eigenpair if no confusion arises.

Lemma 4.2 Suppose that A is positive definite in the null space of B and τ ≠ 0. Then λ = 1 is not an eigenvalue of M(τ).

Proof We examine eigensystem (4.2). Rearranging the first equation of (4.2), we have

  λLu = Ru + B^T v.   (4.4)

Multiplying the first equation of (4.2) by τB and adding it to the second, after rearranging we obtain

  (1 − λ)v = τλ Bu.   (4.5)

Suppose that λ = 1. Then (4.5) implies Bu = 0, and equation (4.4) reduces to Hu = B^T v; note that u ≠ 0, since u = 0 would force B^T v = 0 and hence v = 0. Multiplying this equation by u^* and invoking Bu = 0 together with definition (2.2), we arrive at u^*Hu = α u^*Au = 0, contradicting the assumption of the lemma.
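The exclusion of λ = 1 can be observed numerically. Using our own toy data and the assumed block form of M(τ), the eigenvalue 1 present at τ = 0 disappears as soon as τ ≠ 0:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, alpha = 5, 2, 0.1
B = rng.standard_normal((m, n))
Z = np.linalg.svd(B)[2][m:].T
A = 2.0 * Z @ Z.T - 0.5 * B.T @ B        # PD on null(B)
H = alpha * A + B.T @ B                  # assumed H(alpha); PD here
L = np.tril(H); R = L - H                # Gauss-Seidel splitting, H = L - R
Li = np.linalg.inv(L)

def eigs_M(tau):
    M = np.block([[Li @ R, Li @ B.T],
                  [-tau * B @ Li @ R, np.eye(m) - tau * B @ Li @ B.T]])
    return np.linalg.eigvals(M)

# At tau = 0 the multiplier block is untouched and 1 is an eigenvalue
# (with multiplicity m); for tau != 0 it is excluded, as the lemma asserts.
assert np.min(np.abs(eigs_M(0.0) - 1.0)) < 1e-10
for tau in (0.05, 0.5):
    assert np.min(np.abs(eigs_M(tau) - 1.0)) > 1e-6
```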

Lemma 4.3 Let
(λ, (u, v)) be an eigenpair of M(τ) as given in (4.2), where λ ∉ {0, 1} and Bu ≠ 0. Then

  λ = r(u) / ( ℓ(u) − τ/(1 − λ) ),   (4.6)

where ℓ(u) = u^*Lu / u^*B^T Bu and r(u) = u^*Ru / u^*B^T Bu.

Proof It follows readily from (4.5) that

  v = (τλ/(1 − λ)) Bu.   (4.7)

Substituting the above into (4.4) and in view of (2.2), we have

  λLu = Ru + (τλ/(1 − λ)) B^T Bu,

or after a rearrangement,

  (λL − R)u = (τλ/(1 − λ)) B^T Bu.   (4.8)

Multiplying both sides of (4.8) by u^*, we have

  λ u^*Lu − u^*Ru = (τλ/(1 − λ)) u^*B^T Bu.

Since u^*B^T Bu ≠ 0, the above equation can be rewritten as

  λ ℓ(u) − r(u) = τλ/(1 − λ).   (4.9)

Solving for the λ on the left-hand side of (4.9) while fixing the ones on the right, we obtain the desired result (4.6), where the denominator term must be nonzero.

Lemma 4.4 Let τ, κ ∈ R and z ∈ C with κ + z ≠ 0. Then

  |1 − τ/(κ + z)|^2 = ( (κ + Re(z) − τ)^2 + Im(z)^2 ) / |κ + z|^2.   (4.10)

Moreover, τ = κ + Re(z) minimizes the above modulus, so that

  min_{τ ∈ R} |1 − τ/(κ + z)| = |Im(z)| / |κ + z|.   (4.11)

Proof By direct calculation,

  |1 − τ/(κ + z)|^2 = |κ + z − τ|^2 / |κ + z|^2 = ( (κ + Re(z) − τ)^2 + Im(z)^2 ) / |κ + z|^2,

from which both (4.10) and (4.11) follow. Now we are ready to prove Theorem 3.1.
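A quick numerical check of the minimizing property, with the modulus read as |1 − τ/(κ + z)| (our reconstruction of the quantity in (4.10)):

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(100):
    kappa = rng.standard_normal()
    z = complex(rng.standard_normal(), rng.standard_normal())
    w = kappa + z                            # assumed nonzero denominator

    def f(t):                                # modulus as a function of tau
        return abs(1.0 - t / w)

    t_star = kappa + z.real                  # claimed minimizer
    # Minimum value |Im(z)| / |kappa + z|, per (4.11):
    assert np.isclose(f(t_star), abs(z.imag) / abs(w))
    # No sampled real tau does better:
    for t in t_star + np.linspace(-5.0, 5.0, 101):
        assert f(t) >= f(t_star) - 1e-12
```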

Remarks
The {L,R}-class defined by (2.4) is constructed from splittings of the (1,1)-block of the saddle point system matrix; it includes, but is not limited to, all known convergent splittings for positive definite matrices, offering adaptability to problem structure with guaranteed convergence.
Those {L,R}-class members associated with block Gauss-Seidel splittings are natural extensions of the classic ADMM specialized to quadratic programs. In contrast to the existing general convergence theory for ADMM, Theorem 3.1 does not require separability or convexity in the entire space, and imposes no restriction on the number of blocks, while giving a Q-linear rate of convergence. It should be of great interest to extend these properties beyond quadratic programs, a topic we will address in a separate work.
The convergence of certain members of the {L,R}-class has been studied in [7] under the assumption that L is symmetric positive definite. In [8], a special case corresponding to the SOR-splitting has been analyzed.