Krylov improvements of the Uzawa method for Stokes type operator matrices

The paper is devoted to Krylov type modifications of the Uzawa method on the operator level for the Stokes problem, in order to accelerate convergence. First, block preconditioners and their effect on convergence are studied. Then it is shown that a Krylov-Uzawa iteration produces superlinear convergence on smooth domains, and an estimate is given on its speed.


Introduction
The Stokes problem

-Δu + ∇p = f,   div u = 0,   u|_∂Ω = 0   (1.1)

in bounded domains Ω ⊂ R^d is constantly in the focus of interest due to its basic role in fluid modelling, either on its own or as a building block for more elaborate models, and it also serves as a principal model for a saddle-point problem. Iterative methods for its solution have been studied up to recently, see, e.g., [9,13,19,28] and the references there. To name two major approaches, Uzawa type methods and MINRES-based iterations are of great interest, and various linear convergence estimates have been proved. This paper is concerned with the Uzawa approach. In this context one often studies the iteration on the function space level for the operators themselves, see, e.g., [10,25,28], since this provides a more direct insight into the structure of the problem and also indicates the asymptotic behaviour of the iterations on sufficiently fine meshes. We follow this vein as well. Furthermore, we address a Krylov type modification of the Uzawa method. Namely, in spite of the simplicity and elegance of the Uzawa method, which arises from a simple iteration applied to the Schur complement problem, it may be recommended to replace the simple iteration by the conjugate gradient (CG) algorithm to accelerate convergence [30].
Our paper first considers block preconditioners for operator matrices arising in the context of the Stokes problem, and studies their effect on the convergence of Krylov iterations. The need for regularization is pointed out, and improvements of the Uzawa method, which is a basic defect-correction process, by use of a Krylov subspace iterative acceleration method are discussed. The discussion is given on a general operator level, which also covers the case of matrices arising from a finite element discretization.
Moreover, involving the conjugate gradient approach opens the way to studying possible superlinear convergence, which is a typical phenomenon for Krylov methods in certain situations. Superlinear convergence of CG type methods has been derived for suitable operators in Hilbert space, see, e.g., [23,31] and the authors' papers [6,7], which also cover certain elliptic operators. However, to the authors' knowledge, no superlinear result has been given yet for the Stokes problem. In the closing section of the paper we study an Uzawa-related iteration which comes from the CG method applied to the Schur complement problem on the operator level. We prove superlinear convergence in the case of smooth domains, and furthermore give an estimate on its speed.
The paper starts with the properties of the Stokes problem and the involved iterations in Sects. 2 and 3; the above-mentioned results are then presented in Sects. 4 and 5, respectively.

The operator matrix
The Stokes problem can be written in an operator matrix form; moreover, it is a prototype of such saddle-point problems. Let us define the space L^2_0(Ω) := { p ∈ L^2(Ω) : ∫_Ω p = 0 } and the operators

L := -Δ,   B := -div   (2.2)

on the velocity space H^1_0(Ω)^d, where the Laplacian acts coordinatewise and includes zero Dirichlet boundary conditions, and B maps into L^2_0(Ω). Then, as is well known, the adjoint of B is B* = ∇, hence, changing the sign of the second row of (1.1), we can altogether rewrite it as the system

( L  B* ) ( u )   ( f )
( B  0  ) ( p ) = ( 0 ).   (2.3)

It is also well known that the linear operators L and B in the setting of (2.2) are bounded. In the sequel we will sometimes consider the general operator form (2.3) instead of (1.1), which also covers the special case when L and B are proper matrices (arising from a finite element discretization), see Section 4.

The Schur complement operator
The basis of both the well-posedness and certain iterative solutions of the problem is the set of well-known facts summarized below. By eliminating u, the Stokes system can be recast as the single equation

Sp = f̃,   (2.4)

where

S := -div (-Δ)^{-1} ∇   (2.5)

and f̃ := div Δ^{-1} f. Here both the operators ∇ and div are understood in weak form; in fact L = -Δ and ∇ = B* in accordance with (2.2). The operator S : L^2_0(Ω) → L^2_0(Ω) will be called the 'Schur complement operator'. In fact, it plays the same role as the Schur complement matrix for saddle-point matrices in analogy with (1.1). The well-posedness of equation (2.4) is due to the fact that S is a bounded coercive self-adjoint operator: in fact,

γ^2 ‖p‖^2 ≤ ⟨Sp, p⟩ ≤ ‖p‖^2   (p ∈ L^2_0(Ω)),   (2.6)

where γ > 0 is the inf-sup constant on Ω. For a further discussion on inf-sup constants, see Subsection 4.1.2. The eigenvalues of the Schur complement operator have been studied in various situations [18,29,32]. The definition (2.5) implies that an equation Sp = λp can be reformulated as

-Δu + ∇p = 0,   div u = λp,

together with the Dirichlet boundary conditions. Eliminating p yields -λΔu + ∇ div u = 0, that is, the problem can thus be related to the Cosserat spectrum [29].
Later we will need a result from [18]. Therein the action of S is described on the following subspaces of L^2_0(Ω):

Uzawa and Krylov-Uzawa iterations
Uzawa iteration. A popular way to solve the Stokes problem is the Uzawa iteration, which is still in the focus of recent interest, see, e.g., [16,20,24]. On the continuous level and in strong form, it reads as

-Δu_{k+1} + ∇p_k = f,   u_{k+1}|_∂Ω = 0,
p_{k+1} = p_k - α div u_{k+1},   (3.1)

starting from an arbitrary p_0 ∈ L^2_0(Ω). This iteration is equivalent to a simple iteration applied to equation (2.4) in L^2_0(Ω). Namely, the sequence

p_{k+1} = p_k - α (Sp_k - f̃)

can be rewritten with the auxiliary variable u_{k+1} := Δ^{-1}(∇p_k - f) to yield (3.1). Consequently, due to the spectral bounds in (2.6), the optimal parameter is α := 2/(1+γ^2) and the corresponding optimal convergence factor is (1-γ^2)/(1+γ^2), where γ > 0 is the inf-sup constant on Ω. For further details on the Uzawa method, see, e.g., [10,28].
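To make the role of the spectral bounds (2.6) concrete, the following minimal sketch runs the simple iteration on a small diagonal stand-in for the Schur complement; the matrix, the value of γ and the data are hypothetical illustrations, not a Stokes discretization.

```python
# Simple (Richardson) iteration p_{k+1} = p_k - alpha*(S p_k - f) on a
# diagonal stand-in S with spectrum in [gamma^2, 1]; illustrative data only.
gamma = 0.5
eigs = [gamma**2, 0.4, 0.7, 1.0]           # surrogate spectrum of S
alpha = 2.0 / (1.0 + gamma**2)             # optimal damping parameter
rho = (1.0 - gamma**2) / (1.0 + gamma**2)  # predicted convergence factor

p_exact = [1.0, -2.0, 0.5, 3.0]
f = [lam * pe for lam, pe in zip(eigs, p_exact)]  # f = S p_exact

p = [0.0, 0.0, 0.0, 0.0]
for _ in range(50):
    p = [pi - alpha * (lam * pi - fi) for pi, lam, fi in zip(p, eigs, f)]

err = max(abs(pi - pe) for pi, pe in zip(p, p_exact))
# each error component contracts by at most rho = 0.6 per step
```

With γ = 0.5 the predicted factor is (1-γ^2)/(1+γ^2) = 0.6, and after 50 steps the error is of size 0.6^50 times the initial error.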
Krylov-Uzawa iteration. In spite of the simplicity and elegance of the Uzawa method, if γ is small then it may be recommended to replace the simple iteration by the CG algorithm for (2.4), see [30], and also [1,12,27] for similar ideas. That is, for arbitrary p_0 ∈ L^2_0(Ω), we let d_0 = r_0 := Sp_0 - f̃, and if p_k, r_k and d_k are found, then

α_k := ‖r_k‖^2 / ⟨Sd_k, d_k⟩,   p_{k+1} := p_k - α_k d_k,   r_{k+1} := r_k - α_k Sd_k,   (3.2)
β_k := ‖r_{k+1}‖^2 / ‖r_k‖^2,   d_{k+1} := r_{k+1} + β_k d_k.   (3.3)

We then obtain the Krylov-Uzawa iteration by rewriting (3.2)-(3.3) in the same manner as above in the Uzawa case, using the auxiliary variables u_k := Δ^{-1}(∇p_k - f) and z_k := Δ^{-1}∇d_k. Thus, given p_0 ∈ L^2_0(Ω), we first let d_0 = r_0 = div u_0, where u_0 solves -Δu_0 + ∇p_0 = f, u_0|_∂Ω = 0. Further, if p_k, u_k, r_k and d_k are found, then the next iterates are determined as follows:

-Δz_k + ∇d_k = 0,   z_k|_∂Ω = 0,
p_{k+1} := p_k - α_k d_k,   u_{k+1} := u_k - α_k z_k,   r_{k+1} := r_k - α_k div z_k,   d_{k+1} := r_{k+1} + β_k d_k.   (3.4)

The constants α_k and β_k are as in (3.2)-(3.3), except that Sd_k in α_k is replaced by div z_k.
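The CG recursion (3.2)-(3.3) can be sketched in a few lines; the matrix below is a small hypothetical symmetric positive definite stand-in for S (nothing Stokes-specific), and the residual convention r = Sp - f matches the text.

```python
# Sketch of the CG iteration applied to S p = f, with residual r = S p - f
# as in the Krylov-Uzawa setting; S is a small SPD stand-in matrix.

def matvec(S, x):
    return [sum(Sij * xj for Sij, xj in zip(row, x)) for row in S]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def cg(S, f, p0, steps):
    p = p0[:]
    r = [si - fi for si, fi in zip(matvec(S, p), f)]   # r_0 = S p_0 - f
    d = r[:]
    for _ in range(steps):
        Sd = matvec(S, d)                              # in Krylov-Uzawa: div z_k
        alpha = dot(r, r) / dot(Sd, d)
        p = [pi - alpha * di for pi, di in zip(p, d)]  # p_{k+1} = p_k - alpha d_k
        r_new = [ri - alpha * sdi for ri, sdi in zip(r, Sd)]
        beta = dot(r_new, r_new) / dot(r, r)
        d = [ri + beta * di for ri, di in zip(r_new, d)]
        r = r_new
    return p

S = [[2.0, 0.5, 0.0],
     [0.5, 2.0, 0.5],
     [0.0, 0.5, 1.0]]   # symmetric positive definite
p_exact = [1.0, -1.0, 2.0]
f = matvec(S, p_exact)

p = cg(S, f, [0.0, 0.0, 0.0], 3)   # CG is exact after n = 3 steps here
```

In exact arithmetic CG terminates after n steps for an n-by-n matrix, which the three-step run above illustrates.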
Compared to the Uzawa iteration, the main cost of a Krylov-Uzawa step is similarly the solution of an auxiliary Poisson equation, but additional work is needed to compute the search directions and the inner products in α_k and β_k. On the other hand, the convergence factor of the linear estimate is now (1-γ)/(1+γ). Moreover, in contrast to a simple iteration, superlinear convergence may arise in Krylov iterations under proper circumstances.
The last section of this paper will be devoted to superlinear convergence estimates in Krylov-Uzawa context.
We note that the actual realizations of Uzawa and Krylov-Uzawa iterations involve approximate solution of the Poisson equations, mostly using inner iterations, which has to be executed to high accuracy to control the overall error. In particular, realizing superlinear convergence needs an increasing inner accuracy. This situation is similar to inexact Newton methods for nonlinear problems.

Superlinear convergence of Krylov iterations
In this section we briefly summarize some well-known facts on the conjugate gradient (CG) algorithm. (For detailed discussions on CG methods, see e.g. [2].) The basic properties hold not only in R^n but also on the Hilbert space level (see also [31]), which we address in this paper. Therefore, let us consider a bounded linear operator A : H → H in a Hilbert space H, and a linear equation

Au = b   (3.5)

for a given b ∈ H. We assume that A is self-adjoint and coercive, hence (3.5) is well-posed.
The CG iteration has been recalled in (3.3) for the equation Sp = f̃; now we consider its analogue for (3.5). In the study of convergence, one considers the error vector

e_k := u_k - u*,   (3.6)

where u* denotes the solution of (3.5), and is in general interested in its energy norm

‖e_k‖_A := ⟨Ae_k, e_k⟩^{1/2}.   (3.7)

The convergence estimates rely on the optimality property

‖e_k‖_A = min { ‖P_k(A) e_0‖_A : P_k is a polynomial of degree at most k with P_k(0) = 1 }.   (3.8)

In particular, if A has a complete eigensystem (which always holds in the finite dimensional case), then (3.8) implies

‖e_k‖_A / ‖e_0‖_A ≤ min_{P_k} max_i |P_k(λ_i)|,   (3.9)

where λ_i (i ∈ N) are the eigenvalues of A. This is the basis of superlinear estimates.

Compact perturbations of the identity
The well-known superlinear convergence rates for the CG method can be proved for compact perturbations of the identity, that is, if the decomposition

A = I + E   (3.10)

holds, where I is the identity and E is a compact operator in H. We recall the derivation for use below, based on [2]; note that this was established on the Hilbert space level in [31]. Since E is also self-adjoint, A has a complete eigensystem (the same as E). Let us denote the eigenvalues of A and E by λ_j(A) and μ_j(E), or briefly, just by λ_j and μ_j, respectively, ordered according to

|μ_1| ≥ |μ_2| ≥ … → 0.   (3.11)

Using (3.9) with the polynomial whose roots are λ_1, …, λ_k, and the arithmetic-geometric means inequality, we finally find that

( ‖e_k‖_A / ‖e_0‖_A )^{1/k} ≤ 2‖A^{-1}‖ · (1/k) Σ_{j=1}^k |μ_j(E)|.   (3.12)

Here μ_j(E) → 0, hence the arithmetic means (1/k) Σ_{j=1}^k |μ_j(E)| → 0 as well; that is, we obtain a superlinear convergence rate.
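A quick numerical illustration of this phenomenon, with an assumed setup: take A = I + E for a diagonal E whose eigenvalues μ_j = 1/j^2 tend to zero, run CG, and watch the averaged convergence factors (‖e_k‖_A/‖e_0‖_A)^{1/k} decrease with k.

```python
# CG on A = I + E with diagonal E, mu_j = 1/j^2 -> 0 (illustrative data).
# Superlinear convergence shows up as decreasing averaged factors q_k.

def cg_energy_errors(diag, x_exact, steps):
    n = len(diag)
    b = [a * x for a, x in zip(diag, x_exact)]
    x = [0.0] * n
    r = b[:]                      # r = b - A x with x = 0
    d = r[:]
    def enorm(v):                 # energy norm ||v||_A for diagonal A
        return sum(a * vi * vi for a, vi in zip(diag, v)) ** 0.5
    errs = [enorm([xe - xi for xe, xi in zip(x_exact, x)])]
    rr = sum(ri * ri for ri in r)
    for _ in range(steps):
        Ad = [a * di for a, di in zip(diag, d)]
        alpha = rr / sum(adi * di for adi, di in zip(Ad, d))
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rr_new = sum(ri * ri for ri in r)
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
        errs.append(enorm([xe - xi for xe, xi in zip(x_exact, x)]))
    return errs

n = 40
diag = [1.0 + 1.0 / (j * j) for j in range(1, n + 1)]   # A = I + E
x_exact = [((-1.0) ** j) / j for j in range(1, n + 1)]
errs = cg_energy_errors(diag, x_exact, 12)
q = [(errs[k] / errs[0]) ** (1.0 / k) for k in range(1, 13)]
# q[k] decreases with k: superlinear convergence
```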

Compact perturbations of algebraic operators
Now we extend the above superlinear estimate to decompositions where the identity operator in (3.10) is replaced by an algebraic operator. A related qualitative result has been given in [21]; however, we need an estimate in quantitative form. We derive a new result analogous to (3.12).
An algebraic operator J is the root of a nonzero polynomial, and in this case the space H is the direct sum of the subspaces ker(J - σ_ℓ I)^{r_ℓ}. For us it suffices to assume that the σ_ℓ are simple roots, i.e. all r_ℓ = 1 and the σ_ℓ are eigenvalues. That is, we can formulate the following result.

Theorem 3.1 Let A : H → H be self-adjoint and coercive, and assume the decomposition A = J + E, where J is an algebraic self-adjoint operator with simple eigenvalues σ_1, …, σ_s and corresponding eigenspaces H_1, …, H_s, and E is a compact self-adjoint operator. Then the error vectors of the CG algorithm for (3.5) satisfy a superlinear convergence estimate, analogous to (3.12), for some constant C > 0 independent of k.
Proof The assumptions imply that

A|_{H_ℓ} = σ_ℓ I|_{H_ℓ} + E_ℓ   (ℓ = 1, …, s),

where E_ℓ := E|_{H_ℓ} : H_ℓ → H_ℓ are compact self-adjoint (ℓ = 1, …, s); further, that the eigenvalues of A are

λ_{ℓ,j} = σ_ℓ + μ_j(E_ℓ)   (ℓ = 1, …, s; j ∈ N).   (3.15)

Here for all fixed ℓ we have μ_j(E_ℓ) → 0 as j → ∞. The operators A|_{H_ℓ} = σ_ℓ I|_{H_ℓ} + E_ℓ have complete eigensystems in H_ℓ for all fixed ℓ = 1, …, s, hence, using their union, A has a complete eigensystem in H. Therefore we can use (3.9) to estimate the convergence factor. More precisely, to follow the double index notation (3.15), we write

‖e_k‖_A / ‖e_0‖_A ≤ min_{P_k} max_{ℓ,i} |P_k(λ_{ℓ,i})|.

Now, for given k ∈ N, let us write it as k = k_1 + k_2 + … + k_s and choose the polynomial

P*_k(λ) := Π_{m=1}^s Π_{j=1}^{k_m} (λ_{m,j} - λ) / λ_{m,j}.

Here the definition of P*_k(λ) implies that its maximum over the spectrum is attained for indices with i > k_ℓ. Let us fix some λ_{ℓ,i} and estimate the above product. As m = 1, …, s, we only leave the terms for m = ℓ and estimate the other terms above by 1. Thus we obtain

|P*_k(λ_{ℓ,i})| ≤ Π_{j=1}^{k_ℓ} |λ_{ℓ,j} - λ_{ℓ,i}| / λ_{ℓ,j}.

Here

|λ_{ℓ,j} - λ_{ℓ,i}| = |μ_j(E_ℓ) - μ_i(E_ℓ)| ≤ |μ_j(E_ℓ)| + |μ_i(E_ℓ)|.

Moreover, since j ↦ |μ_j(E_ℓ)| is a nonincreasing sequence and j ≤ k_ℓ < k_ℓ + 1 ≤ i, we have |μ_i(E_ℓ)| ≤ |μ_j(E_ℓ)|, hence, with the above relation, |λ_{ℓ,j} - λ_{ℓ,i}| ≤ 2|μ_j(E_ℓ)|. On the other hand, we have 1/λ_{ℓ,j} ≤ ‖A^{-1}‖ since A is a positive operator with a complete eigensystem. Altogether,

|P*_k(λ_{ℓ,i})| ≤ Π_{j=1}^{k_ℓ} 2‖A^{-1}‖ |μ_j(E_ℓ)|.

Using the arithmetic-geometric means inequality, we find that the k-th root of this bound is estimated by arithmetic means of the |μ_j(E_ℓ)|, which tend to zero. Now we can use the above estimate for particular quasi-uniform decompositions, that is, for k = k_1 + k_2 + … + k_s in which all k_ℓ are chosen to satisfy k_ℓ ≥ ⌊k/s⌋. A special situation occurs if there is only one nonzero operator among the compact perturbations E_ℓ. Then the above proof can be modified to reproduce the result in the same form as (3.12):

Proof We modify the proof of Theorem 3.1 from (3.17). Instead of a quasi-uniform decomposition k = k_1 + k_2 + … + k_s, we simply choose k_1 = k, and k_ℓ = 0 for ℓ ≥ 2.

Block preconditioning and Krylov acceleration
In this section we consider block preconditioners and their effect on convergence in the context of Krylov iterations. It can be seen as a survey of earlier results, together with some new ones. In Sect. 4.1 we address the need to use regularization. We also discuss improvements of the Uzawa method, which is a basic defect-correction method, by use of a Krylov subspace iterative acceleration method. This is discussed in detail for Stokes type problems in Sects. 4.2 and 4.3. Our discussion is mostly on a general operator level, which also covers the situation when the operators are matrices arising from a finite element discretization.

Motivation from optimization problems
As is familiar, constrained optimization problems arise in a multitude of applications. Such problems involve at least two variables, one of which is the Lagrange multiplier needed to impose the constraint. In the context of mixed variable or constrained finite element methods, the Lagrange multiplier acts as a natural physical variable, such as the pressure in the Stokes problem. The arising systems are normally symmetric and indefinite, but have a block matrix structure which can be utilized. Various types of preconditioners for such systems have been proposed through the years. For reduced matrix approaches, a Schur complement system for one of the variables must be solved. Consider the Lagrangian functional

(1/2)⟨Lu, u⟩ - ⟨u, f⟩ + ⟨Bu - g, p⟩,   (4.1)

which leads to a linear system similar to (2.3), now allowing g ≠ 0:

Lu + B*p = f,   Bu = g,   (4.2)

where B is surjective (in the matrix case, it has full row rank) and L is symmetric. The block L can be indefinite in general, but we assume that L is positive definite on ker(B), which implies that for some sufficiently small ε_0, the operator εL + B*B is positive definite for all 0 < ε < ε_0. In this case the optimization problem (4.1) is augmented by the addition of a regularization term (1/(2ε))‖Bu - g‖^2, which leads to the augmented system

(L + (1/ε)B*B)u + B*p = f + (1/ε)B*g,   Bu = g.   (4.3)

Since Bu = g, this is equivalent to (4.2) in the sense that both systems have the same solution. This method is particularly useful if L is indefinite or ill-conditioned. In particular, since L + (1/ε)B*B is positive definite, it follows readily that a solution exists. For such systems, the Schur complement S = BL^{-1}B* for (4.2) is replaced by S_ε = B(L + (1/ε)B*B)^{-1}B*. We also note that there exist weighted versions of such regularization, see, e.g., [5]. Based on the actual form of B and L one can estimate the value of ε_0, see e.g. [3].
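The equivalence of the original and the augmented system can be checked directly on a tiny example; the matrices below are hypothetical illustrations, with B* realized as the transpose in the matrix case, and exact rational arithmetic is used so equality is literal.

```python
# Sketch: the augmented system (L + (1/eps) B^T B) u + B^T p = f + (1/eps) B^T g,
# B u = g, has the same solution as the original saddle-point system (4.2).
# All matrices are small hypothetical examples, not from a discretization.

from fractions import Fraction as F

def solve(A, b):
    # Gauss-Jordan elimination with exact rational arithmetic
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        piv = next(r for r in range(i, n) if M[r][i] != 0)
        M[i], M[piv] = M[piv], M[i]
        for r in range(n):
            if r != i and M[r][i] != 0:
                fac = M[r][i] / M[i][i]
                M[r] = [mr - fac * mi for mr, mi in zip(M[r], M[i])]
    return [M[i][n] / M[i][i] for i in range(n)]

# L symmetric, B full row rank (hypothetical small example)
f = [F(1), F(3)]
g = [F(2)]
eps = F(1, 10)
inv = 1 / eps

# original saddle-point system: [L B^T; B 0][u; p] = [f; g]
A1 = [[F(2), F(1), F(1)],
      [F(1), F(2), F(1)],
      [F(1), F(1), F(0)]]
u1 = solve(A1, f + g)

# augmented system: [L + (1/eps)B^T B, B^T; B, 0][u; p] = [f + (1/eps)B^T g; g]
A2 = [[F(2) + inv, F(1) + inv, F(1)],
      [F(1) + inv, F(2) + inv, F(1)],
      [F(1), F(1), F(0)]]
u2 = solve(A2, [f[0] + inv * g[0], f[1] + inv * g[0]] + g)

same = (u1 == u2)   # exact equality in rational arithmetic
```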
We discuss first the LBB stability condition and then the solution of the regularized problem.

An algebraic form of the LBB condition
The optimization problems involve partial differential equations, which must be discretized for the actual solution; this is normally done by use of a finite element method. To analyse the stability of discretized saddle point problems, the LBB inf-sup condition is mostly used.
The LBB condition can be presented in algebraic form as follows. Assume first that the LBB condition holds for the L^2-norm. Here the discrete gradient matrix B* has a one-dimensional nullspace consisting of constant vectors, hence we consider the Schur complement for vectors in the orthogonal complement of this nullspace. In that subspace B has full rank.
Recall that the Moore-Penrose generalized inverse of B equals B† = B*(BB*)^{-1}, in which case BB† = I. The algebraic form of the LBB condition for the L^2-norm is

γ = inf_{p≠0} sup_{u≠0} ⟨Bu, p⟩ / (‖u‖ ‖p‖) > 0.

Here, given p, the sup is taken at u = B†p (or a scalar multiple of it), for which Bu = p.
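The identity BB† = I is easy to verify on a small full-row-rank example; the values below are hypothetical, with the transpose playing the role of B* in the matrix case.

```python
# Quick check of B_dagger = B^T (B B^T)^{-1} and B B_dagger = I for a small
# full-row-rank matrix B (hypothetical example values, exact arithmetic).

from fractions import Fraction as F

B = [[F(1), F(0), F(1)],
     [F(0), F(1), F(1)]]          # 2 x 3, full row rank

# B B^T and its 2x2 inverse
BBt = [[sum(B[i][k] * B[j][k] for k in range(3)) for j in range(2)]
       for i in range(2)]
det = BBt[0][0] * BBt[1][1] - BBt[0][1] * BBt[1][0]
inv = [[BBt[1][1] / det, -BBt[0][1] / det],
       [-BBt[1][0] / det, BBt[0][0] / det]]

# B_dagger = B^T (B B^T)^{-1}, a 3 x 2 matrix
Bd = [[sum(B[j][i] * inv[j][k] for j in range(2)) for k in range(2)]
      for i in range(3)]

# B B_dagger should be the 2 x 2 identity
BBd = [[sum(B[i][k] * Bd[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]
```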
More generally, measuring u in the L-norm and the second saddle point variable p in an R-norm, where L and R are symmetric positive definite, we obtain

γ = inf_{p≠0} sup_{u≠0} ⟨Bu, p⟩ / (‖u‖_L ‖p‖_R) > 0.

For the Stokes problem we have R = I and L equals the block Laplacian, that is, the L-norm equals the first-order Sobolev norm for the corresponding function space. As seen from the above, γ > 0 (owing to considering the orthogonal complement of the one-dimensional nullspace of B*); however, γ can be small. Then one must regularize the problem, writing it in the form

Lu + B*p = f,   Bu - Cp = g.

Here C is symmetric and positive semidefinite, and it is positive definite on the nullspace ker(B*) of B*. For the Stokes problem one can let it be diagonal. For the corresponding Schur complement C + BL^{-1}B*, the lower spectral bound is positive in the matrix case. For the LBB condition to hold in finite element problems one must use stable element pairs for the two variables, which implies that for the Stokes problem, the space V for the velocity must be sufficiently richer than the space W for the pressure.
On the other hand, when finite element methods are implemented on parallel computer platforms, the data communication is simplified if one uses equal order finite elements for V and W. To stabilize this, or to increase the stabilization even when one uses some stable element pair but the constant γ is too small (such as for the mini-element, see [4]), one must use a stabilization term -αC for some scalar α, depending on the actual form of B and L, see e.g. [3]. As has been shown in [4], for the Stokes problem α = O(h), where h is the discretization parameter.
We note that the addition of the stabilization term can be done without any 'overstabilization' effects, such as loss of order of the approximate solution, as the right-hand side is correspondingly modified, see [4] for further details.

The Uzawa method in the context of regularization
We now return to the first regularized form (4.3) and show its relation with the Uzawa method. First we need the following technical result.

Lemma 4.1 Let S := BL^{-1}B*. Then, for ε > 0,

B(L + (1/ε)B*B)^{-1}B* = εS(εI + S)^{-1};

equivalently, the inverse of the regularized Schur complement equals S^{-1} + (1/ε)I.

Proof Use a direct computation or the Sherman-Morrison-Woodbury formula.
The Uzawa method can be presented in the following way. Let

L_ε(u, p) := (1/2)⟨Lu, u⟩ + (1/(2ε))‖Bu - g‖^2 - ⟨u, f⟩ + ⟨Bu - g, p⟩

be the Lagrangian functional corresponding to the regularized constrained optimization problem for a given ε > 0. The solution û ∈ V = R^n, p̂ ∈ X = R^m is a saddle point of L_ε(u, p) and, given p, we have

û(p) = (L + (1/ε)B*B)^{-1}(f + (1/ε)B*g - B*p),   ϕ(p) := L_ε(û(p), p).

For this functional the first order derivative satisfies ∇ϕ(p) = Bû(p) - g, and the Hessian is

∇^2 ϕ(p) = -B(L + (1/ε)B*B)^{-1}B*.

Since the Hessian matrix is negative definite on the range of B, ϕ(p) has a maximum at p̂, and when ε ≪ 1, it follows from Lemma 4.1 that ∇^2 ϕ(p) ≈ -εI. Using the Newton-type approximation

p_{k+1} = p_k - (∇^2 ϕ(p_k))^{-1} ∇ϕ(p_k) ≈ p_k + (1/ε)(Bû(p_k) - g)

and the above-mentioned relation ∇^2 ϕ(p) ≈ -εI for the initial guesses, the method takes the simple form

(L + (1/ε)B*B)u_{k+1} = f + (1/ε)B*g - B*p_k,
p_{k+1} = p_k + (1/ε)(Bu_{k+1} - g).   (4.4)

It is readily seen that this method is equivalent to a stabilized block Gauss-Seidel iteration method: given p_0, for k = 0, 1, …, until convergence, solve

( L + (1/ε)B*B   0  ) ( u_{k+1} )   ( f + (1/ε)B*g - B*p_k )
( B             -εI ) ( p_{k+1} ) = ( g - εp_k ).

In practice, solutions of systems with the matrix L + (1/ε)B*B can be costly. Therefore, we consider next a generalized block Gauss-Seidel matrix, used as a preconditioner in an iterative solution method, where the arising linear systems are easier to solve.
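As an illustration, the following sketch iterates u-solves with the augmented matrix L + (1/ε)BᵀB and dual updates p ← p + (1/ε)(Bu − g); the concrete update form and all data are assumptions for this demo, chosen so the exact saddle point is known.

```python
# Sketch of the regularized Uzawa iteration: solve
# (L + (1/eps) B^T B) u = f + (1/eps) B^T g - B^T p_k, then update
# p_{k+1} = p_k + (1/eps)(B u - g). Small hypothetical data.

eps = 0.1
L = [[2.0, 1.0], [1.0, 2.0]]
B = [1.0, 1.0]         # one constraint row
f = [1.0, 3.0]
g = 2.0

def solve2(A, b):      # 2x2 linear solve via Cramer's rule
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

M = [[L[0][0] + B[0] * B[0] / eps, L[0][1] + B[0] * B[1] / eps],
     [L[1][0] + B[1] * B[0] / eps, L[1][1] + B[1] * B[1] / eps]]

p = 0.0
for k in range(200):
    rhs = [f[0] + B[0] * g / eps - B[0] * p,
           f[1] + B[1] * g / eps - B[1] * p]
    u = solve2(M, rhs)
    p = p + (B[0] * u[0] + B[1] * u[1] - g) / eps

# exact saddle point of this data: u = (0, 2), p = -1
```

For this data the iteration contracts with factor ε/(ε + BL^{-1}Bᵀ) ≈ 0.13, so 200 steps converge to machine precision.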

Block Gauss-Seidel preconditioner for a basic Krylov acceleration method
We consider now preconditioners for the unreduced system (4.2). The considered 2-by-2 block matrices may contain operator blocks in general. Let D_1, D_2 be symmetric positive definite preconditioners to L and BD_1^{-1}B*, respectively; convergent choices are discussed at the end of this subsection. Then the block triangular operator matrix

( D_1  0    )
( B    -D_2 )   (4.5)

can be used as a simple, but still efficient, preconditioner to ( L B* ; B 0 ). The operator matrix (4.5) corresponds to the splitting of ( L B* ; B 0 ) into (4.5) plus a residual matrix. For the analysis of this preconditioner we must analyse the eigenvalues λ of the preconditioned matrix (4.6), which equal λ = 1 + δ, where δ denotes the eigenvalues of the preconditioned residual matrix, i.e. of (4.7).
Using a similarity transformation with the matrix diag(D_1^{1/2}, D_2^{1/2}), a computation shows that (4.7) reduces to a symmetric generalized eigenvalue problem. If u = 0 then it follows that p = 0; therefore, u ≠ 0 for any eigenvector of (4.7), and a multiplication of the second equation in (4.8) yields a quadratic relation for δ in terms of the quantities a and b appearing below. This result is collected in the next theorem.
It follows that if D_1 = L then a = 0 and δ = b^2 - 1, where b^2 denotes the eigenvalues of D_2^{-1}BD_1^{-1}B*, which are positive. Hence the eigenvalues of (4.6) are real and equal either the unit number (in the matrix case with multiplicity at least n) or the eigenvalues b^2 of D_2^{-1}BD_1^{-1}B*. By choosing D_2 sufficiently close to BD_1^{-1}B* we can hence cluster the eigenvalues around the unit number. If D_1 is diagonal, then we can form the matrix BD_1^{-1}B* explicitly. The above clustering holds approximately also if |a| is small. However, if b is small (which occurs for a nearly rank-deficient matrix B), then δ can take values close to a and -1, which means that the preconditioned matrix is nearly singular. Further, it can be seen that if -(b + 1)^2 < a < -(b - 1)^2, then the eigenvalues are complex.
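The clustering statement can be checked numerically, assuming the lower block triangular form P = (D_1 0; B -D_2) for the preconditioner; this concrete form, and all matrices below, are assumptions for the illustration. With D_1 = L, vectors of the form (u; 0) are eigenvectors for the eigenvalue 1, and the remaining eigenvalues are those of D_2^{-1}BD_1^{-1}Bᵀ.

```python
# Check, on a small assumed example, that with D1 = L the generalized
# eigenproblem A x = lam * P x (A = [[L, B^T], [B, 0]], P = [[L, 0], [B, -D2]])
# has eigenvalue 1 on vectors (u; 0), and eigenvalue lam = (B L^{-1} B^T)/D2
# on a constructed eigenpair. All data hypothetical.

L = [[2.0, 1.0], [1.0, 2.0]]
B = [1.0, 1.0]           # 1 x 2
D2 = 0.5                 # scalar preconditioner for B L^{-1} B^T

def matvec2(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def A_apply(u, p):       # A (u; p)
    Au = matvec2(L, u)
    return [Au[0] + B[0] * p, Au[1] + B[1] * p, B[0] * u[0] + B[1] * u[1]]

def P_apply(u, p):       # P (u; p)
    Lu = matvec2(L, u)
    return [Lu[0], Lu[1], B[0] * u[0] + B[1] * u[1] - D2 * p]

# eigenvalue 1: any vector of the form (u; 0)
u = [0.3, -1.2]
x1, y1 = A_apply(u, 0.0), P_apply(u, 0.0)
res1 = max(abs(a - b) for a, b in zip(x1, y1))       # A x = 1 * P x

# remaining eigenvalue: lam = (B L^{-1} B^T) / D2, with p = 1 and
# u = (lam - 1)^{-1} L^{-1} B^T p
Linv = [[2.0 / 3.0, -1.0 / 3.0], [-1.0 / 3.0, 2.0 / 3.0]]
LiBt = matvec2(Linv, B)                  # L^{-1} B^T
sigma = B[0] * LiBt[0] + B[1] * LiBt[1]  # B L^{-1} B^T = 2/3
lam = sigma / D2                         # = 4/3
u2 = [LiBt[0] / (lam - 1.0), LiBt[1] / (lam - 1.0)]
x2, y2 = A_apply(u2, 1.0), P_apply(u2, 1.0)
res2 = max(abs(a - lam * b) for a, b in zip(x2, y2))  # A x = lam * P x
```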
It can hence be concluded that the above preconditioning method does not have a fully robust behaviour.
Even if D_1 or D_2 is itself a differential operator, its discretization can be solved by inner iterations to some limited accuracy. As shown e.g. in [4,8], conjugate gradient Krylov subspace methods are quite insensitive to the inner iteration tolerance; as shown e.g. in [30], the pure Uzawa method is more sensitive.
Depending on the expense of solving systems with D_1 and D_2, this method can work quite efficiently. However, due to the strong unsymmetry of the iteration matrix, the norm of the corresponding matrix in (4.7) can be large, and this must be compensated for by more iterations. Therefore we turn now to a symmetrizable form of the preconditioned matrix.

A symmetrized form of a block Gauss-Seidel preconditioner
In [8], a one-sided symmetrization of a block Gauss-Seidel preconditioned matrix was proposed. Therein, the method was formulated as a block Gauss-Seidel method, which was subsequently symmetrized by choosing a proper implicit inner product (cf. [5,14,15,26] for a similar approach). We show below that the method can be formulated algebraically, using a modification of the Gauss-Seidel matrix. Let then L_0 be a symmetric positive definite preconditioner to L which has been scaled to satisfy

α_1 ⟨Lu, u⟩ ≤ ⟨L_0 u, u⟩ ≤ ⟨Lu, u⟩ for all u ∈ R^n, where 0 < α_1 < 1.

After applying a block Gauss-Seidel preconditioner in (inverse) multiplicative form, the resulting preconditioned matrix is not only symmetric but also positive definite. The latter is readily seen by first rewriting it in a congruent form whose Schur complement is seen to be C + BL^{-1}B*, which by assumption is positive definite.
The advantages of the above method are that, contrary to the standard Gauss-Seidel preconditioner, one can use a classical preconditioned conjugate gradient method for symmetric positive definite problems, and that, contrary to the Schur complement method, no inner iterations are required (they have been replaced by one action of L_0^{-1} and one of L per iteration step). The disadvantages are that the rate of convergence depends on the conditioning of both L_0^{-1/2} L L_0^{-1/2} - I and C + BL^{-1}B*. It therefore requires a particularly accurate preconditioner L_0, and is less robust when B is nearly rank-deficient. Ill-conditioning of the second matrix can to some extent be handled by a proper choice of the regularization matrix C, see [3] for further details.

Superlinear convergence of the Krylov-Uzawa iteration for the Stokes problem
As mentioned earlier, Krylov subspace methods typically lead to a steadily reduced spectral condition number, which, under proper circumstances, may result in a superlinear increase of the rate of convergence. In this section we show that the Krylov-Uzawa iteration (3.4) may indeed exhibit superlinear convergence on the operator level.
To our knowledge, no superlinear result has been proved before in the context of the Stokes problem. The result relies on a technical background that exploits a smoothness assumption on the boundary. Namely, following Theorem 2.1, we assume that Ω is connected and its boundary is of class C^3. The mentioned background shows that one cannot in fact expect superlinear convergence for non-smooth domains, see Remark 5.1 below.
The starting point for the required estimates is Theorem 3.1, which we have proved in Subsect. 3.2.2. In order to apply this result, we first have to derive a quantitative measure of the compactness of the operator K that appears in Theorem 2.1. In fact, the sums of its eigenvalues behave like the sums of sequences O(j^{-2/d}) (where d is the dimension).

Theorem 5.1
If ∂Ω ∈ C^3, then the ordered eigenvalues μ_j(K) of the compact operator K in Theorem 2.1 satisfy

|μ_j(K)| = O(j^{-2/d})

whenever μ_j(K) ≠ 0.

Proof Here K_{H^1} is no more necessarily self-adjoint in the H^1-inner product, hence it is better characterized by its singular values

s_i(K_{H^1}) = min_{H_{i-1}} max_{v ⊥ H_{i-1}, v ≠ 0} ‖K_{H^1} v‖ / ‖v‖,

where H_{i-1} stands for an arbitrary (i-1)-dimensional subspace. The latter minimax values are the approximation numbers of the Sobolev embedding of H^1(Ω) into L^2(Ω), which are inversely related to the Neumann Laplacian eigenvalues on Ω, known to be O(j^{2/d}) (where d is the dimension), due to the classical result of Weyl [11]. This yields the corresponding decay of the singular values. Further, [22, Cor. 2.4] implies that the eigenvalues of K inherit this decay; hence, with the above, |μ_j(K)| = O(j^{-2/d}) for all j ∈ N^+, and

Σ_{j=1}^k |μ_j(K)| ≤ C Σ_{j=1}^k j^{-2/d} = O(k^{1-2/d}),

where the latter follows with elementary analysis by comparing with ∫_1^k x^{-2/d} dx, see also [7, Prop. 4.4]. Thus we have obtained the desired estimate.

Remark 5.1 (i) For instance, on a ball in 3D one has μ_j(K) = -1/(2(2j+1)), and on a disk in 2D the operator K is simply the zero operator [18,32]. (ii) The smoothness of the boundary is important in Theorem 2.1 and hence also in Theorem 5.1. For instance, a polygonal corner even contributes an interval to the essential spectrum of the Cosserat operator [17], which implies that one cannot obtain superlinear convergence estimates in that case.
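The elementary comparison used at the end of the proof can be checked numerically; below d = 3, so the exponent is 2/d = 2/3 and the integral bound is 1 + 3(k^{1/3} - 1).

```python
# Partial sums of j^(-2/d) for d = 3, compared with the integral bound
# 1 + \int_1^k x^(-2/3) dx = 1 + 3 (k^(1/3) - 1); the arithmetic means
# then decay like k^(-2/3), which drives the superlinear estimate.

sums, bounds, means = {}, {}, {}
for k in (10, 100, 1000):
    sums[k] = sum(j ** (-2.0 / 3.0) for j in range(1, k + 1))
    bounds[k] = 1.0 + 3.0 * (k ** (1.0 / 3.0) - 1.0)
    means[k] = sums[k] / k
```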
Based on the previous results, we can now derive our main theorem on superlinear convergence for the Stokes problem.