Superior properties of the PRESB preconditioner for operators on two-by-two block form with square blocks

Matrices or operators in two-by-two block form with square blocks arise in numerous important applications, such as in optimal control problems for PDEs. The problems are normally of very large scale so iterative solution methods must be used. Thereby the choice of an efficient and robust preconditioner is of crucial importance. Since some time a very efficient preconditioner, the preconditioned square block, PRESB method has been used by the authors and coauthors in various applications, in particular for optimal control problems for PDEs. It has been shown to have excellent properties, such as a very fast and robust rate of convergence that outperforms other methods. In this paper the fundamental and most important properties of the method are stressed and presented with new and extended proofs. Under certain conditions, the condition number of the preconditioned matrix is bounded by 2 or even smaller. Furthermore, under certain assumptions the rate of convergence is superlinear.


Introduction
Iterative solution methods are widely used for the solution of linear and linearized systems of equations. For early references, see [1][2][3]. A key aspect is then to use a proper preconditioning, that is a matrix that approximates the given matrix accurately but is still much cheaper to solve systems with and which results in tight eigenvalue bounds of the preconditioned matrix, see e.g. [4][5][6]. This should hold irrespective of the dimension of the system and thus allow a fast large scale modelling. Thereby preconditioners that exploit matrix structures can have considerate advantage.
Differential operators or matrices on coupled two-by-two block form with square blocks, or which have been reduced to such a form from a more general block form, arise in various applications. The simplest example is a complex valued system, where A, B, x, y, f and g are real valued, which in order to avoid complex arithmetics, is rewritten in the real valued form, that is, where no complex arithmetics is needed for its solution. For examples of use of iterative solution methods in this context, see e.g. [7][8][9][10].
As we shall see, much more important examples arise for instance when solving optimal control problems for partial differential equations. After discretization of the operators, matrices of normally very large scale arise which implies that iterative solution methods must be used with a proper preconditioner.
The methods used are frequently of a coupled, inner-outer iteration type which, since the inner systems are normally solved with variable accuracy, implies that a variable iteration outer acceleration method such as in [11], or the flexible GMRES method [12] must be used. However, as we shall see, for many applications sharp eigenvalue bounds for the preconditioned operator can be derived, which are only influenced to a minor extent by the inner solver so one can then even use a Chebyshev iterative acceleration method. This implies that there are no global inner products to be computed which can save much computer time since computations of such inner products are mostly costly in data communication and other overhead, in particular when the method is implemented on parallel computers.
During the years numerous preconditioners of various types have been constructed. For instance, in a Google Scholar search of a class of matrices based on Hermitian or Skew Hermitian splittings, one encounters over 10,000 published items. Some of them have been tested, analysed and compared in [13]. It was found that the square block matrix, PRESB preconditioning method has superior properties compared to them and also to most other methods. It is most robust, it leads to a small condition number of the preconditioned matrix which holds uniformly with respect to both problem and method parameters, and sharp eigenvalue bounds can be derived. The methods can be seen as a further development of an early method used in [14], and also of the method in [15]. The method has been applied earlier for the solution of more involved problems, see e.g. [16][17][18]. We consider here only methods which can be reduced to a form with square blocks. Some illustrative examples of optimal control of parabolic problems with time-harmonic control can be found in [19][20][21][22].
In this paper we present the major properties of the PRESB preconditioner on operator level, with short derivations. This includes presentation of a typical class of optimal control problems in Sect. 3 with an efficient implementation of the method, derivations of spectral properties with sharp eigenvalue bounds in Sect. 4 an inner product free implementation of the method in Sect. 5 and conditions for a superlinear rate of convergence properties in Sect. 6.
To shorten the presentation, we use the shorthands r.h.s and w.r.t. for "right hand side" and "with respect to", respectively. The shorthands for symmetric and positive definite and symmetric and positive semidefinite are denoted spd and spsd, respectively. The nullspace of an operator A is denoted N (A).

A basic class of optimal control problems
For various iterative solution methods used for optimal control problems, see [23][24][25][26][27][28][29][30][31][32][33][34][35]. For a comparison of PRESB with some of the methods referred to above, see [13]. Some methods are based on the saddle point structure of the arising system and use the MINRES method [28,36] as acceleration method, see e.g. [37][38][39][40]. Other methods use the GMRES method as acceleration method [6,12]. In this paper we present methods based on the PRESB preconditioner. This method has been used for optimal control problems, see e.g. [13,19,21]. For other preconditioning methods used for optimal control problems, see [41][42][43][44][45]. For comparisons with some of the other methods referred to above, see [7,13,46]. A particularly important class of problems concern inverse problems, where an optimal control framework can be used. Examples include parameter estimation [47] and finding inaccessible boundary conditions [48], where a PRESB type preconditioner has been used.
As an illustration, we consider a time-independent control problem, first using H 1 -regularization and then the L 2 -regularization, with control function u and target solution y as described in [49], see also [46,50] for more details.
For the H 1 -regularization, let ⊂ R d be a bounded connected domain, such that an observation region 1 and a control region 2 are given subsets of . It is assumed that 1 ∩ 2 is nonempty. The problem is to minimize subject to a PDE constraint Ly = f with given boundary conditions, where where c is differentable and d − 1 2 ∇ · c ≥ 0. Here the fixed boundary term g admits a Dirichlet liftg ∈ H 1 ( ), and β > 0 is a proper regularization constant. For notational simplicity we assume now that c = 0 and d = 0. Then the corresponding Lagrange functional takes the form where y ∈g + H 1 0 ( ), u ∈ H 1 ( 2 ) and λ is the Lagrange multiplier, whose inf-sup solution equals the solution of (2.1), (2.2). (In the following we delete the integral incremental factor d .) The stationary solution of the minimization problem, i.e. where ∇L(y, u, λ) = 0, fulfils the following system of PDEs in weak form for the state and control variables and for the Lagrange multiplier: Using the splitting y = y 0 +g where y 0 ∈ H 1 0 ( ) the system can be homogenized. In what follows, we may therefore assume that g = 0, and hence y ∈ H 1 0 ( ). We consider a finite element discretization of problem (2.3) in a standard way. Let us introduce suitable finite element subspaces and replace the solution and test functions in (2.3) with functions in the above subspaces. We fix given bases in the subspaces, and denote by y, u and λ the corresponding coefficient vectors of the finite element solutions. This leads to a system of equations in the following form: where M 1 and M 2 are the mass matrices used to approximate y and u, i.e. corresponding to the subdomains 1 and 2 . In the same way, K and K 2 are the stiffness matrices corresponding to and 2 , respectively, and the rectangular mass matrix M corresponds to function pairs from × 2 . Here λ and y have the same dimension, as they both represent functions on , whereas u only corresponds to nodepoints in 2 . We also note that the last r.h.s is 0 due to g = 0. In the general case where g = 0 we would have some g = 0 in the last r.h.s, i.e. non-homogenity would only affect the r.h.s. and our results would remain valid. Problem (2.3), as well as system (2.4) has a unique solution. Properly rearranging the equations, we obtain the matrix form We note that M 2 + K 2 is symmetric and positive definite so we can eliminate the control variable u in (2.5): Hence we are lead to a reduced system in a two-by-two block form: Here one introduces the scaled vectorλ := 1 √ β λ and multiplies the second equation in (2.6) with − 1 √ β . Using the notation For this method we assume that K is spd. Similarly, after reordering and change of sign we obtain that is, In this method K can be nonsymmetric in which case the matrix block in position (1, 2) is replaced by K .
For the L 2 -regularization method, where the term 1 2 β u 2 H 1 ( ) is replaced by where M 0 = MM −1 2 M T . Our aim is to construct an efficient preconditioned iterative solution method for this linear system and to derive its spectral properties and mesh independent superlinear convergence rate.

Construction and implementational details of the PRESB preconditioner
Consider an operator or matrix in a general block form, where A and the symmetric parts of B and C are spsd and the nullspaces N (A) and N (B) and N (A) and N (C) are disjoint. Hence A + B and A + C are nonsingular.
If B = C, a common solution method (see e.g. [40]) is based on the block diagonal matrix, A spectral analysis shows that the eigenvalues of . This preconditioning method can be accelerated by the familiar MINRES method [36]. Due to the symmetry of the spectrum, its convergence can be based on the square of the optimal polynomial for the interval [ 1 which has spectral condition number √ 2 and corresponds to a convergence factor (2 1/4 − 1) (2 1/4 + 1) 1 12 . But note that the indefiniteness of the spectrum requires a double computational effort compared to the single interval.
To avoid the indefinite spectrum and enable use of the GMRES method as acceleration method we now consider the following, PRESB preconditioner Its spectral properties will be shown in the next section.
In particular, when B = C, the matrix P A simply becomes In the case of the system matrix (2.7) of the control problem, the PRESB preconditioner has the form P (1) We show now that there exists an efficient implementation of the preconditioner (3.2). It can be factorized as Hence its inverse equals Therefore, besides some vector operations and a operator or matrix vector multiplication with B, an action of the inverse involves a solution with operator or matrix A + B and one with A + C. In some applications A is symmetric and positive definite and the symmetric parts of B, C are also positive definite, which can enable particularly efficient solutions of these inner systems. The above forms have appeared earlier in [13].

Remark 3.1 A system with P A ,
can alternatively be solved via its Schur complement system as Clearly one can also use S as a preconditioner to the exact Schur complement S = A + B A −1 C for A, which gives the same spectral bounds as the PRESB method. For further information about use of approximations of Schur complements, see [5,23]. However, this method requires the stronger property that A is nonsingular, and besides solutions with A + B and A + C, it involves also a solution with A to obtain the corresponding iterative residual. In addition, when the solution vector x has been found, it needs one more solution with matrix A to find vector y. Furthermore, in many important applications A is singular. Therefore the method based on Schur complements is less competitive with a direct application of (3.5).

Spectral analysis based on a general form of the preconditioning matrix
Consider matrix A, of order 2n × 2n and its preconditioner P A in (3.1) and (3.2). Here we change the sign of the second row. To find the spectral properties of P −1 A A, consider the generalized eigenvalue problem It follows that λ = 1 for eigenvectors (x, y) such that {x ∈ N (B + C), y ∈ C n arbitrary}. Hence, the dimension of the eigenvector space corresponding to the unit eigenvalue λ = 1 is n + n 0 , where n 0 is the dimension of the nontrivial nullspace of B + C.
An addition of the equations in (4.1) shows that and hence, from the first equation in (4.1), it follows which can be rewritten as where μ is an eigenvalue of the generalized eigenvalue problem Hence, We extend now this proposition to the case of complex eigenvalues μ but still under the condition that B = C.

Proposition 4.2 Let A be spsd, B = C and let the eigenvalues of μ(
that is, the eigenvalues are contained in a circle around unity with radius < 1. Proof It follows from (4.5) that For small values of the imaginary part η, the above bound becomes close to the bounds found in Proposition 4.1.

Spectrum for complex conjugate matrices where C = B *
Consider now the matrix in (3.1) where C = B * , i.e. it can be complex-valued. This statement has already been shown in [19] but with a slightly different proof.

Proposition 4.3 Let A be spd, B + B * positive semidefinite and assume that B is related to
Proof It follows from (4.5) that where x * denotes the complex conjugate vector.
The above shows that the relative size, Re(μ)/|μ| of the real part of the spectrum of B = A −1/2 B A −1/2 determines the lower eigenvalue bound of P −1 A A and, hence, the rate of convergence of the preconditioned iterative solution method. For a small such relative part the convergence of the iterative solution method will be exceptionally rapid. As we will show later, such small parts can occur for time-harmonic problems with a large value of the angular frequency.
We present now a proof of rate of convergence under the weaker assumption that A is spsd.

Proposition 4.4 Let A and B
Proof The generalized eigenvalue problem takes here the form Hence and it follows from (4.4) that Clearly, any vector x ∈ N (B+B * ) corresponds to an eigenvalue λ = 1. It follows from Hence, λ = 1 in this case also. To estimate the eigenvalues λ = 1, we can consider subspaces orthogonal to the space for which λ = 1. We denote the corresponding inverse of A as a generalized inverse, A † . It holds then

Spectral properties of the preconditioned matrix, P (1) h for the basic optimal control problem
We recall that the preconditioner P (1) h is applicable only if K is spd.
To find the spectral properties of the preconditioned matrix P can use an intermediate matrix, and first find the spectral values for B −1 P (1) h and then for this gives the wanted properties. Let then μ denote an eigenvalue of the generalized eigenvalue problem, Here We note that if ξ = 0, then η = 0, since K is spd. Since ξ + η ∈ N ( M 1 − M 0 ) ⊥ , it follows then that both ξ = 0 and η = 0 and Hence μ is contained in an interval bounded independently of the parameters h and β.
Consider now the eigenvalue problem The second row yields again M 1 ξ = Kη. Substituting this in the first equation, leads to Taking the inner product with η, and using (Kξ then we readily obtain: where θ min and θ max are defined in (4.7).
In order to study the uniform behaviour of θ min and θ max as β → 0, note that the definition of M 1 and M 0 implies More precisely, we can make the estimate as follows. We have On the other hand, the previously seen equality M 1 ξ = Kη implies that Kη has zero coordinates where M 1 ξ has, i.e. in the nodes outside 1 is bounded below uniformly in β. Hence, altogether, θ min , θ max and ultimately the spectrum of P −1 h A h are bounded uniformly w.r.t β ≤ c h 4 .

Spectral analyses for the preconditioner P (2)
h The analyses of the preconditioning matrix C = P (2) h in (2.9) of A = A (2) h will take place in two steps. We introduce then an intermediate matrix B for which the preconditioning of C follows from Sect. 4.1. We assume here that the observation domain is a subset of the control domain.
Hence P (2) h = BB −1 C will be considered as the preconditioner to A and using the already described eigenvalue bounds for B −1 C, we only have to derive eigenvalue bounds for B −1 A. Let then where M is a weighted average, Note that since 0 ⊂ 1 , E is symmetric and positive semidefinite. Hence from Here We note that the upper bound in (4.9) is taken for ξ = 0. Then it follows from (4.8) that K T η = 0. Hence Hence the spectral condition number of B −1 A is bounded by As we have seen, it holds that the condition number of Since γ 0 and γ 1 are not known in general a proper value of the parameter α can be α = 1/2. Then However, if γ 0 is small, but γ 1 sufficiently larger than unity, then it is better to let α = 1 − ε, where ε is small. Then On the other hand, if γ 0 is large, that is if the observation domain 0 nearly equals the control domain, we note that γ 0 → ∞ and ε). In fact, if M 0 = M 1 , then E = 0, and we can let α = 0 i.e. M = M 0 = M 1 . In all cases, the considered bounds hold uniformly with respect to regularization parameter β and in principle also w.r.t. the mesh parameter h.

Remark 4.1
Other well-known preconditioning strategies for general two-by-two block matrices, such as block-triangular preconditioners, are also applicable, cf., e.g. [24,55,56]. We do not discuss them here any further. Although robust with respect to the involved parameters, in [7,13,46,50] some of them have been shown to be computationally less efficient than PRESB on a benchmark suite of problems. The PRESB preconditioning method is not only fastest in general, but also more robust. Its convergence factor is bounded by nearly 1/6 which shows that after just 8 iterations, the norm of the residual has decreased by a factor of about 0.5 · 10 −6 . Moreover, it is even somewhat faster due to the superlinear convergence to be discussed in Sect. 6.

Inner-outer iterations
The use of inner iterations to some limited accuracy perturbs the eigenvalue bounds for the outer iteration method. As pointed out in [51], see also [5], one must then in general stabilize the Krylov iteration method. However, it has been found that for the applications we are concerned with the perturbations are quite small and, even if they can give rise to complex eigenvalues, one can ignore them as the outer iterations are hardly influenced by them.

Inner product free methods
Krylov subspace type acceleration methods require computations of global inner products, which can be costly, in particular in parallel computer environments, where the inner products need global communication of data and start up times. It can therefore be of interest to consider iterative solution methods where there is no need to compute such global inner products. Such methods have been considered in [52] but here we present a shorter proof and some new contributions.
As we have seen, the PRESB method results mostly in sharp eigenvalue bounds. This implies that it can be very efficient to use a Chebyshev polynomial based acceleration method instead of a Krylov based method, since in this method there arise no global inner products. As shown e.g. in [52,57], the method takes the form presented in the next section. Numerical tests in [52,58] show that it can outperform other methods even on sequential processors.

A modified Chebyshev iteration method
Given eigenvalue bounds [a, b], the Chebyshev iteration method, see e.g. [1][2][3][4][5] can be defined by the recursion For problems with outlier eigenvalues one can first eliminate, i.e. 'kill' them, here illustrated for the maximal eigenvalue, by use of a corrected right hand side vector, The so reduced right hand side vector equals and one solves by use of the Chebyshev method for the remaining eigenvalue bounds. Then one can compute the full solution, However, due to rounding and small errors in the approximate eigenvalues used, the Chebyshev method makes the dominating eigenvalue component 'awake' again, so only very few steps should be taken. This can be compensated for by repetition of the iteration method, but then for the new residual. The resulting Algorithm is: Algorithm Reduced condition number Chebyshev method: For a current approximate solution vector x, until convergence, do: Solve B −1 Ax = q, by the Chebyshev method with reduced condition number. 5. Compute x =x + 1 λ max q 6. Repeat In some problems a large number of outlier eigenvalues larger than unity appear. Normally they are well separated. One can then add the to the unit value closer ones to the interval [1/2, 1], to form a new interval [1/2, λ 0 ], where λ 0 > 1 but not very large and let the remaining eigenvalues, say [λ 1 , λ max ] form a separate interval. After scaling the intervals one get then two intervals, for which a polynomial preconditioner with the polynomial λ(2 − λ) can be used. It is also possible to use a combination of the Chebyshev and Krylov method, that is start with a Chebyshev iteration step and continue with a Krylov iteration method. This has the advantage that the eigenvalues can be better clustered after the first Chebyshev iteration step, so the Krylov iteration method will converge superlinearly fast from the start.
If the eigenvalues of the preconditioned matrix are contained in the interval [ 1 2 , 1], we use then a corresponding polynomial preconditioner, Let μ be the eigenvalues of P(B −1 A). Then μ(λ) = λ(3−2λ) so min μ(λ) = μ( 1 2 ) = μ(1) = 1 and max λ μ(λ) = 9 8 , which is taken for λ = 3/4. Hence the convergence rate factor for a corresponding Krylov subspace iteration method (see e.g. [3]) becomes bounded above by which leads to a very fast convergence and which is further improved by the effect of clustering of the eigenvalues.

Superlinear rate of convergence for the preconditioned control problem
As we have seen, the condition number can be small but not in all applications. Even if it is small it can be of interest to examine the appearance of a superlinear rate of convergence. Under certain conditions one observes a superlinear rate of convergence of the preconditioned GMRES method. Below we first recall well-known general conditions for the occurrence of this, and then derive this property in applications for control problems. For some early references on superlinear rate of convergence, see [59][60][61]69] and the authors' papers [66,70].

Preliminaries: superlinear convergence estimates of the GMRES method
Consider a general linear system with a given nonsingular matrix A ∈ R n×n . A Krylov type iterative method typically shows a first phase of linear convergence and then gradually exhibits a second phase of superlinear convergence [5]. When the singular values properly cluster around 1, the superlinear behaviour can be characteristic for nearly the whole iteration. We recall some known estimates of superlinear convergence, also valid for an invertible operator A in a Hilbert space. When A is symmetric positive-definite, a well-known superlinear estimate of the standard conjugate gradient, CG method is as follows, see e.g. [5]. Let us assume that the decomposition holds, where I is the identity matrix. Let λ j (E) denote the jth eigenvalue of E in decreasing order. Then In our case the matrix is nonsymmetric, for which also several Krylov algorithms exist. In particular, the GMRES and its variants are most widely used. Similar efficient superlinear convergence estimates exist for the GMRES in case of the decomposition (6.2). The sharpest estimate has been proved in [59] on the Hilbert space level for an invertible operator A ∈ B(H ), using products of singular values and the residual error vectors r k := Au k − b: Here the singular values of a general bounded operator are defined as the distances from the best approximations with rank less than j. Hence s j (A −1 ) ≤ A −1 for all j and the right hand side (r.h.s.) above is bounded by k j=1 s j (E) A −1 k . The inequality between the geometric and arithmetic means then implies the following estimate, which is analogous to the symmetric case (6.3): 1, 2, ...), (6.5) whose r.h.s. is a sequence decresing towards zero. We note that the above Hilbert space setting is particularly useful for the study of convergence under operator preconditioning, when the preconditioner arises from the discretization of a proper auxiliary operator. Such results have been derived by the authors in various settings, based on coercive and inf-sup-stable problems, with applications to various test problems such as convection-diffusion equations, transport problems, Hemholtz equations and diagonally preconditioned optimization problems, see, e.g. [64][65][66]. This approach will be used in the present chapter as well.

Operators of the control problem in weak form
Let us consider the control problem (2.3). We introduce the inner products with β > 0 defined in (2.3). Define the bounded linear operators Q 1 : and also, similarly Then system (2.3) can be rewritten as follows: that is, where we stress that these quations correspond to the weak form and are obtained by Riesz representation. This can be written in an operator matrix form

Well-posedness and PRESB preconditioning in a Hilbert space setting
The uniqueness of the solution of system (6.7) can be seen as follows: if b = 0, then setting the third and first equations into the second one, respectively, we obtain u + Q * 2 Q 1 Q 2 u = 0, whence, multiplying by u, we have Since Q 1 is a positive operator, we obtain u 2 ≤ 0, that is, u = 0, which readily implies y = 0 and λ = 0. Now, since the 3 by 3 operator matrix in (6.8) is a compact perturbation of the identity, uniqueness implies well-posedness (i.e. if 0 is not an eigenvalue then it is a regular value, as stated by Fredholm theory, see, e.g. [62]). Hence for any b ∈ H 1 0 ( ) there exists a unique solution (y, u, λ) of system (6.7), moreover, this solution depends continuously on b.
System (6.7) can be reduced to a system in a two-by-two block form by eliminating u using the second equation u = −Q * 2 λ, in analogy with (2.6): Now let us introduce the product Hilbert space As seen above, for any b ∈ H, after eliminating u, system (6.9) has a unique solution (y, λ), which depends continuously on b. This means well-posedness, in other words, L is invertible, hence the inf-sup condition holds: According to (3.4), we define the PRESB preconditioning operator as Further, letting (that is, the remainder term), we have the decomposition Now one can see similarly to the case of L that P is also invertible: first, uniqueness of solutions for systems with P follows just as in the algebraic case described in Sect. 3, using that Q 1 and Q 2 Q * 2 are positive operators, and then the well-posedness follows again from Fredholm theory. Consequently, we can write (6.17) in the preconditioned form (6.18)

The finite element discretization
Recall the system matrix (2.7) and the preconditioner (3.4), where, for simplicity, we will omit the upper index "(1)" in what follows: These matrices are the discrete counterparts of the operators L and P in (6.11) and (6.15). Recall the definitions M 1 : Further, let us define the matrices Here the "energy matrix" S h corresponds to the energy inner product (6.10), and Q h is the discrete counterpart of the operator Q. Then the decomposition can be written in the preconditioned form where I h denotes the identity matrix (of size corresponding to the DOFs of the FE system).
Using the definition of the stiffness matrix, a useful relation holds between S h and the underlying inner product , . H in the product FEM subspace Namely, if x, w ∈ V h are given functions and c, d are their coefficient vectors, then where · denotes the ordinary inner product on R n . In the sequel we will be interested in estimates that are independent of the used family of subspaces. Accordingly, we will always assume the following standard approximation property: for a family of subspaces (V h ) ⊂ H, for any u ∈ H, dist(u, V n ) := min{ u − v n H : v n ∈ V n } → 0 (as n → ∞). (6.24)

Superlinear convergence for the control problem
Our goal is to study the preconditioned GMRES first on the operator level and then for the FE system.

Convergence estimates in the Sobolev space
Our goal is to prove superlinear convergence for the preconditioned form of (6.13): First, the desired estimates will involve compact operators, hence we recall the following notions in an arbitrary real Hilbert space H : 1, 2, . . .) the ordered eigenvalues of a compact self-adjoint linear operator F in H if each of them is repeated as many times as its multiplicity and |λ 1 (F)| ≥ |λ 2 (F)| ≥ · · · (ii) The singular values of a compact operator C in H are where λ j (C * C) are the ordered eigenvalues of C * C.

Proposition 6.1
The operators Q 1 and Q 2 in (6.6) are compact.

Proof
The L 2 inner product in a Sobolev space generates a compact operator, see, e.g. [63]. The operators Q 1 and Q 2 correspond to L 2 inner products on 1 and 2 , hence they arise as the composition of a compact operator with a restriction operator from to 1 or 2 in L 2 ( ). Altogether, Q 1 and Q 2 are compositions of a compact operator with a bounded operator, hence they are also compact themselves.
Proof Using the invertibility of P and L, the compactness of P −1 Q and the decomposition (6.18), we may apply estimate (6.5) with operators A := P −1 L and E := P −1 Q. The fact that s j (P −1 Q) → 0 implies that ε k → 0.
Later on, we will be interested in estimates in families of subspaces. In this context the following statements involving compact operators will be useful, related to inf-sup conditions and singular values: Proposition 6.3 [64,66] Let L ∈ B(H) be an invertible operator in a Hilbert space H, that is, (6.28) and let the decomposition L = I + E hold for some compact operator E. Let (V n ) n∈N + be a sequence of closed subspaces of H such that the approximation property (6.24) holds. Then the sequence of real numbers satisfies lim inf m n ≥ m.

Convergence estimates and mesh independence for the discretized problems
Our goal is to prove mesh independent superlinear convergence when applying the GMRES algorithm for the preconditioned system (6.29) Here the system matrix is A = P −1 h A h , and we use the inner product c, d S h := S h c · d corresponding to the underlying Sobolev inner product via (6.23). Owing to (6.22), the preconditioned matrix is of the type (6.2), hence estimate (6.5) holds in the following form: 1, 2, . . . , n). (6.30) In order to obtain a mesh independent rate of convergence from this, we have to give a bound on (6.30) that is uniform, i.e. independent of the subspaces Y h and h . This will be achieved via some propositions on uniform bounds. An important role is played by the matrix In accordance with Proposition 6.3, we consider fine enough meshes such that the following inf-sup property can be imposed: there existsm > 0 independent of h such that Proof Both matrices are self-adjoint w.r.t. the K-inner product since M 1 and M 0 are symmetric. Hence, first, where C is the Poincaré-Friedrichs embedding constant and y stands for the function in the subspace Y h whose coefficient vector is y. Further, Here, for a fixed vector λ, denote v := (M 2 + K 2 ) −1 M T 0 λ. Then for the functions v and λ in the subspaces U h and h , whose coefficient vectors are v and λ, respectively. Hence, from the Cauchy-Schwarz inequality, Now, the definition of v, (6.33) and (6.34) yield  Hence for any vector y = 0, denoting z := N −1 y, we have |Nz| 2 K = |y| 2 K := Ky · y ≤ (K + M 1 )y · y = K −1 (K + M 1 )y, y K = N −1 y, y K = z, Nz K ≤ |z| K |Nz| K , hence |Nz| K ≤ |z| K , i.e. N K ≤ 1, which is independent of h.
Secondly, since M 0 is also positive semidefinite, the same proof applies to (I + K −1 M 0 ) −1 as well.
Finally, the independence property for K −1 M 0 has already been proved in Proposition 6.5. Altogether, our proposition is thus also proved. Now we can derive our final result: Theorem 6.2 Let our family of FEM subspaces satisfy properties (6.24) and (6.32). Then the GMRES iteration for the n×n preconditioned system (6.29), using PRESB preconditioning (6.19), provides the mesh independent superlinear convergence estimate and (ε k ) k∈N + is a sequence independent of h.
Proof Owing to Corollary 6.2 and Proposition 6.6, there exist constants C 0 , C 1 > 0 such that independently of h. We can easily see that the matrices A −1 h S h are also uniformly bounded in S h -norm. Namely, inequality (6.32) yields inf c∈R n c =0 From the above, we obtain This has been proved in [66] for another compact operator and energy matrix, and the argument is analogous to our case: in fact, it directly follows from Proposition 6.4 (b) if P is the projection to our product FEM subspace V h . Then, combining this estimate with (6.38) and using Proposition 6.4 (a), we obtain Altogether, using (6.39) and (6.40), the desired statements (6.36)-(6.37) readily follow from (6.30).

Extended problems
The distributed control problem (2.1) and (2.2) has proper variants, see also [49]. The finite element solution of these problems leads to similar systems as in (2.5), such that the mass matrix block M 0 is replaced by some other blocks, corresponding again to proper discretized compact operators. Based on this, one can repeat the arguments of the previous subsections and similarly obtain mesh independent superlinear convergence of the preconditioned GMRES iteration under the PRESB preconditioner. These analogous derivations are not detailed here, we just mention the problems themselves based on [49] and indicate the full analogy of their structures.

Boundary control of PDEs
The boundary control problem involves the minimization of the same functional (2.1) subject to the PDE constraint where the control function u is applied on the boundary, but f is a fixed forcing term. The FE solution of this problem leads to a similar system as in (2.5), where the mass matrix M 0 is replaced by a matrix N connecting interior and boundary basis functions. The mass and stiffness matrices for u now act on the boundary: they are denoted by M u,b and K u,b . Altogether, the matrix analogue of (2.5) takes the form

Control under box constraints
In real problems one often has to take box constraints into account, in which the functions y and/or u are assumed to satisfy additional pointwise constraints. For the state variable y, this prescribes y a ≤ y ≤ y b for some given constants y a and y b , and similarly, for u we prescribe u a ≤ u ≤ u b . An efficient way to handle such problems includes penalty terms in the objective function and semi-smooth Newton iterations for their minimization, see [30,49]. See also [67,68]. To this paper further related references, see [66,[68][69][70][71][72][73][74][75][76]. The arising linear systems (after proper rearrangement) have a form similar to (2.5). For the state constrained case the matrix is where ε > 0 is a small penalty parameter and G A is a diagonal matrix with values 0 or 1 indicating whether y satisfies the box constraint in that coordinate. The reduced matrix and the PRESB preconditioner are derived again analogously to (6.19). The new factors G A at the mass matrix M y do not change the fact that the term G A M y G A corresponds to a discretized compact operator, hence the structure of this problem is again analogous to the previous ones.

Concluding remarks
It has been shown that the PRESB preconditioning method applied for two-by-two block matrix systems with square blocks can outperform other methods, such as the block diagonally preconditioned MINRES method. The PRESB method can be accelerated by the GMRES method, which results in a superlinear rate of convergence.
Since in some problems the eigenvalue bounds are known and often tight, one can as an alternative method use a Chebyshev acceleration which doesn't give a superlinear convergence but saves computational vector inner products and therefore saves wasted elapsed computer times for global communications between processors. supported by the Hungarian Ministry of Human Capacities, and further, it was supported by the Hungarian Scientific Research Fund OTKA SNN125119.
Funding Open access funding provided by Eötvös Loránd University.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.