Preconditioning techniques for the coupled Stokes–Darcy problem: spectral and field-of-values analysis

We study the performance of some preconditioning techniques for a class of block three-by-three linear systems of equations arising from finite element discretizations of the coupled Stokes-Darcy flow problem. In particular, we investigate block preconditioners, constraint preconditioners, and augmented Lagrangian-based preconditioners. Spectral and field-of-values analyses are established for the exact versions of these preconditioners. The results of numerical experiments are reported to illustrate the performance of inexact variants of the various preconditioners used with flexible GMRES in the solution of a 3D test problem with large jumps in the permeability.


Introduction
The coupled Stokes-Darcy model describes the interaction between free flow and porous media flow. It is a fundamental problem in several fields [14]. In one subregion of the flow domain a free-flowing fluid is governed by the (Navier-)Stokes equations; in another subregion, the fluid follows Darcy's Law. The equations are coupled by conditions across the interface between the two subregions. In this paper we will only consider the case of stationary problems and Stokes flow.
Let Ω be a computational domain partitioned into two non-overlapping subdomains Ω_1 and Ω_2, separated by an interface Γ_12. We assume that the flow in Ω_1 is governed by the stationary Stokes equations: −∇ · (2ν D(u_1) − p_1 I) = f_1 in Ω_1, ∇ · u_1 = 0 in Ω_1. Here ν > 0 represents the kinematic viscosity, u_1 and p_1 denote the velocity and pressure in Ω_1, f_1 is an external force acting on the fluid, I is the identity matrix, and D(u_1) = (∇u_1 + (∇u_1)^T)/2 is the rate of strain tensor. We also assume that the boundary Γ_2 = ∂Ω ∩ ∂Ω_2 of the porous medium is partitioned into disjoint Neumann and Dirichlet parts Γ_2N and Γ_2D, with Γ_2D having positive measure. The flow in Ω_2 is governed by Darcy's law: −∇ · (K ∇p_2) = f_2 in Ω_2. Here p_2 represents the Darcy pressure in Ω_2, and the symmetric positive definite (SPD) matrix K represents the hydraulic conductivity in the porous medium. The Darcy velocity can be obtained from the pressure via u_2 = −K ∇p_2. The coupling between the two flows comes from the following interface conditions on the internal boundary Γ_12. Let n_12 and t_12 denote the unit normal vector directed from Ω_1 to Ω_2 and a unit tangent vector to the interface. Then we impose u_1 · n_12 = −K ∇p_2 · n_12, (−2ν D(u_1) n_12 + p_1 n_12) · n_12 = p_2, u_1 · t_12 = −2νG (D(u_1) n_12) · t_12.
The first two conditions enforce mass conservation and the balance of normal forces across the interface; the third condition represents the Beavers-Joseph-Saffman (BJS) law, in which G is an experimentally determined constant. Let X and Q_1 denote the Stokes velocity and pressure spaces, and let Q_2 = {q_2 ∈ H^1(Ω_2) | q_2 = 0 on Γ_2D} be the Darcy pressure space. The weak formulation of the coupled Stokes-Darcy problem is: find u_1 ∈ X, p_1 ∈ Q_1 and p_2 ∈ Q_2 such that a(u_1, p_2; v_1, q_2) + b(v_1, p_1) = f(v_1, q_2) for all v_1 ∈ X and q_2 ∈ Q_2, and b(u_1, q_1) = 0 for all q_1 ∈ Q_1.
Here a(u_1, p_2; v_1, q_2) = a_1(u_1, v_1) + a_2(p_2, q_2) + a_12(u_1, v_1; p_2, q_2), where a_1 and a_2 are the bilinear forms associated with the Stokes and Darcy subproblems, respectively, and a_12 collects the coupling terms arising from the interface conditions. The well-posedness of the weak formulation is a consequence of Brezzi-Fortin theory (see, e.g., [11]). The weak form is discretized using conforming finite element spaces X^h ⊂ X, Q_1^h ⊂ Q_1 satisfying the inf-sup condition for the Stokes velocity and pressure, such as the MINI and Taylor-Hood elements. For the Darcy pressure, a space of continuous piecewise polynomials Q_2^h ⊂ Q_2 is used (linear in 2D, quadratic in 3D).
The discrete form of the weak formulation can be cast as a block linear system of the form

[ A_2       A_12   0    ] [p_2]   [f_2]
[ -A_12^T   A_1    B^T  ] [u_1] = [f_1]
[ 0         B      0    ] [p_1]   [ 0 ]

where A_2, A_1, A_12 are the matrices of the discrete bilinear forms and B is the discrete divergence. Under our assumptions, A_2 and A_1 are SPD and B has full row rank; we refer the reader to [12,13] for further details.
We now introduce a slight change of notation and rewrite the previous linear system of equations in the form

Au = b,  A = [ A_11   A_12   0
               A_21   A_22   B^T
               0      B      0  ],   (1)

where A_11 and A_22 are both SPD, A_21 := −A_12^T, and B has full row rank. We observe in passing that this system is an example of a double saddle point problem, and that similarly structured systems arise in a number of applications; see [5]. The iterative solution of the discrete coupled Stokes-Darcy equations has attracted considerable attention in recent years. Here we limit ourselves to discussing solution algorithms based on preconditioned Krylov subspace methods. In [13], the following two constraint-type preconditioners, built from the block diagonal and block lower triangular parts of A, were proposed for accelerating the convergence of Krylov subspace methods:

P_conD = [ A_11   0      0        P_conT = [ A_11   0      0
           0      A_22   B^T                 A_21   A_22   B^T
           0      B      0  ],               0      B      0  ].   (3)

Also, Cai et al. [12] proposed a block triangular preconditioner P_{T1,ρ}, in which the constraint block is replaced by a scaled pressure mass matrix, where M_p is the mass matrix associated with the Stokes pressure space, and ρ > 0 is a user-defined parameter. For 2D problems, the preconditioner P_conT outperforms the other preconditioners in terms of both iteration counts and CPU time when used with GMRES. Exact variants of the preconditioners result in h-independent rates of convergence, as predicted by the theory. For 3D problems, only the inexact variants of these preconditioners are feasible. Of the inexact variants, P_{T1,ρ} with suitable ρ requires the least CPU time, according to [13]. The constraint preconditioners and P_{T1,ρ} are norm equivalent to A in (1) under certain conditions; see [12,13]. On the other hand, the field-of-values (FOV) equivalence of the constraint preconditioners with A was proved in [13]. It is well known that if a preconditioner is norm equivalent to the coefficient matrix of a linear system of equations, the spectrum of the preconditioned matrix remains uniformly bounded and bounded away from zero as the mesh size h → 0; see [23] for more details.
Here, we directly determine some bounds for the eigenvalues of the preconditioned matrices associated with P con D , P con T and P T 1 ,ρ . In particular, we show that the eigenvalues of the preconditioned matrix corresponding to P con T are nicely clustered under certain conditions.
In the present work, we also consider the following block triangular preconditioner applied to the augmented linear system of equations Āū = b̄, where Ā coincides with A except that the block A_22 is replaced by A_22 + rB^TQ^{-1}B, with Q being an arbitrary SPD matrix and r > 0 a user-defined parameter. Evidently, the linear system of equations Āū = b̄ is equivalent to Au = b. This approach is motivated by the success of grad-div stabilization and augmented Lagrangian techniques for solving saddle point problems. The remainder of this paper is organized as follows. In Sect. 2 we derive lower and upper bounds for the eigenvalues of the preconditioned matrices corresponding to all of the above-mentioned preconditioners. In Sect. 3, we establish FOV-type bounds for the preconditioned system associated with the preconditioner of type (5). Numerical experiments are reported in Sect. 4 to compare the performance of the preconditioners, in particular in the presence of inexact solves. Brief concluding remarks are given in Sect. 5.
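The equivalence between the original and the augmented system can be checked on a small generic saddle point system. The sketch below (Python, with a random two-block system, Q = I and r = 10 as assumptions; the Stokes-Darcy matrix in (1) has a third block row, but the mechanism is identical) shifts the right-hand side consistently with the augmentation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3
# SPD (1,1) block and a full-row-rank constraint matrix B (random stand-ins).
A11 = rng.standard_normal((n, n)); A11 = A11 @ A11.T + n * np.eye(n)
B = rng.standard_normal((m, n))
A = np.block([[A11, B.T], [B, np.zeros((m, m))]])
u_true = rng.standard_normal(n + m)
b = A @ u_true

# Augmentation: add r * C^T Q^{-1} C with C = [B 0]; the right-hand side is
# shifted consistently (by r * C^T Q^{-1} g, where g is the constraint data),
# so the solution is unchanged.  In the Stokes-Darcy case g = 0 and b_bar = b.
r, Q = 10.0, np.eye(m)
C = np.hstack([B, np.zeros((m, m))])
g = b[n:]
A_bar = A + r * C.T @ np.linalg.solve(Q, C)
b_bar = b.copy()
b_bar[:n] += r * B.T @ np.linalg.solve(Q, g)

u_aug = np.linalg.solve(A_bar, b_bar)
```

Since C u_true equals the constraint block g of b, adding r C^T Q^{-1} C to A and r C^T Q^{-1} g to b leaves the solution unchanged.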

Notations.
We use "i" for the imaginary unit. The notation σ(A) is used for the spectrum of a square matrix A. When all eigenvalues of A are real and positive, we use λ_min(A) and λ_max(A) to denote, respectively, the minimum and maximum eigenvalues of A. The notation ρ(A) stands for the spectral radius of A. When A is symmetric positive (semi)definite, we write A ≻ 0 (A ⪰ 0). Furthermore, for two given matrices A and B, by A ≻ B (A ⪰ B) we mean A − B ≻ 0 (A − B ⪰ 0). For H ≻ 0, the corresponding vector norm is given by ‖x‖_H = (x^*Hx)^{1/2}, whose induced matrix norm is defined by ‖A‖_H = max_{x≠0} ‖Ax‖_H/‖x‖_H. For given vectors x, y and z of dimensions n, m and p, (x; y; z) will denote the column vector of dimension n + m + p obtained by stacking them.

Spectral analysis
This section is devoted to deriving lower and upper bounds for the eigenvalues of the preconditioned matrices P_conT^{-1}A, P_conD^{-1}A, P_{T1,ρ}^{-1}A and P_r^{-1}Ā. To this end, we first recall the following basic lemma, which is an immediate consequence of Weyl's theorem; see [20, Theorem 4.3.1].

Lemma 2.1 Let A and B be two Hermitian matrices. Then, for every eigenvalue index k, λ_k(A) + λ_min(B) ≤ λ_k(A + B) ≤ λ_k(A) + λ_max(B).
The following two results provide information on the eigenvalues of the constraint-preconditioned matrices.

Theorem 2.2 Suppose that P_conD is defined by (3). The eigenvalues of P_conD^{-1}A are either equal to unity, or complex numbers of the form λ = 1 ± i√(p/q), where p = y^*A_12^T A_11^{-1}A_12 y and q = y^*A_22 y for some unit vector y ∈ Ker(B).

Proof Let λ ∈ σ(P_conD^{-1}A) with corresponding eigenvector (x; y; z). Therefore, we have

A_11 x + A_12 y = λ A_11 x,   (7)
A_21 x + A_22 y + B^T z = λ(A_22 y + B^T z),   (8)
By = λ By.   (9)

Notice that λ = 1 is a possible eigenvalue of P_conD^{-1}A, with corresponding eigenvector (0; y; z) where y ∈ Ker(A_12) and z is an arbitrary vector such that y and z are not both zero.
It is also immediate to observe that λ ≠ 1 implies y ≠ 0. Indeed, if y = 0 and λ ≠ 1, then in view of (7) we have x = 0; in this case, Eq. (8) results in B^T z = 0, which is equivalent to z = 0 since B^T has full column rank. Hence, we would get (x; y; z) = (0; 0; 0), which is impossible since (x; y; z) is an eigenvector.
In the rest of the proof, we assume that λ ≠ 1, which implies By = 0 by (9). Now, we can compute the vector x from (7) as follows: x = (λ − 1)^{-1} A_11^{-1} A_12 y. Multiplying both sides of Eq. (8) by y^* and recalling that A_21 = −A_12^T, we get −(λ − 1)^{-1} y^*A_12^T A_11^{-1}A_12 y + y^*A_22 y = λ y^*A_22 y, which is equivalent to (λ − 1)^2 = −p/q, from which the assertion follows immediately.

Theorem 2.3 Let P_conT be defined by (3). Then the eigenvalues of P_conT^{-1}A are either equal to one, or of the form λ = 1 + p/q, where p = y^*A_12^T A_11^{-1}A_12 y and q = y^*A_22 y for some unit vector y ∈ Ker(B).
Proof Let λ be an arbitrary eigenvalue of P_conT^{-1}A with corresponding eigenvector (x; y; z). Therefore, we have

A_11 x + A_12 y = λ A_11 x,   (10)
A_21 x + A_22 y + B^T z = λ(A_21 x + A_22 y + B^T z),   (11)
By = λ By.   (12)

For y ∈ Ker(A_12), we observe that 1 ∈ σ(P_conT^{-1}A) with corresponding eigenvector (x; y; z), where x and z are arbitrary vectors with at least one of them nonzero when y = 0. In the rest of the proof, we assume that λ ≠ 1. From (10), we obtain x = (λ − 1)^{-1} A_11^{-1} A_12 y. Notice that, from the fact that B^T has full column rank and arguing as in the proof of Theorem 2.2, one can conclude that y ≠ 0. In addition, Eq. (12) shows that By = 0 when λ ≠ 1. Now, we first substitute x from the above expression into (11) and then multiply both sides by (λ − 1)y^*. Using A_21 = −A_12^T, we obtain the following quadratic equation:

(λ − 1)[ q(λ − 1) − p ] = 0.   (13)

The roots of (13) are given by λ_1 = 1 and λ_2 = 1 + p/q. This shows that any eigenvalue λ ∈ σ(P_conT^{-1}A) not equal to one can be written in the form λ = 1 + p/q, which completes the proof.

Remark 2.4 If A_22 ⪰ A_12^T A_11^{-1} A_12, then from the proof of Theorem 2.3 we can conclude that λ ∈ [1, 2] for every λ ∈ σ(P_conT^{-1}A). Similarly, from the proof of Theorem 2.2, for λ ∈ σ(P_conD^{-1}A) we can deduce that |Im(λ)| ≤ 1 when A_22 ⪰ A_12^T A_11^{-1} A_12. We have been able to verify numerically that for linear systems of the form (1) arising from the finite element discretization of coupled Stokes-Darcy flow, the condition A_22 ⪰ A_12^T A_11^{-1} A_12 in the above remark is indeed satisfied. In fact, we observed that the condition holds true for problems of small or moderate size, and numerical tests suggest that it may hold for larger problems as well.
We now turn to the spectral analysis of the block triangular preconditioner P_{T1,ρ}. Based on numerical experiments, Cai et al. [12] pointed out that the performance of P_{T1,ρ} is not very sensitive to the scaling factor ρ, particularly when it belongs to the interval [0.6, 1.05]; see [12, Table 2]. Moreover, it has been numerically observed that the spectrum of P_{T1,ρ}^{-1}A lies in a semi-annulus which does not include zero and is entirely contained in the right half-plane; see [12, Figure 2]. In what follows, we show that this experimentally observed eigenvalue distribution can be proven theoretically for certain values of ρ. To do so, we first recall a theorem established by Kakeya [21] in 1912.

Theorem 2.5 (Kakeya) Let p(z) = a_0 + a_1 z + · · · + a_n z^n with a_i > 0 for i = 0, . . . , n. Then all zeros of p lie in the annulus min_{0≤i≤n−1} a_i/a_{i+1} ≤ |z| ≤ max_{0≤i≤n−1} a_i/a_{i+1}.

Notice that if a monotonicity assumption holds for the coefficients of the polynomial p in the above theorem, i.e., 0 ≤ a_0 ≤ a_1 ≤ · · · ≤ a_n, then all zeros of p are at most unity in modulus. The latter result is the well-known Eneström-Kakeya Theorem [1].
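The Kakeya and Eneström-Kakeya bounds are easy to check numerically; the sketch below (with an arbitrary positive, nondecreasing coefficient sequence as an assumption) verifies that all zeros lie in the predicted annulus:

```python
import numpy as np

# Kakeya: for p(z) = a_0 + a_1 z + ... + a_n z^n with all a_i > 0, every zero
# satisfies  min_i a_i/a_{i+1} <= |z| <= max_i a_i/a_{i+1}.
# With nondecreasing coefficients (Enestrom-Kakeya) the upper bound is 1.
a = np.array([1.0, 2.0, 2.0, 5.0, 7.0])   # positive and nondecreasing
roots = np.roots(a[::-1])                 # np.roots expects highest degree first
ratios = a[:-1] / a[1:]
lo, hi = ratios.min(), ratios.max()       # here hi = 1 by monotonicity
moduli = np.abs(roots)
```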
Proof Let λ be an arbitrary eigenvalue of P_{T1,ρ}^{-1}A with corresponding eigenvector (x; y; z). As a result, we have Note that 1 ∈ σ(P_{T1,ρ}^{-1}A) with corresponding eigenvector (0; y; 0) for 0 ≠ y ∈ Ker(A_12). From now on, we assume that λ ≠ 1. From (15) and (17), we obtain Notice that y is nonzero, since otherwise x and z would both be zero, in contradiction with the fact that (x; y; z) is an eigenvector. In the sequel, without loss of generality, we assume that y^*y = 1. Noting that A_21 = −A_12^T, we substitute x and z into (16), which yields where p = y^*A_12^T A_11^{-1}A_12 y, q = y^*A_22 y and r = ρ^{-1} y^*B^T M_p^{-1}By.
Multiplying both sides of the preceding relation by λ(1 − λ), we derive For simplicity, we set t = λ − 1 and rewrite the previous relation as follows: It is easy to check that y ∈ Ker(B) ∩ Ker(A_12) would imply that y is the zero vector, which is impossible. As a result, p and r cannot both be zero. When p = 0, we readily obtain Notice that if r = 0, then either t = ±i√(p/q) or t + 1 = 0. Since λ ≠ 0, in this case we only conclude that λ = 1 ± i√(p/q) with ϕ_1 < √(p/q) < 1. Next, we assume r, p ≠ 0 (i.e., y ∉ Ker(B) ∪ Ker(A_12)) and rewrite (18) as follows: By the assumption q > r + p, hence Therefore, from Theorem 2.5, we conclude that ξ ≤ |t| = |λ − 1| ≤ 1, keeping in mind that p/q ≤ p/(q − r). Now, from Eq. (18), we observe that

Theorem 2.7 Suppose that P_r and Ā are respectively defined by (5) and (6). The eigenvalues of P_r^{-1}Ā are all real and positive. More precisely, we have

Proof For ease of notation, we set Ā_22 := A_22 + rB^TQ^{-1}B. Let λ be an arbitrary eigenvalue with corresponding eigenvector (x; y; z). Therefore, we have Notice that for y ∉ Ker(B), we have that λ = 1 is an eigenvalue of P_r^{-1}Ā with corresponding eigenvector (0; y; −rQ^{-1}By). Also, λ = 1 is obviously an eigenvalue of P_r^{-1}Ā associated with the eigenvector (0; y; 0) when 0 ≠ y ∈ Ker(B). In the rest of the proof, we assume that λ ≠ 1. From Eqs. (19) and (21), we respectively derive The preceding two relations for x and z make it clear that y cannot be the zero vector. Without loss of generality, we may assume that ‖y‖_2 = 1. If 0 ≠ y ∈ Ker(B), we deduce that z = 0. It follows that recalling that A_21 = −A_12^T. Now, we discuss the case y ∉ Ker(B). Substituting the vectors x and z into (20) and performing straightforward computations, we obtain Multiplying both sides of the above relation by −λ, we obtain the following quadratic equation: from which we conclude that all of the eigenvalues of P_r^{-1}Ā are real.
Moreover, if λ_1 and λ_2 are the roots of (22), then Hence, recalling the definition of Ā_22, we see that, as r → ∞, all eigenvalues satisfying (22) tend to 1. From (22), we have It is not difficult to verify that Consequently, in view of the facts that λ_min(B^T Q^{-1}B) = 0, η ≤ 1 and 0 < ζ ≤ η for y ∉ Ker(B), we conclude that
Evidently, since η ≤ 1, we have which completes the proof.

Remark 2.8
Notice that when A_22 ⪰ A_12^T A_11^{-1} A_12, then from the proof of Theorem 2.7 we obtain that σ(P_r^{-1}Ā) lies in the interval [1, 2] in the limit as r → ∞. Moreover, "most" of the eigenvalues of the preconditioned matrix are either equal to 1, or tend to 1 as r → ∞. The eigenvalue distribution described in Theorem 2.7 and Remark 2.8 is illustrated in Fig. 1. A full description of the test problem and of its finite element discretization can be found in Sect. 4.
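The clustering described in Remark 2.8 can also be observed on a toy problem. The sketch below uses a generic two-block saddle point system with Q = I (an assumption; the paper's P_r acts on the three-by-three system, so this only illustrates the mechanism): an augmented block triangular preconditioner with Schur complement approximation −Q/r yields eigenvalues that collapse onto 1 as r grows:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 12, 4
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)   # SPD block
B = rng.standard_normal((m, n))                                # full row rank
Q = np.eye(m)

def spread(r):
    """max |lambda - 1| over the spectrum of P_r^{-1} A_bar (two-block sketch)."""
    A11r = A + r * B.T @ np.linalg.solve(Q, B)                 # augmented block
    A_bar = np.block([[A11r, B.T], [B, np.zeros((m, m))]])
    P_r = np.block([[A11r, B.T], [np.zeros((m, n)), -Q / r]])  # Schur approx -Q/r
    lam = np.linalg.eigvals(np.linalg.solve(P_r, A_bar))
    return np.abs(lam - 1.0).max()

s_small, s_large = spread(1.0), spread(1000.0)
```

The non-unit eigenvalues are of the form rμ/(1 + rμ) with μ an eigenvalue of Q^{-1}BA^{-1}B^T, matching the monotone function g_r(x) = rx/(1 + rx) appearing later in Sect. 3, so the spread around 1 shrinks as r increases.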

Field-of-values analysis
In the previous section, we established eigenvalue bounds for the preconditioned matrices. A clustered spectrum (away from 0) often results in rapid convergence, particularly when the preconditioned matrix is close to normal; see [6] for more details. However, the situation is more complicated when the problem is far from normal. Indeed, the eigenvalues may not describe the convergence of nonsymmetric matrix iterations like GMRES; see [19]. The notions of norm equivalence and of field-of-values equivalence often provide the theoretical framework needed to establish optimality of a class of preconditioners for Krylov methods like GMRES [2,4,9,15,16,22,23]; see also [7] for a recent overview.
Here we derive FOV-type bounds for the preconditioned matrix associated with preconditioner P r . For the constraint preconditioners P con D and P con T , this equivalence has been established by Chidyagwai et al. [13].

Basic concepts
In this subsection we briefly review the required background for establishing FOV-type bounds; see [13,23] for more details.
We begin by reviewing the notion of spectral equivalence for families of SPD matrices [3]. Recall that two families of SPD matrices {A_n} and {B_n} (parametrized by their dimension n) are said to be spectrally equivalent if there exist n-independent constants 0 < α ≤ β such that α ≤ (x^T A_n x)/(x^T B_n x) ≤ β for all 0 ≠ x ∈ R^n, i.e., if the generalized Rayleigh quotients associated with A_n and B_n are uniformly bounded. Equivalently, {A_n} and {B_n} are spectrally equivalent if the spectral condition number κ(B_n^{-1} A_n) is uniformly bounded with respect to n. Note that this is an equivalence relation between families of matrices. Next, we recall the concepts of H-norm equivalence and H-field-of-values equivalence, where H is a given SPD matrix. For simplicity, in the following we drop the subscript n, but it should always be kept in mind that matrices representing discretizations depend on the dimension n (which in turn depends on the mesh size h). Similarly, with a slight abuse of language, we will speak of equivalence of matrices rather than of families of matrices.
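The sharpest spectral equivalence constants can be computed directly from a generalized eigenvalue problem; a minimal sketch for two random SPD matrices (stand-ins for {A_n} and {B_n} at a fixed n):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
n = 10
A = rng.standard_normal((n, n)); A = A @ A.T + np.eye(n)   # SPD, plays A_n
M = rng.standard_normal((n, n)); M = M @ M.T + np.eye(n)   # SPD, plays B_n

# The sharpest constants alpha, beta with
#     alpha * x^T M x <= x^T A x <= beta * x^T M x
# are the extreme generalized eigenvalues of the pencil (A, M);
# beta/alpha is the condition number of M^{-1} A.
mu = eigh(A, M, eigvals_only=True)
alpha, beta = mu.min(), mu.max()

x = rng.standard_normal(n)
rq = (x @ A @ x) / (x @ M @ x)   # any generalized Rayleigh quotient
```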

Definition 3.2
Let H be an n × n symmetric positive definite matrix and let A ∈ R^{n×n}. The H-field of values of A is the set W_H(A) = { ⟨Ax, x⟩_H / ⟨x, x⟩_H : 0 ≠ x ∈ C^n }, where ⟨x, y⟩_H := y^*Hx. Two nonsingular matrices M and N are said to be H-field-of-values (FOV) equivalent, written M ≈_H N, if there exist n-independent constants γ, Γ > 0 such that γ ≤ Re ⟨M N^{-1}x, x⟩_H / ⟨x, x⟩_H and ‖M N^{-1}x‖_H/‖x‖_H ≤ Γ for all x ≠ 0. This definition implies that if M ≈_H N, the H-field of values of M N^{-1} lies in the right half-plane and is both bounded and bounded away from 0 uniformly in n.
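The H-field of values can be explored numerically; the sketch below (random H and A are assumptions) samples W_H(A) and checks that its real part is bracketed by the extreme eigenvalues of the Hermitian part of H^{1/2}AH^{-1/2}, which has the same field of values:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(3)
n = 6
H = rng.standard_normal((n, n)); H = H @ H.T + n * np.eye(n)   # SPD inner product
A = rng.standard_normal((n, n))                                # general matrix

# Sample points of W_H(A) = { <Ax,x>_H / <x,x>_H } with <x,y>_H = y^* H x.
X = rng.standard_normal((n, 300)) + 1j * rng.standard_normal((n, 300))
num = np.einsum('ij,ij->j', X.conj(), H @ (A @ X))
den = np.einsum('ij,ij->j', X.conj(), H @ X)
samples = num / den

# W_H(A) equals the ordinary field of values of T = H^{1/2} A H^{-1/2}; the
# real part of the FOV is bracketed by the eigenvalues of the Hermitian part.
Hs = sqrtm(H).real
T = Hs @ A @ np.linalg.inv(Hs)
eigs = np.linalg.eigvalsh((T + T.T) / 2)
re_lo, re_hi = eigs[0], eigs[-1]
```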

Remark 3.4
If M and N are symmetric positive definite and H = I n , then M≈ H N reduces to spectral equivalence.

Proposition 3.5 Let M and N be two symmetric nonsingular matrices such that M ≈_H N, where H is a given symmetric positive definite matrix. Then M
Proof The result follows from some algebraic computations. Note that M and N are both symmetric and relations (24) hold for some constant α 0 and β 0 .
Let v be an arbitrary nonzero vector. Setting Now setting y = M N −1 z in the above relation, we find that To complete the proof, we need to show that there existsβ 0 such that For an arbitrary nonzero vector v, we have that Now, we set y = H x which together with the above relation implies that which completes the proof. We also recall the following useful definitions and properties.
Definition 3.7 Let M ∈ R^{m×n} and let H_1 ∈ R^{n×n}, H_2 ∈ R^{m×m} be two symmetric positive definite matrices. Then ‖M‖_{H_1,H_2} := max_{x≠0} ‖Mx‖_{H_2}/‖x‖_{H_1}. Note that when H_1 = H_2 = H and m = n, the above definition reduces to the standard induced matrix norm ‖M‖_H. Also, it can be seen that ‖M‖_{H_1,H_2} = ‖H_2^{1/2} M H_1^{-1/2}‖_2. It can be shown that when H_1 = H_2 = H, from the above relation, we have We further observe that where H_3 is a given arbitrary SPD matrix of the appropriate size. Henceforth we assume that the matrix A ∈ R^{n×n} satisfies the following stability conditions (25) [12,13]: where c_1 and c_2 are positive constants independent of n, and the matrix H is SPD. For the coupled Stokes-Darcy problem, H is a block diagonal matrix with diagonal blocks H_1 ∈ R^{n_1×n_1} and H_2 ∈ R^{n_2×n_2} (with n_1 + n_2 = n) given by (26), where M_p denotes the mass matrix for the Stokes pressure space; see [13] for more details. As in [13], we also need the following results from [23], which can be obtained making use of the stability conditions (25).
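The identity ‖M‖_{H_1,H_2} = ‖H_2^{1/2} M H_1^{-1/2}‖_2 in Definition 3.7 can be verified numerically (random matrices are assumptions):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(4)
n, m = 5, 4
H1 = rng.standard_normal((n, n)); H1 = H1 @ H1.T + n * np.eye(n)
H2 = rng.standard_normal((m, m)); H2 = H2 @ H2.T + m * np.eye(m)
M = rng.standard_normal((m, n))

# Closed form: ||M||_{H1,H2} = || H2^{1/2} M H1^{-1/2} ||_2.
norm_closed = np.linalg.norm(sqrtm(H2).real @ M @ np.linalg.inv(sqrtm(H1).real), 2)

# Equivalent characterization: the square of the norm is the largest
# eigenvalue of H1^{-1} M^T H2 M.
lam_max = np.linalg.eigvals(np.linalg.solve(H1, M.T @ H2 @ M)).real.max()

# Sampled quotients ||Mx||_{H2} / ||x||_{H1} never exceed the norm.
X = rng.standard_normal((n, 200))
num = np.sqrt(np.einsum('ij,ij->j', M @ X, H2 @ (M @ X)))
den = np.sqrt(np.einsum('ij,ij->j', X, H1 @ X))
quotients = num / den
```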

and in particular
Lemma 3.9 Let (25) hold and assume that P ∼ H −1 H , then (25) hold. If there exists c 3 independent of n 1 such that

FOV-type bounds
The following proposition is established in [17] for symmetric matrices. In [18, Proposition 2.1], it is pointed out that the result remains true for nonsymmetric matrices as well.

Proposition 3.13
Suppose that A is a general n × n matrix, C is a full-row-rank p × n matrix (p ≤ n), and W is a p × p matrix. Define If K := K(0) is nonsingular, then K(W) is a nonsingular matrix for any nonzero W and Let us consider the following partitioning for Ā and write the matrix in the form of (28): As a result, from Proposition 3.13, we have The above discussion allows one to find sufficient conditions ensuring that stability conditions similar to (25) hold for Ā. In particular, it turns out that stability conditions for Ā can be deduced from (25) by suitable choices of Q, including the case Q = M_p. To this end, we recall the following observation, which is a consequence of [23, Lemma 2.1] and Lemma 3.12.

Remark 3.14 Let the matrices H and A be defined as before. Then
In view of the above remark, we need to establish that ‖Ā‖_{H,H^{-1}} and ‖Ā^{-1}‖_{H^{-1},H} are bounded from above in order to show that Ā satisfies stability conditions similar to (25). Notice that (25) together with (29) imply that On the other hand, we have that It is known that C^T Therefore, by Lemma 3.10, the following inequality holds when the first condition in (25) is satisfied: From the above discussion, it can be observed that if we set Q = M_p, then Eqs. (32) and (33) reduce to the following inequalities, respectively: In order to deal with the augmented system, the following lemma provides a useful expression for the Schur complement. The lemma is an immediate consequence of Proposition 3.13; see [10, Lemma 4.1] for more details.
We denote the negative Schur complement associated with Ā by S̄. By the above lemma, it is immediate to see that Hence, we have Assuming that the hypothesis of Lemma 3.11 holds, we readily obtain For simplicity we set Ā = A + rC^T Q^{-1}C, S_A = A_22 − A_21 A_11^{-1} A_12, and S_Ā = S_A + rB^T Q^{-1}B. It is well known (see, e.g., [8, page 18]) that Hence, considering the singularity of S

Now, Eq. (36) implies that
It is not difficult to verify that Therefore, the above relation together with the fact that S^{-1} Hence, there exists c_4 independent of n such that S^{-1} Consider again the matrix Ā in the following form: We note that for r = 0, Ā = A and Ā reduces to The constraint preconditioners can be written in the form P_con = [P_con, C^T; C, 0], and it is shown in [13] that, under suitable conditions, P_con is H-norm equivalent (and consequently H-FOV equivalent) to A, where H has the symmetric positive definite diagonal blocks H_1 and H_2 given by (26). Now consider the following preconditioner: where Q is symmetric positive definite and r > 0 is given. Here, we comment that Q can be taken to be any SPD matrix such that (38) holds for a constant η > 0.
For simplicity, we rewrite P_r as follows: with the obvious definition of P̂. Now we establish a proposition which will be useful in proving the norm equivalence of the preconditioner P_r to Ā. To this end, we first need to recall the following well-known fact.
then there exists c_4 independent of n such that r Q^{-1} Proof First note that Theorem 3.17 implies Since H_2 is a symmetric positive definite matrix, we have which is equivalent to Proof By the assumptions, there exist α_0 and β_0 such that It is obvious that P̂

Consequently, we have
Now, from Lemma 3.10 and Proposition 3.18, we find that, for β̂_0 = β_0 + c_1^2 c_4, we have Note that

Recalling that
≤ c 1 , from the preceding inequality we The proof of the next theorem follows from a similar argument used in [13,Theorem 3.9], where P con ∼ H −1 1 H 1 was an assumption. In view of the previous proposition, the where P is the block upper triangular part of A. On the other hand, the matrix P con in the constraint preconditioners P con D and P con T is, respectively, the block diagonal and block lower triangular part of A. However, considering Eq. (26), one can see that if P con and P are the block lower or the block upper triangular part of A, then P con ∼ H −1 ≤ c 1 . More precisely, it turns out that
We comment that if P_con is the block diagonal part of A, then Therefore, in the analysis of [13, Section 3], there is no need to require that P_con ∼_{H_1^{-1}} H_1 for establishing FOV bounds (independent of the mesh width) when the preconditioner is applied "exactly", i.e., when direct methods are used for the block solves.

Theorem 3.20 Let H and P_r be defined as in (40) and (41), respectively. In addition to the hypotheses of Proposition 3.18, assume that r > 0 is such that
Proof Notice that from (44), we get By Lemmas 3.10 and 3.12, we have Furthermore, we have Therefore, it is immediate to see that which completes the proof.

Remark 3.21
By Theorem 3.17, assumption (44) is equivalent to the following lower bound for r: Noting that we deduce that condition (44) holds for r ≥ λ_max(Q)/λ_min(H_2).
We are now in a position to establish the main result of this section.

Theorem 3.22
Let Ā, P_r be defined as before. In addition to the hypotheses of Proposition 3.18, suppose that condition (44) is satisfied and that there exists a constant ν > 0 such that for any nonzero y ∈ R^{n_2} the following inequality holds: Then there exists ρ_0 > 0 such that Ā ≈_{H^{-1}} P_r for all r ≥ ρ_0. Proof Let x = (x_1; x_2) be given. To complete the proof, in the sequel we show that there exists a positive constant τ such that

Next, observe that
we know that there exists a positive constant α_0 such that while the rest of the proof remains unchanged after removing the restriction r ≥ 1.
For the sake of generality, we add the following remark showing that the above result can be stated under weaker conditions.

Remark 3.24
In Proposition 3.18, assumption (42) was used to deduce that r‖Q^{-1}‖_{H_2} is bounded above by a constant. On the other hand, condition (44) ensures that Following the preceding discussion, assumption (44) can be relaxed by requiring instead that this quantity is bounded above by a constant. Now, in view of the following equality, one can relax assumptions (42) and (44). To this end, we need to choose r and Q such that

Proposition 3.25 Let Ā and P̂ be defined by
where A_21 = −A_12^T and H_1 is given by (26). If the stability conditions (39) hold, then there exists β_0 > 0 such that Furthermore, assume that A_22 ⪰ A_12^T A_11^{-1} A_12 and λ_M := λ_max(A_22^{-1} A_12^T A_11^{-1} A_12) ≤ 3/4. If the following relation holds: Proof It is straightforward to check that having in mind that A_12^T = −A_21. Moreover, we have that where we made use of To prove the assertion, we need to show that there exists α_0 such that To this end, we first show that assumption (51) guarantees that the matrix F (where we set Ŝ := A_12^T A_11^{-1} A_12 for notational simplicity) is positive semidefinite, in the sense that the quadratic form ⟨Fz, z⟩ is nonnegative for any real vector z. We comment that F is symmetric positive definite for r = 0, since in this case Ā_22 By the Sherman-Morrison-Woodbury matrix identity we have hence we can write where E is a symmetric positive semidefinite matrix given by Next, we observe that the nonzero eigenvalues of E are the same as those of the matrix Ẽ, and that the spectrum of Ẽ is given by Notice that the function g_r(x) = rx/(1 + rx) is monotonically increasing for x, r > 0. Hence, we have
Let z be an arbitrary real vector. Then, using the above relation (with the block partitioning w = (w_1; w_2)), the Cauchy-Schwarz inequality and some straightforward computations, we obtain

By assumption, we have
Notice that, setting z = QS_0^{-1}w and using assumption (53), we have The function f(x) is monotonically increasing for 0 < r < ν_0^{-1}. Using the fact that t ≥ γ̂^{-2}, it follows that Finally, we observe that in [23, Theorem 3.8], for the case r = 0, it is assumed that Q^{-1} ≈_{H_2} S_0^{-1}. Here we point out that Proposition 3.5
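The Sherman-Morrison-Woodbury identity invoked in the proof above, applied to the augmented block A_22 + rB^TQ^{-1}B, can be verified numerically (random data and a diagonal Q are assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
n2, m = 8, 3
A22 = rng.standard_normal((n2, n2)); A22 = A22 @ A22.T + n2 * np.eye(n2)
B = rng.standard_normal((m, n2))
Q = np.diag(rng.uniform(1.0, 2.0, m))
r = 5.0

# SMW for the augmented block:
# (A22 + r B^T Q^{-1} B)^{-1}
#     = A22^{-1} - A22^{-1} B^T (Q/r + B A22^{-1} B^T)^{-1} B A22^{-1}
Ainv = np.linalg.inv(A22)
lhs = np.linalg.inv(A22 + r * B.T @ np.linalg.solve(Q, B))
rhs = Ainv - Ainv @ B.T @ np.linalg.solve(Q / r + B @ Ainv @ B.T, B @ Ainv)
```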

Numerical experiments
In practice, all the preconditioners considered so far must be applied inexactly, especially when solving 3D problems. Whether or not mesh-independent behavior is retained by the inexact variants is not clear a priori; as we will see, the choice of inexact solver may impact some preconditioners more than others. In this section we illustrate the performance of inexact variants of the block preconditioners on a test problem, taken from [13, Subsection 5.3], which corresponds to a 3D coupled flow problem posed on a cube. We report the performance of several preconditioner variants in conjunction with FGMRES [24]. The initial guess is taken to be the zero vector, and the iterations are stopped once ‖Au_k − b‖_2 ≤ 10^{-7}‖b‖_2 (or ‖Āu_k − b̄‖_2 ≤ 10^{-7}‖b̄‖_2 for the augmented Lagrangian variants), where u_k is the k-th approximate solution. In addition, we have used right-hand sides corresponding to random solution vectors and averaged the results over 10 test runs. At each iteration of FGMRES, we need to solve at least two SPD linear systems as subtasks. To this end, we applied two different approaches, discussed in the following two subsections.
All computations were carried out on a computer with an Intel Core i7-10750H CPU @ 2.60GHz processor and 16.0GB RAM using MATLAB.R2020b.
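For reference, the outer solver used throughout this section can be sketched in a few lines. The following is a minimal right-preconditioned FGMRES (no restarting) applied to a small random double saddle point system with a crude diagonal preconditioner; the sizes and the preconditioner choice are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def fgmres(A, b, apply_prec, tol=1e-7, maxiter=100):
    """Right-preconditioned flexible GMRES, no restarting.

    apply_prec(v) is allowed to change from call to call (e.g. an inner
    iterative solve), which is the situation FGMRES is designed for.
    """
    n = b.size
    beta = np.linalg.norm(b)
    V = np.zeros((n, maxiter + 1))     # Arnoldi basis
    Z = np.zeros((n, maxiter))         # preconditioned directions (stored!)
    H = np.zeros((maxiter + 1, maxiter))
    V[:, 0] = b / beta                 # zero initial guess, so r0 = b
    for k in range(maxiter):
        Z[:, k] = apply_prec(V[:, k])
        w = A @ Z[:, k]
        for i in range(k + 1):         # modified Gram-Schmidt
            H[i, k] = V[:, i] @ w
            w -= H[i, k] * V[:, i]
        H[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / H[k + 1, k]
        e1 = np.zeros(k + 2); e1[0] = beta
        y = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)[0]
        x = Z[:, :k + 1] @ y
        if np.linalg.norm(b - A @ x) <= tol * beta:
            return x, k + 1
    return x, maxiter

# A small random double saddle point system with the block structure of (1).
rng = np.random.default_rng(6)
n1, n2, m = 10, 8, 4
A11 = rng.standard_normal((n1, n1)); A11 = A11 @ A11.T + n1 * np.eye(n1)
A22 = rng.standard_normal((n2, n2)); A22 = A22 @ A22.T + n2 * np.eye(n2)
A12 = rng.standard_normal((n1, n2))
B = rng.standard_normal((m, n2))
A = np.block([[A11, A12, np.zeros((n1, m))],
              [-A12.T, A22, B.T],
              [np.zeros((m, n1)), B, np.zeros((m, m))]])
b = rng.standard_normal(n1 + n2 + m)

def prec(v):
    # Crude "inexact block solve": Jacobi scaling on the SPD blocks, identity
    # on the constraint block (illustrative only, not a serious preconditioner).
    out = v.copy()
    out[:n1] /= np.diag(A11)
    out[n1:n1 + n2] /= np.diag(A22)
    return out

x, iters = fgmres(A, b, prec, tol=1e-7, maxiter=n1 + n2 + m)
```

Storing the preconditioned directions Z (rather than reconstructing them from V) is what distinguishes FGMRES from standard GMRES and what allows a nonconstant, inner-iterative preconditioner.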

Implementation based on IC-CG
First we present the results of experiments in which, inside FGMRES, the SPD subsystems were solved inexactly by the preconditioned conjugate gradient (PCG) method with loose tolerances. More precisely, the inner PCG solver for linear systems with coefficient matrix A_11 (respectively, A_22 and A_22 + rB^TQ^{-1}B) was terminated when the relative residual norm fell below 10^{-1} (respectively, 10^{-2}), or when the maximum number of 5 (respectively, 25) iterations was reached. In the implementation of the preconditioner P_{T1,ρ}, the inverse of M_p was applied inexactly using PCG with a relative residual tolerance of 10^{-2} and a maximum of 20 iterations. The preconditioners for PCG are incomplete Cholesky factorizations constructed using the MATLAB function "ichol(., opts)", where opts.type = 'ict' with drop tolerances between 10^{-4} and 10^{-2}. The FGMRES iteration count is reported in the tables under "Iter". Under "Iter pcg i" ("Iter cg i") we further report the total number of inner PCG (or CG) iterations performed for solving the linear systems corresponding to block (i, i) of the preconditioner, where i = 1, 2. For more details, in the "Appendix" we summarize the implementation of the preconditioners P_conD, P_conT and P_{T1,ρ} in Algorithms 1-3. For the linear system corresponding to A_22 + rB^TQ^{-1}B, we distinguish between two approaches: • Approach I. The matrix A_22 + rB^TQ^{-1}B is not formed explicitly, and the CG method is used without preconditioning, with a relative residual tolerance of 10^{-3} and a maximum of 25 iterations. • Approach II. The matrix A_22 + rB^TQ^{-1}B is formed explicitly, and PCG with incomplete Cholesky preconditioning is used. We note that while we could successfully compute the "ichol" factor without diagonal shifts for the two smallest problem sizes, adding the shift 0.01 was found to be necessary for larger sizes.
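Approach I amounts to a matrix-free application of the augmented block; a minimal sketch (random data as assumptions, and SciPy's CG in place of MATLAB's pcg):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(7)
n2, m = 30, 10
A22 = rng.standard_normal((n2, n2)); A22 = A22 @ A22.T + n2 * np.eye(n2)
B = rng.standard_normal((m, n2))
q = rng.uniform(1.0, 2.0, m)       # diagonal of Q, so Q^{-1} v is a cheap scaling
r = 10.0

# Approach I: the augmented matrix A22 + r B^T Q^{-1} B is never formed;
# only its action on a vector is supplied to (unpreconditioned) CG.
def matvec(v):
    return A22 @ v + r * (B.T @ ((B @ v) / q))

op = LinearOperator((n2, n2), matvec=matvec, dtype=float)
rhs = rng.standard_normal(n2)
sol, info = cg(op, rhs, maxiter=25)   # capped at 25 inner iterations, as in the text
```

Each application costs one multiply with A22 plus one with B and B^T, preserving sparsity; this is exactly why Approach I avoids the fill-in incurred by forming the augmented matrix explicitly.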
We further note that with this approach we can use larger values of r, leading to faster FGMRES convergence. In Tables 1 and 2, we report the performance of P_r for Approaches I and II. From the results presented, we can see that even when implemented inexactly, the augmented Lagrangian-based preconditioner P_r results in FGMRES convergence rates that are essentially mesh-independent, as predicted by our theoretical analysis. As for the number of inner PCG iterations, we observe some differences between the results obtained with Approaches I and II. In the case of Approach I, we see an increase in the total number of inner PCG iterations as the mesh is refined, reflecting the known fact that the CG method, with or without incomplete Cholesky preconditioning, is not mesh-independent in general. With Approach II this increase is not observed; however, the total timings are much higher and still scale superlinearly with the number of degrees of freedom. This is due to the fact that explicitly forming the augmented matrix A_22 + rB^TQ^{-1}B and computing its incomplete Cholesky factorization leads to a considerably less sparse matrix and superlinear growth of the fill-in in the incomplete factors, and thus to more expensive PCG iterations. We conclude that with P_r, Approach I is to be preferred over Approach II. In Table 3, we report the results corresponding to the constraint preconditioners. Our numerical tests illustrate that in the inexact implementation of these preconditioners using IC-CG for the inner SPD linear solves, the outer iteration counts grow each time the mesh size is halved. Hence, the mesh-independence of the outer FGMRES iteration is lost when the preconditioner is applied inexactly using incomplete Cholesky as the preconditioner for the inner PCG iterations.
We add that in our implementation of these preconditioners, we approximated B A_22^{-1} B^T by M_p and solved the corresponding linear system using PCG with a maximum of 25 iterations, a relative residual tolerance of 10^{-2}, and incomplete Cholesky preconditioning with opts.droptol set to 10^{-2}. We also tried approximating B A_22^{-1} B^T by the diagonal of M_p, but the results were generally worse and we do not report them here.
Next, we consider inexact variants of the block triangular preconditioners. In Tables 4 and 5, we present results for ρ = 0.6. In [12], it was experimentally observed that the performance of P_{T_1,ρ} is not sensitive to ρ for ρ ∈ [0.6, 1.05]; in our experiments, ρ = 0.6 turned out to be the optimal value. We report the results for two different cases (referred to as Cases I and II) in Tables 4 and 5, obtained by setting opts.droptol to 10^{-4} and 10^{-2}, respectively. As with P_r, Table 4 shows that for P_{T_1,0.6} the outer FGMRES iteration count remains essentially constant as the grid is refined, in agreement with our analysis of the exact case. Although the results in Table 5 indicate better performance of the preconditioner for the first three problem sizes, the number of outer iterations increases drastically for the largest problem size.
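The precise form of P_{T_1,ρ} is given in [12], but the action of any block (upper) triangular preconditioner on a residual follows the same back-substitution pattern: solve with the trailing diagonal block, update the leading residual with the off-diagonal coupling, then solve with the leading block. The sketch below illustrates that pattern with hypothetical small blocks; exact Gaussian elimination replaces the inexact inner PCG solves used in the paper.

```python
# Applying a block upper triangular preconditioner
#   P = [[A_hat, Bt], [0, S_hat]]
# to a residual (r1, r2) by back-substitution:
#   y2 = S_hat^{-1} r2,  y1 = A_hat^{-1} (r1 - Bt y2).
# All blocks below are hypothetical stand-ins.

def solve(A, b):
    # Dense Gaussian elimination with partial pivoting (small systems).
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j]
                              for j in range(i + 1, n))) / M[i][i]
    return x

def apply_block_triangular(A_hat, Bt, S_hat, r1, r2):
    y2 = solve(S_hat, r2)                      # trailing diagonal block
    rhs1 = [r1[i] - sum(Bt[i][j] * y2[j] for j in range(len(y2)))
            for i in range(len(r1))]           # off-diagonal update
    y1 = solve(A_hat, rhs1)                    # leading block
    return y1, y2

A_hat = [[4.0, 1.0], [1.0, 3.0]]   # hypothetical (1,1) block
Bt    = [[1.0], [2.0]]             # hypothetical coupling block
S_hat = [[2.0]]                    # hypothetical Schur complement approx.
y1, y2 = apply_block_triangular(A_hat, Bt, S_hat, [1.0, 0.0], [1.0])
```

In the inexact variants, each call to `solve` is replaced by a few IC-CG iterations, which is exactly where the mesh-dependence observed in Table 5 creeps in.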
From these results we see that replacing the inexact solves involving the mass matrix M_p with a simple diagonal scaling by the diagonal of M_p leads to a degradation of the convergence rate. We found that in terms of CPU time, this degradation more than offsets the savings obtained by using simple diagonal scaling in place of solves of linear systems involving M_p.
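The trade-off can be stated concretely: applying diag(M_p)^{-1} costs one division per unknown, but it is a cruder approximation of M_p^{-1} than an (inexact) solve with M_p, so the outer iteration count grows. A minimal illustration of the two options, using a hypothetical small mass-like matrix and an exact solve (via Cramer's rule, for brevity) in place of PCG:

```python
# Two ways to (approximately) apply the inverse of a mass-like
# matrix M to a vector r: a solve with M versus diagonal scaling
# with diag(M). The matrix here is a hypothetical stand-in.

M = [[2.0, 0.5, 0.0],
     [0.5, 2.0, 0.5],
     [0.0, 0.5, 2.0]]
r = [1.0, 1.0, 1.0]

def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
          - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
          + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

# Option 1: solve M z = r (in the paper, done inexactly by PCG).
d = det3(M)
z_solve = []
for k in range(3):
    Mk = [row[:] for row in M]
    for i in range(3):
        Mk[i][k] = r[i]          # Cramer's rule: replace column k by r
    z_solve.append(det3(Mk) / d)

# Option 2: diagonal scaling -- one division per unknown.
z_diag = [r[i] / M[i][i] for i in range(3)]
```

Here `z_diag` is far cheaper per application but differs noticeably from the true M^{-1} r whenever M has significant off-diagonal mass, which is what drives up the outer iteration count.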
Overall, when the subsystems associated with the block preconditioners are solved using (P)CG with incomplete Cholesky factorization, the fastest solution times are achieved with the inexact variant of the augmented Lagrangian preconditioner P_r using what we called "Approach I". For the largest problem sizes, this approach is about 3.3 times faster, in terms of CPU time, than the block triangular preconditioner from [12], which is in turn far more efficient than the inexact variants of the constraint preconditioners. Furthermore, the construction cost of the incomplete Cholesky factorizations used with these preconditioners is negligible. The CPU time scaling of all these methods with respect to the mesh size is, however, superlinear in the number of unknowns due to the use of IC preconditioning in conjunction with the CG method to perform the inner iterations.

Implementation based on ARMS preconditioner
In an attempt to obtain better scalability of the number of inner iterations under mesh refinement, as an alternative to IC-CG we performed some experiments with an algebraic multilevel solver for approximately solving the subsystems associated with the block preconditioners. We chose the MATLAB implementation of the ARMS preconditioner [25], which can be downloaded from https://www-users.cs.umn.edu/~saad/software/. Since the ARMS preconditioner is not SPD, the inexact solves involving sub-blocks were performed with GMRES (with relative residual tolerance 0.1 and a maximum of 20 iterations) in conjunction with ARMS. With this approach, all tested preconditioners (including the constraint preconditioners) appear robust, displaying mesh-independent convergence of the outer FGMRES iteration and faster convergence of the inner iterations. The numerical results are shown in Table 6 for the block triangular preconditioners, the constraint preconditioners, and the augmented Lagrangian-based preconditioner. To implement the preconditioner P_r, the subsystem corresponding to sub-block (1, 1) is solved by GMRES in conjunction with the ARMS preconditioner. For the subsystem associated with A_22 + r B^T Q^{-1} B, forming the ARMS preconditioner is not practically feasible for the larger problem sizes. Therefore, the matrix A_22 + r B^T Q^{-1} B is not formed explicitly, and the corresponding system is solved by preconditioned GMRES with the ARMS preconditioner for A_22. As before, we report under "Iter" the number of (outer) FGMRES iterations. Under "Iter i" we report the total number of inner iterations performed for solving the linear systems corresponding to block (i, i) of the preconditioner, where i = 1, 2. To obtain the results in Table 6, PCG with tolerance 10^{-2} and a maximum of 20 iterations was used for solving linear systems associated with M_p; the corresponding total numbers of iterations are given under "Iter M_p".
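As in Approach I, keeping A_22 + r B^T Q^{-1} B in implicit form means each Krylov iteration applies the operator through matrix-vector products with B and B^T and a solve with Q, never assembling the (much denser) augmented matrix. The sketch below shows such a matrix-free operator in plain Python; all matrices are small hypothetical stand-ins, and Q is taken diagonal for simplicity.

```python
# Matrix-free application of the augmented operator
#   A22 + r * B^T Q^{-1} B
# without forming the matrix explicitly: each application costs one
# matvec with B, one (here diagonal) solve with Q, one matvec with
# B^T, and one matvec with A22. All data below are hypothetical.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose_matvec(A, y):
    n = len(A[0])
    return [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(n)]

def make_augmented_op(A22, B, Q_diag, r):
    def op(x):
        Bx = matvec(B, x)
        QinvBx = [Bx[i] / Q_diag[i] for i in range(len(Bx))]
        BtQinvBx = transpose_matvec(B, QinvBx)
        Ax = matvec(A22, x)
        return [Ax[j] + r * BtQinvBx[j] for j in range(len(x))]
    return op

A22 = [[3.0, 1.0], [1.0, 4.0]]             # hypothetical SPD block
B = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # hypothetical coupling block
Q_diag = [1.0, 2.0, 1.0]                   # hypothetical diagonal Q
op = make_augmented_op(A22, B, Q_diag, r=10.0)
y = op([1.0, -1.0])
```

An operator of this form can be handed directly to an unpreconditioned CG (Approach I) or, as here, to GMRES preconditioned with ARMS built from A_22 alone.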
Iteration times were also found to exhibit better (though not perfect) scalability than in the experiments described in the previous subsection. The construction costs for ARMS, however, appear to be prohibitive, at least in the MATLAB implementation, completely offsetting any gains in performance. In particular, for the largest problem sizes it takes hours to compute the ARMS preconditioners.
We can see from these experiments that for all preconditioners tested, both the number of outer FGMRES iterations and (for large enough problem sizes) the total number of inner preconditioned GMRES and CG iterations remain almost constant, with outer iteration counts even improving for smaller mesh sizes. As mentioned, however, this improved scaling behavior comes at the price of much higher preconditioner construction costs. The reported solution times show that in conjunction with ARMS, the augmented Lagrangian-based preconditioner P_r is both efficient and fairly robust with respect to the parameter r, and outperforms all the other ARMS-based preconditioners for large enough problem sizes. However, it does not outperform the implementation of P_r based on IC-CG inner solves. Given the enormous set-up costs associated with ARMS, we conclude that its use does not bring about any actual advantage in terms of time to solution, at least when working in MATLAB.
In conclusion, the results of our experiments indicate that among all preconditioner variants we tested, the inexact variant of P r with IC-CG inner solves (Approach I) is, by a large margin, the fastest solver in terms of total solution times.

Conclusions
In this paper we have provided a theoretical analysis of several types of block preconditioners for the discrete Stokes-Darcy problem. Both eigenvalue bounds and FOV-equivalence have been considered, completing the analyses given in [12] and in [13]. Our analysis extends previous results and explains the experimentally observed mesh-independence of the exact variants of the block preconditioner based on the augmented Lagrangian approach.
Numerical experiments show that inexact variants of these block preconditioners may or may not retain mesh-independence, depending on the solver used for the inexact solves. All preconditioners show near mesh-independence when a multilevel algebraic solver (ARMS) is used for the inner solves, but this solver is found to have exceedingly high construction costs. When cheaper inner solvers based on incomplete Cholesky-preconditioned CG are used, the fastest total solution times are achieved by the augmented Lagrangian-type preconditioner. It is possible, of course, that better results may be achieved with different multilevel solvers.
Future work should consider the development of similar preconditioners for the coupled Navier-Stokes-Darcy model.