CBS Constants & Their Role in Error Estimation for Stochastic Galerkin Finite Element Methods

Stochastic Galerkin finite element methods (SGFEMs) are commonly used to approximate solutions to PDEs with random inputs. However, the study of a posteriori error estimation strategies to drive adaptive enrichment of the associated tensor product spaces is still under development. In this work, we revisit an a posteriori error estimator introduced in Bespalov and Silvester (SIAM J Sci Comput 38(4):A2118–A2140, 2016) for SGFEM approximations of the parametric reformulation of the stochastic diffusion problem. A key issue is that the bound relating the true error to the estimated error involves a CBS (Cauchy–Buniakowskii–Schwarz) constant. If the approximation spaces associated with the parameter domain are orthogonal in a weighted $L^2$ sense, then this CBS constant depends only on a pair of finite element spaces $H_1$, $H_2$ associated with the spatial domain, and on their compatibility with respect to an inner product associated with a parameter-free problem.
For fixed choices of $H_1$, we investigate non-standard choices of $H_2$ and the associated CBS constants, with the aim of designing efficient error estimators with effectivity indices close to one. When $H_1$ and $H_2$ satisfy certain conditions, we also prove new theoretical estimates for the CBS constant using linear algebra arguments.


Introduction
The motivation for this work is the design of efficient a posteriori error estimators for adaptive Galerkin finite element approximation of solutions to partial differential equations (PDEs). In particular, we are interested in PDEs with random inputs and so-called stochastic Galerkin finite element methods (SGFEMs) (see [4,5,14,22,24,31]). When the inputs are represented by a finite, or countably infinite, number of random variables $\xi_m : \Omega \to \mathbb{R}$, $m = 1, 2, \ldots$, where $\Omega$ is a sample space, it is conventional to reformulate the stochastic problem of interest as a high-dimensional deterministic one, whose solution depends on a set of parameters $y_m = \xi_m(\omega)$. Unlike sampling methods such as Monte Carlo, in the SGFEM approach an approximation is sought in a tensor product space of the form $H_1 \otimes P$, where $H_1$ is an appropriate finite element space associated with the spatial domain and $P$ is a set of polynomials associated with the parameter domain. When the number of active parameters is high, the dimension of $H_1 \otimes P$ for standard choices of $H_1$ and $P$ can become unwieldy. To remedy this, we can either work with standard approximation spaces and deal with the resulting very large discrete systems by using smart linear algebra techniques (see [20,21,25,26,32,33]), or we can use an adaptive approach, starting in a low-dimensional space $H_1^0 \otimes P^0$ and using a posteriori error estimators to decide whether it is necessary to enrich $H_1^0$ or $P^0$, or both. This allows us to build up a tailored sequence of approximation spaces $H_1^\ell \otimes P^\ell$, $\ell = 0, 1, \ldots$, incrementally, so that the dimension of the final space is balanced against an error tolerance for a quantity of interest.

We consider the steady-state stochastic diffusion problem. Let $D \subset \mathbb{R}^d$ ($d = 2, 3$) be a bounded spatial domain and let $y = [y_1, y_2, \ldots]$ be a vector-valued parameter which takes values in $\Gamma = \prod_{m=1}^{\infty} \Gamma_m$ (the parameter domain).
We want to approximate the function $u : D \times \Gamma \to \mathbb{R}$ that satisfies
$$-\nabla \cdot (a(x, y) \nabla u(x, y)) = f(x), \quad x \in D,\ y \in \Gamma, \tag{1a}$$
$$u(x, y) = 0, \quad x \in \partial D,\ y \in \Gamma. \tag{1b}$$
For simplicity, we assume that $f = f(x)$ is independent of $y$, but the methodologies discussed herein can be extended to accommodate $f = f(x, y)$. We also assume that the parameters are bounded, with $y_m \in \Gamma_m = [-1, 1]$. We denote by $\pi(y)$ a product measure on $(\Gamma, \mathcal{B}(\Gamma))$, where $\mathcal{B}(\Gamma)$ is the Borel $\sigma$-algebra on $\Gamma$, so that $\pi(y) = \prod_{m=1}^{\infty} \pi_m(y_m)$, where $\pi_m(y_m)$ is a measure on $(\Gamma_m, \mathcal{B}(\Gamma_m))$. In addition, we assume that each $\pi_m$ is a probability measure with
$$\int_{\Gamma_m} y_m \, \mathrm{d}\pi_m(y_m) = 0, \tag{3}$$
which is true when $y_m$ is the image of a mean zero random variable and $\pi_m(y_m)$ is the associated probability measure.

Assumption 1
The coefficient $a(x, y)$ admits the decomposition
$$a(x, y) = a_0(x) + \sum_{m=1}^{\infty} a_m(x) \, y_m, \quad x \in D,\ y \in \Gamma, \tag{2}$$
with $a_0(x), a_m(x) \in L^\infty(D)$. Moreover, there exist real positive constants $a_0^{\min}$ and $a_0^{\max}$ such that
$$0 < a_0^{\min} \le a_0(x) \le a_0^{\max} < \infty \quad \text{a.e. in } D, \tag{4}$$
and $\|a_m\|_{L^\infty(D)}$ converges sufficiently quickly to zero as $m \to \infty$ so that
$$\sum_{m=1}^{\infty} \|a_m\|_{L^\infty(D)} < a_0^{\min}. \tag{5}$$
The parameter-free function $a_0(x)$ typically represents the mean, with each term $a_m(x) y_m$ representing a perturbation away from the mean, while (5) helps ensure the well-posedness of the weak formulation of (1a)–(1b). Next, we recall the classical strengthened Cauchy–Buniakowskii–Schwarz (CBS) inequality (see [1, Theorem 5.4]), a key tool in many areas of numerical analysis.
Theorem 1 Let $H$ be a Hilbert space equipped with inner product $(\cdot, \cdot)$ and induced norm $\|\cdot\|$, and let $U$, $V$ be a pair of finite-dimensional subspaces of $H$ satisfying $U \cap V = \{0\}$. Then, there exists a constant $\gamma \in [0, 1)$, depending only on $U$ and $V$, such that
$$|(u, v)| \le \gamma \, \|u\| \, \|v\| \quad \forall u \in U,\ \forall v \in V. \tag{6}$$
The smallest constant $\gamma \in [0, 1)$ satisfying (6) is
$$\gamma = \sup_{u \in U \setminus \{0\}} \, \sup_{v \in V \setminus \{0\}} \frac{|(u, v)|}{\|u\| \, \|v\|} \tag{7}$$
and is known as the CBS constant. In particular, CBS constants appear in the analysis of hierarchical preconditioners (see [2,3,30]) and certain types of a posteriori error estimators for Galerkin finite element approximations to PDEs; see, for example, [1,19,27,28]. The design of error estimators for SGFEMs for parameter-dependent PDEs is still under development. However, there have been a few recent works (see [8–12,16–18,29]) for the model problem (1a)–(1b). In [12] and [11], algorithms constructing so-called sparse SGFEM approximations are driven by a priori error analysis, where the error associated with each discretisation parameter is balanced against the total number of degrees of freedom. In [16] and [17], a general framework for an explicit residual-based error estimation strategy for SGFEMs is proposed, where the selection of hierarchical approximation spaces is driven by a Dörfler marking strategy [15] on both the spatial and parameter domains. In both works, overestimation of the true error by up to a factor of 10 is reported. In [18], a similar approach is taken, where residuals are computed using an equilibrated fluxes strategy, and overestimation by up to a factor of 5 is reported.
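Theorem 1 admits a simple concrete illustration. In $\mathbb{R}^2$ with the Euclidean inner product, the CBS constant for two one-dimensional subspaces is the cosine of the angle between them. The snippet below is a minimal sketch of this (not from the paper); the specific vectors are arbitrary examples.

```python
import numpy as np

# CBS constant in R^2 with the Euclidean inner product: for
# U = span{u} and V = span{v} with U ∩ V = {0}, the supremum in (7)
# is attained at u and v themselves, giving gamma = |cos(angle(u, v))| < 1.
u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0]) / np.sqrt(2.0)

gamma = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(gamma)  # 1/sqrt(2) ≈ 0.7071
```

Here $\gamma$ approaches one as the subspaces become nearly parallel, and equals zero when they are orthogonal, mirroring the role it plays in the error bounds below.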
We focus on the (implicit) approach taken in [8–10], which is based on solving local subproblems for the error over a 'detail' space. The bound for the effectivity index of the resulting error estimators depends on a CBS constant. In [8–10], no insight is given into which choices of detail space result in a sharp error bound (effectivity indices close to one). In this paper we provide that analysis. Specifically, we provide detailed information about the CBS constant and derive theoretical bounds for it, for certain choices of SGFEM solution and detail spaces. Due to the way that the spaces associated with the parameter domain are chosen, the CBS constants needed to analyse the estimators in [8–10] depend only on a pair of finite element spaces on the spatial domain. Hence, our results are also applicable to the design of adaptive finite element schemes for deterministic PDEs. We investigate which choices of detail FEM space result in CBS constants close to zero, to help ensure a sharp error bound. In particular, due to cost restrictions imposed to avoid high-dimensional detail spaces, we investigate non-standard choices that are not typically considered in the deterministic setting. The error estimation strategy in [29] also relies on a CBS constant, but in a different setting: enrichment of the finite element space is not considered.

Outline
In Sect. 2, for the benefit of readers who are not familiar with the area, we review classical results from [1,6] concerning a posteriori error estimation for Galerkin approximation. In Sect. 3, we demonstrate how those results are applied to SGFEM approximations of (1a)–(1b), leading to a simplified analysis of the error estimator introduced in [10] and the associated error bound. In particular, we show how the bound depends on a CBS constant associated with two finite element spaces $H_1$ and $H_2$. In Sect. 4, we first remind the reader how to compute CBS constants numerically. We then study some (non-standard) pairs of $H_1$ and $H_2$ and compute the associated CBS constants. In Sect. 5, we demonstrate that if $H_1$ is the space of piecewise bilinear ($Q_1$) functions, theoretical estimates for the CBS constant can be obtained using a novel linear algebra approach for several choices of $H_2$. Finally, in Sect. 6, we present numerical results demonstrating the quality of the aforementioned error estimator, and the vital importance of choosing the right detail spaces.

Classical a Posteriori Error Estimation
The following classical results from [1,6], along with Theorem 1, form the foundations of our main investigation in Sects. 3–6. Let $V$ be a Hilbert space with norm $\|\cdot\|_V$, and let $B : V \times V \to \mathbb{R}$ and $F : V \to \mathbb{R}$ denote a bounded and coercive bilinear form and a bounded linear functional, respectively. Consider the problem: find $u \in V$ such that
$$B(u, v) = F(v) \quad \forall v \in V. \tag{8}$$
We assume that $B(\cdot, \cdot)$ is an inner product on $V$ with the 'energy' norm $\|\cdot\|_B = B(\cdot, \cdot)^{1/2}$ and that (8) is uniquely solvable. We now seek a Galerkin approximation to $u \in V$. Let $X$ be an $N_X$-dimensional subspace of $V$. We then solve: find $u_X \in X$ such that
$$B(u_X, v) = F(v) \quad \forall v \in X. \tag{9}$$
Computing the error $e = u - u_X \in V$ is a non-trivial task. Our goal is to estimate the energy error $\|e\|_B$. Clearly, $e$ satisfies the problem
$$B(e, v) = F(v) - B(u_X, v) \quad \forall v \in V, \tag{10}$$
where $u_X \in X$ is the Galerkin approximation satisfying (9). Now suppose we choose a second subspace $W \subset V$ of dimension $N_W$, where $X \subset W$ (i.e., $W$ is richer than $X$), and consider the following problem: find $u_W \in W$ such that
$$B(u_W, v) = F(v) \quad \forall v \in W. \tag{11}$$
By letting $e_W = u_W - u_X$, we see that
$$B(e_W, v) = F(v) - B(u_X, v) \quad \forall v \in W. \tag{12}$$
Note that (12) is simply a restatement of (10) over $W$. We deduce then that the function $e_W \in W$ satisfying (12) estimates the true error $e \in V$ satisfying (10). Whilst we do not compute $u_W$, it is clear that the quality of that Galerkin approximation (and hence the choice of $W$) determines the quality of the estimator $e_W$. To analyse this, we require the following assumption.
Assumption 2 Let the functions $u$, $u_X$ and $u_W$ satisfy (8), (9) and (11), respectively. There exists a constant $\beta \in [0, 1)$ (the saturation constant) such that
$$\|u - u_W\|_B \le \beta \, \|u - u_X\|_B. \tag{13}$$
In many applications, Assumption 2 is reasonable (see [1, p. 88]). The relationship between $\|e\|_B$ and $\|e_W\|_B$ is summarised in the next result: if Assumption 2 holds, then
$$\|e_W\|_B \le \|e\|_B \le \left(1 - \beta^2\right)^{-1/2} \|e_W\|_B. \tag{14}$$
The interpretation of (14) is as follows: $\|e_W\|_B$ will never overestimate the true error $\|e\|_B$, but could underestimate it by a factor of $(1 - \beta^2)^{-1/2}$. Problem (12) leads to a linear system of $N_W$ equations, which may be too expensive to solve. This is the case for the problem considered in Sect. 3. Suppose then that $B_0 : V \times V \to \mathbb{R}$ is an inner product with induced norm $\|\cdot\|_{B_0} = B_0(\cdot, \cdot)^{1/2}$, whose matrix representation on $W$ is more convenient to work with. We may then consider the alternative problem: find $e_0 \in W$ such that
$$B_0(e_0, v) = F(v) - B(u_X, v) \quad \forall v \in W. \tag{15}$$
The next result summarises the relationship between $\|e_W\|_B$ and $\|e_0\|_{B_0}$.

Theorem 3 (See [1, Theorem 5.3])
Let $e_W \in W$ and $e_0 \in W$ satisfy (12) and (15), respectively, and suppose that there exist $\lambda, \Lambda \in \mathbb{R}^+$ such that
$$\lambda \, \|v\|_{B_0}^2 \le \|v\|_B^2 \le \Lambda \, \|v\|_{B_0}^2 \quad \forall v \in W \tag{16}$$
(the norms are equivalent); then
$$\Lambda^{-1/2} \, \|e_0\|_{B_0} \le \|e_W\|_B \le \lambda^{-1/2} \, \|e_0\|_{B_0}. \tag{17}$$
Even after replacing $B(\cdot, \cdot)$ with $B_0(\cdot, \cdot)$, since $W \supset X$, it may be more expensive to compute $e_0 \in W$ satisfying (15) than $u_X \in X$ satisfying (9). To reduce the cost further, we insist that
$$W = X \oplus Y, \tag{18}$$
where the 'detail' space $Y \subset V$ has dimension $N_Y$, and consider the lower-dimensional problem: find $e_Y \in Y$ such that
$$B_0(e_Y, v) = F(v) - B(u_X, v) \quad \forall v \in Y. \tag{19}$$
Does $\|e_Y\|_{B_0}$ provide a good estimate for $\|e\|_B$? To answer this, we require Theorem 1: since $X \cap Y = \{0\}$, there exists a constant $\gamma \in [0, 1)$ such that
$$|B_0(u, v)| \le \gamma \, \|u\|_{B_0} \, \|v\|_{B_0} \quad \forall u \in X,\ \forall v \in Y. \tag{20}$$
The estimates $\|e_Y\|_{B_0}$ and $\|e_0\|_{B_0}$ are then related by the following theorem: if $\gamma \in [0, 1)$ satisfies (20), then
$$\|e_Y\|_{B_0} \le \|e_0\|_{B_0} \le \left(1 - \gamma^2\right)^{-1/2} \|e_Y\|_{B_0}. \tag{21}$$
In summary, the quality of the energy error estimate $\|e_Y\|_{B_0}$ is determined by two constants, $\beta$ and $\gamma$, which both depend on $X$ and $Y$. Ideally, we want $(1 - \beta^2)(1 - \gamma^2) \approx 1$. Given a fixed initial approximation space $X$, what is the best choice of detail space $Y$ from the point of view of obtaining the best possible error estimate? This is the essence of our investigation.

The Parametric Diffusion Problem
The variational formulation of (1a)–(1b) is: find $u \in V := H_0^1(D) \otimes L_\pi^2(\Gamma)$ such that
$$B(u, v) = F(v) \quad \forall v \in V, \tag{23}$$
where $H_0^1(D)$ is the usual Sobolev space of functions vanishing on $\partial D$ and $L_\pi^2(\Gamma)$ is the space of functions $v : \Gamma \to \mathbb{R}$ that are square-integrable with respect to the measure $\pi(y)$. To ensure that (23) is well-posed, we make the following assumption.

SGFEM Approximation
We now seek a Galerkin approximation to $u \in V$ satisfying (23). As in Sect. 2, we denote by $X$ an $N_X$-dimensional subspace of $V$. Here, we exploit the tensor product structure of $V$ and choose $X := H_1 \otimes P$, with $H_1 \subset H_0^1(D)$ and $P \subset L_\pi^2(\Gamma)$. We then look for $u_X \in X$ satisfying (9).
We choose $H_1 = \mathrm{span}\{\phi_j(x)\}_{j=1}^{n}$ to be a space of finite element functions associated with a mesh on $D$, and $P = \mathrm{span}\{\varphi_i(y)\}_{i=1}^{s}$ to be a space of global (multivariate) polynomials on $\Gamma$, so that $N_X = ns$. We choose the basis functions for $P$ to be orthonormal with respect to $\langle \cdot, \cdot \rangle_{L_\pi^2(\Gamma)}$. To this end, we introduce the set of finitely supported multi-indices $J$ (sequences $\mu = (\mu_1, \mu_2, \ldots)$ of non-negative integers with only finitely many nonzero entries) and associate with each $\mu \in J$ the product polynomial
$$\varphi_\mu(y) = \prod_{m=1}^{\infty} \varphi_{\mu_m}(y_m), \tag{30}$$
where the families of univariate polynomials $\{\varphi_{\mu_m}(y_m),\ \mu_m = 0, 1, 2, \ldots\}$, for $m = 1, \ldots, \infty$, are chosen to be orthonormal with respect to the inner product associated with $\pi_m(y_m)$. We also assume that $\varphi_0(y_m) = 1$, so that only finitely many factors in (30) differ from one for any $\mu \in J$. Clearly, choosing the subspace $P$ is equivalent to choosing a set of multi-indices $J_P \subset J$ with cardinality $\mathrm{card}(J_P) = s$. To compute a Galerkin approximation $u_X \in X$ satisfying (9), it is essential that the sum in (26) has a finite number of nonzero terms. It is not necessary to truncate the diffusion coefficient a priori. We need only assume that $P$ contains polynomials in which a finite number of parameters $y_m$ are 'active'. If we assume that the first $M$ parameters are active then, provided (2) holds, $B_m(u_X, v) = 0$ for all $u_X, v \in X$ and all $m > M$ (e.g., see [8]). In other words, the projection onto $X = H_1 \otimes P$ truncates the sum after $M$ terms.
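For a total-degree polynomial space of the kind used in the numerical experiments of Sect. 6, the set $J_P$ can be enumerated directly. The sketch below (our own illustration, not the paper's code; the function name is hypothetical) builds all multi-indices in $M$ parameters with total degree at most $p$ and checks the well-known cardinality $\binom{M+p}{p}$.

```python
from itertools import product
from math import comb

def total_degree_indices(M, p):
    # All multi-indices mu = (mu_1, ..., mu_M) with mu_1 + ... + mu_M <= p.
    # Parameters beyond the M listed are implicitly inactive (mu_m = 0).
    return [mu for mu in product(range(p + 1), repeat=M) if sum(mu) <= p]

# For M = 5 parameters and total degree p = 4 (the choice used in
# Test Problem 1), the dimension is s = C(M + p, p) = C(9, 4) = 126.
J_P = total_degree_indices(5, 4)
print(len(J_P), comb(9, 4))  # 126 126
```

Enumerating $J_P$ this way makes the cost trade-off in the text concrete: $\dim(P) = s$ multiplies the size of every spatial block in the Galerkin system.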

A Posteriori Error Estimation
Suppose we now choose a second SGFEM space $W \supset X$ and solve (12) to obtain an estimator $e_W \in W$ for the error $e = u - u_X$. If Assumption 2 holds for the chosen spaces $X$ and $W$, then (14) also holds, where $\|\cdot\|_B$ is the energy norm induced by the bilinear form defined in (24). In addition, due to the norm equivalence (28), the bound (17) also holds, where $e_0 \in W$ satisfies (15) and $\|\cdot\|_{B_0}$ is the norm induced by the bilinear form defined in (27a).
There are several possible ways to construct $W$. Following [10], we choose $W := X \oplus Y$, where the detail space admits the decomposition
$$Y := Y_1 \oplus Y_2, \quad Y_1 := H_2 \otimes P, \tag{31}$$
with $H_2 \subset H_0^1(D)$ a detail finite element space satisfying $H_1 \cap H_2 = \{0\}$, and with $Y_2$ built from a detail polynomial space $Q \subset L_\pi^2(\Gamma)$. Let $J_Q$ denote the set of finitely supported multi-indices which correspond to the subspace $Q$. If $J_P \cap J_Q = \emptyset$, then we have $P \cap Q = \{0\}$, as required. In this case, $P$ and $Q$ are mutually orthogonal with respect to $\langle \cdot, \cdot \rangle_{L_\pi^2(\Gamma)}$, since
$$\langle \varphi_\mu, \varphi_\nu \rangle_{L_\pi^2(\Gamma)} = 0 \tag{32}$$
for all $\mu \in J_P$ and $\nu \in J_Q$. Furthermore, due to the tensor product structure of $Y_1$ and $Y_2$ and the fact that $P \cap Q = \{0\}$, it can be shown that
$$B_0(u, v) = 0 \quad \forall u \in Y_1,\ \forall v \in Y_2. \tag{33}$$
To see this, expand $u \in Y_1$ and $v \in Y_2$ in the chosen bases and use (32). Given $Y$, we can then compute the error estimate $\eta := \|e_Y\|_{B_0}$ by solving (19), and the bound (21) holds. Combining all these results yields the result of Theorem 5. For completeness, we restate this for our parametric diffusion problem.
When $Y$ is chosen as in (31), problem (19) decouples. Expanding $e_Y \in Y$ over $Y = Y_1 \oplus Y_2$ and considering the identity (33), we find that $e_Y \in Y$ satisfying (19) can be determined by solving the lower-dimensional problems: find $e_{Y_1} \in Y_1$ and $e_{Y_2} \in Y_2$ such that
$$B_0(e_{Y_1}, v) = F(v) - B(u_X, v) \quad \forall v \in Y_1, \tag{35}$$
$$B_0(e_{Y_2}, v) = F(v) - B(u_X, v) \quad \forall v \in Y_2. \tag{36}$$
This is precisely the estimator considered in [10]. In [10], however, (31) is rearranged differently: the analysis in that work relies on the orthogonality of $P$ and $Q$, and the decoupling of (15) into two smaller problems over $(H_1 \oplus H_2) \otimes P$ and $H_2 \otimes Q$. A CBS constant is introduced into the analysis by splitting the former into $H_1 \otimes P$ and $H_2 \otimes P$. Our approach is subtly different. We introduce a CBS constant by splitting the augmented space $W$ into $X$ and $Y$, as would be done for the analogous deterministic problem (for which $X = H_1$ and $Y = H_2$).
If (4) holds, then $H_0^1(D)$ is a Hilbert space with respect to the inner product
$$(u, v)_{a_0} := \int_D a_0(x) \, \nabla u(x) \cdot \nabla v(x) \, \mathrm{d}x, \tag{38}$$
and since $H_1 \cap H_2 = \{0\}$, by Theorem 1 there exists a $\gamma \in [0, 1)$ such that
$$|(u, v)_{a_0}| \le \gamma \, \|u\|_{a_0} \, \|v\|_{a_0} \quad \forall u \in H_1,\ \forall v \in H_2. \tag{39}$$
In [8, Lemma 3.1], it is shown that the $\gamma$ that features in (39) also satisfies the tensor product analogue of this inequality on the spaces $X$ and $Y_1$. Now consider (20).
Since $P$ and $Q$ are orthogonal with respect to $\langle \cdot, \cdot \rangle_{L_\pi^2(\Gamma)}$, the spaces $Y_1$ and $Y_2$ are orthogonal with respect to $B_0(\cdot, \cdot)$, and so the CBS constant in (34) can be determined by analysing the spaces $H_1$ and $H_2$ alone; $P$ and $Q$ do not play a role. They do, of course, affect the saturation constant $\beta$, and this will be discussed in Sect. 6.

Estimated Error Reductions
The constant $\gamma$ also plays an important role in adaptivity. Given $u_X \in X$, how do we choose an enriched space $X^* \supset X$ in which to compute a new approximation $u^*$ which yields a reduced energy error? Consider the problems of finding $u_{W_1} \in W_1$ and $u_{W_2} \in W_2$ satisfying (11), where
$$W_1 := (H_1 \oplus H_2) \otimes P, \quad W_2 := H_1 \otimes (P \oplus Q).$$
Let $e_{W_1} = u - u_{W_1}$ denote the error corresponding to the enhanced approximation $u_{W_1} \in W_1$. Due to Galerkin orthogonality, we find
$$\|u - u_X\|_B^2 = \|u - u_{W_1}\|_B^2 + \|u_{W_1} - u_X\|_B^2. \tag{43}$$
Hence, $\|u_{W_1} - u_X\|_B$ characterises the energy error reduction that would be achieved by enriching only the subspace $H_1 \subset H_0^1(D)$. Similarly, $\|u_{W_2} - u_X\|_B$ characterises the energy error reduction that would be achieved by enriching only the subspace $P \subset L_\pi^2(\Gamma)$. Fortunately, the two components $\|e_{Y_1}\|_{B_0}$ and $\|e_{Y_2}\|_{B_0}$ of our error estimator provide estimates of these error reductions; see [8, Theorem 5.1].
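The Pythagorean identity that follows from Galerkin orthogonality can be verified in any finite-dimensional setting. The sketch below (a toy example of ours, not from the paper) builds a random symmetric positive definite "stiffness" matrix, computes Galerkin solutions in nested subspaces, and checks that the squared energy errors split exactly into the two contributions discussed above.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
G = rng.standard_normal((N, N))
Bmat = G @ G.T + N * np.eye(N)          # SPD matrix playing the role of B(.,.)
f = rng.standard_normal(N)
u = np.linalg.solve(Bmat, f)            # "exact" solution

def galerkin(P):
    # Galerkin solution in the column span of P
    return P @ np.linalg.solve(P.T @ Bmat @ P, P.T @ f)

X = np.eye(N)[:, :3]                    # coarse 3-dimensional subspace
W1 = np.eye(N)[:, :5]                   # enriched 5-dimensional subspace, X ⊂ W1
uX, uW1 = galerkin(X), galerkin(W1)

norm2 = lambda v: v @ Bmat @ v          # squared energy norm
# Galerkin orthogonality: ||u - uX||^2 = ||u - uW1||^2 + ||uW1 - uX||^2
print(norm2(u - uX), norm2(u - uW1) + norm2(uW1 - uX))
```

The second term, the norm of the increment $u_{W_1} - u_X$, is exactly the quantity that the estimator components approximate when deciding which space to enrich.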
Given $H_2$ and $Q$, Theorem 7 allows us to assess whether enrichment of $H_1$ (with functions in $H_2$) is more beneficial than enrichment of $P$ (with functions in $Q$). We may choose $X^* = W_1$ or $X^* = W_2$. Our choice is determined by which space offers the greatest estimated error reduction per additional degree of freedom. Note that the bound (44) is independent of the saturation constant $\beta$, and choosing $H_2$ in (31) so that the constant $\gamma$ in (39) is small tightens the bound (44). That is, if the CBS constant is small, we can have more confidence in our decisions when performing adaptivity. We now study this constant for various choices of $H_1$ and $H_2$.

Numerical Estimates of CBS Constants
The constant $\gamma \in [0, 1)$ in (34) and (44), which is equivalent to the constant $\gamma \in [0, 1)$ satisfying (39), is not unique. Given $H_1$ and $H_2$, we want to find the smallest such constant: the CBS constant. We now recall a standard result from [19] which leads to a numerical method for computing the CBS constant associated with (39).
Suppose first that $M \in \mathbb{R}^{N \times N}$ is symmetric and positive definite with $N := m + n$ for $m, n \in \mathbb{N}$.
We equip $\mathbb{R}^N$ with the inner product $\langle x, z \rangle_M := x^T M z$ and define the pair of subspaces
$$U := \{ (u^T, 0^T)^T : u \in \mathbb{R}^m \}, \quad V := \{ (0^T, v^T)^T : v \in \mathbb{R}^n \}, \tag{46}$$
which satisfies $U \cap V = \{0\}$. When $M$ has a particular block structure, Theorem 1, along with $U$ and $V$ in (46), leads to the following result (see [19] for a proof).
Corollary 1 Let $M \in \mathbb{R}^{N \times N}$ be symmetric and positive definite with the block structure
$$M = \begin{pmatrix} B & C^T \\ C & A \end{pmatrix}, \tag{47}$$
where $N = m + n$, $B \in \mathbb{R}^{m \times m}$, $A \in \mathbb{R}^{n \times n}$ and $C \in \mathbb{R}^{n \times m}$. There exists a constant $\gamma \in [0, 1)$ such that
$$|\langle u, v \rangle_M| \le \gamma \, \|u\|_M \, \|v\|_M \quad \forall u \in U,\ \forall v \in V. \tag{48}$$
Furthermore, the smallest such constant $\gamma_{\min} \in [0, 1)$ (the CBS constant) satisfying (48) is the square root of the largest eigenvalue $\theta_{\max}$ of the generalised eigenvalue problem
$$C B^{-1} C^T v = \theta A v, \quad v \in \mathbb{R}^n. \tag{49}$$
We now demonstrate that the CBS constant associated with (39), for various choices of $H_1$ and $H_2$, can be computed by solving an eigenvalue problem of the form (49). For now, we assume $a_0(x) = 1$. Note that due to the symmetry of (38), we may compute the CBS constant associated with the equivalent result: there exists a $\gamma \in [0, 1)$ such that
$$|(u, v)_{a_0}| \le \gamma \, \|u\|_{a_0} \, \|v\|_{a_0} \quad \forall u \in H_2,\ \forall v \in H_1.$$
Given bases $H_2 = \mathrm{span}\{\psi_i(x)\}_{i=1}^{m}$ and $H_1 = \mathrm{span}\{\phi_j(x)\}_{j=1}^{n}$, we can define the augmented space $H_2 \oplus H_1$ and assemble the matrix $M$ in (47) from the inner products $(\cdot, \cdot)_{a_0}$ of the combined basis, listing the basis functions of $H_2$ before those of $H_1$. Then, for all $u \in H_2$ and $v \in H_1$, $(u, v)_{a_0} = \mathbf{u}^T M \mathbf{v}$ for some $\mathbf{u} \in U$ and $\mathbf{v} \in V$ in (46).
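The computation in Corollary 1 is a few lines of linear algebra. The sketch below (our own illustration with a random SPD matrix, not the paper's data) extracts the blocks of $M$, solves the generalised eigenvalue problem, and recovers $\gamma_{\min}$, which necessarily lies in $[0, 1)$ because the Schur complement of a positive definite matrix is positive definite.

```python
import numpy as np
from scipy.linalg import eigh

# Blocks: B is m x m, A is n x n, C is the n x m lower-left coupling block.
rng = np.random.default_rng(0)
m, n = 5, 4
N = m + n
G = rng.standard_normal((N, N))
M = G @ G.T + N * np.eye(N)             # symmetric positive definite example
B, A = M[:m, :m], M[m:, m:]
C = M[m:, :m]

# gamma_min^2 = largest eigenvalue of C B^{-1} C^T v = theta A v
theta = eigh(C @ np.linalg.solve(B, C.T), A, eigvals_only=True)
gamma_min = np.sqrt(theta.max())
print(gamma_min)
```

For actual finite element spaces, $M$ would be the stiffness-type matrix assembled from the combined basis described above; the eigenvalue computation is identical.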
Given a fixed space $H_1$ associated with a uniform mesh $T_h$ on the spatial domain $D$, we construct $H_2$ element-wise. That is, we insist that $H_2$ admits the decomposition
$$H_2 = \bigoplus_{\square_k \in T_h} H_{k,2}, \quad H_{k,2} := \mathrm{span}\{\psi_i^k(x)\}_{i=1}^{m_k},$$
where $\square_k$ denotes an element in $T_h$. We choose the functions $\psi_i^k$ to be bubble functions, so that $H_{k,2}$ contains functions which have non-zero support only on $\square_k$. The linear system associated with the estimator $e_{Y_1}$ satisfying (35) then decouples. Since $e_{Y_1} \in Y_1 := H_2 \otimes P$, on each $\square_k$ we have to solve a problem of size $m_k \times \dim(P)$. Since $\dim(P)$ may be large, to keep costs reasonable we must restrict the dimension of $H_{k,2}$. Note that the necessity to restrict the dimension of $H_2$ is not as prevalent in the deterministic PDE setting, where each local problem for the analogous error estimate is of dimension $m_k$, not $m_k \times \dim(P)$. The goal is to find the best space (the one leading to the tightest error bound) of a fixed small dimension. Below, we fix $H_1$ and consider various choices of $H_2$ of the same dimension. We vary the mesh size $h$ and estimate the CBS constant by solving the eigenvalue problem (49).

Example 1 ($H_1 = Q_1(h)$). On each $\square_k$ we construct a local space $H_{k,2}$ of dimension $m_k \le 5$, by defining bubble functions at the edge midpoints and the element centroid (the $Q_1$ nodes that would be introduced by a uniform mesh refinement). We consider four options for the bubble functions, naming the resulting space $H_2$ after the element type used; the fourth option, $Q_2(h/2)$, is the same as the third but with $Q_2$ basis functions. Figure 1 displays an arbitrary internal, edge, and corner element, and in Table 1 we record $\gamma_{\min}^2$ for each choice of $H_2$ for varying $h$.

Example 2 ($H_1 = Q_2(h)$). On each $\square_k$ we construct a local space $H_{k,2}$ of dimension $m_k \le 16$, by defining a set of bubble functions associated with the additional $Q_2$ nodes that would be introduced by performing a uniform mesh refinement (see Fig. 2). We consider the spaces $Q_4(h)$ and $Q_2(h/2)$ and, for our third and fourth choices of $H_2$, modifications of these first two spaces obtained by removing the basis functions associated with the nodes indicated by the grey markers in Fig. 2.
This configuration is motivated by error estimation results presented in [23, p. 39]. We denote the resulting 'reduced' spaces by $Q_4^r(h)$ and $Q_2^r(h/2)$ and denote the number of degrees of freedom by $N^r$.
In Table 2 we record $\gamma_{\min}^2$ for each choice of $H_2$ for varying $h$. In [10], the authors choose $H_2 = Q_4^r(h)$ to define the error estimator $\eta = \|e_Y\|_{B_0}$ described in Sect. 3, when $u_X$ is computed with $H_1 = Q_2(h)$. When $a_0 = 1$, the CBS constant is $\gamma_{\min} \le \sqrt{0.36}$. Of the four spaces considered, $H_2 = Q_4^r(h)$ yields the smallest CBS constant.

Local CBS Constants
Solving the eigenvalue problem (49) to compute the CBS constant when $N$ is large is not practical. Alternatively, we may derive a small eigenvalue problem associated with a single element $\square_k$, involving a local inequality of the form
$$|(u_k, v_k)_{a_0, \square_k}| \le \gamma_k \, \|u_k\|_{a_0, \square_k} \, \|v_k\|_{a_0, \square_k},$$
where $u_k := u|_{\square_k} \in H_{k,2} := H_2|_{\square_k}$ and $v_k := v|_{\square_k} \in H_{k,1} := H_1|_{\square_k}$, with $H_{k,1}, H_{k,2} \subset H^1(\square_k)$ having dimensions $n_k = \dim(H_{k,1})$ and $m_k = \dim(H_{k,2})$ (recall that we now list functions in $H_2$ before functions in $H_1$). Here, $\cdot|_{\square_k}$ denotes the restriction to element $\square_k$. For all $u_k \in H_{k,2}$ and $v_k \in H_{k,1}$, $(a_0 u_k, v_k)_{\square_k} = \mathbf{u}_k^T M_k \mathbf{v}_k$, where the matrix $M_k \in \mathbb{R}^{N_k \times N_k}$, for $N_k := n_k + m_k$, has the same $2 \times 2$ block structure as before, with $A_k \in \mathbb{R}^{n_k \times n_k}$, $B_k \in \mathbb{R}^{m_k \times m_k}$ and $C_k \in \mathbb{R}^{n_k \times m_k}$. Since $(a_0 \, \cdot, \cdot)_{\square_k}$ only induces a seminorm on $H^1(\square_k)$, $M_k$ is positive semidefinite and Corollary 1 is not applicable. For our block matrices of interest, we require the following result from [19].

Corollary 2 Let $M_k \in \mathbb{R}^{N_k \times N_k}$ be symmetric and positive semidefinite with the block structure
$$M_k = \begin{pmatrix} B_k & C_k^T \\ C_k & A_k \end{pmatrix}, \tag{54}$$
where $N_k = m_k + n_k$, $B_k \in \mathbb{R}^{m_k \times m_k}$ is invertible and $A_k \in \mathbb{R}^{n_k \times n_k}$. Let $U_k, V_k \subset \mathbb{R}^{N_k}$ have the same structure as $U$ and $V$ in (46), but with $n_k$ and $m_k$ in place of $n$ and $m$. Then, there exists a constant $\gamma_k \in [0, 1)$ such that
$$|\langle u, v \rangle_{M_k}| \le \gamma_k \, \|u\|_{M_k} \, \|v\|_{M_k} \quad \forall u \in U_k,\ \forall v \in V_k. \tag{55}$$

If $H_{k,1}$ and $H_{k,2}$ are chosen so that $M_k$ satisfies the conditions of Corollary 2, there exists a constant $\gamma_k \in [0, 1)$ such that (55) holds, or equivalently
$$|(u_k, v_k)_{a_0, \square_k}| \le \gamma_k \, \|u_k\|_{a_0, \square_k} \, \|v_k\|_{a_0, \square_k} \quad \forall u_k \in H_{k,2},\ \forall v_k \in H_{k,1}. \tag{56}$$
Furthermore, the local CBS constant $\gamma_{k,\min}$ associated with (56) is the square root of $\theta_{\max}$ satisfying the generalised eigenvalue problem
$$C_k B_k^{-1} C_k^T \, v = \theta A_k v, \quad v \in \mathbb{R}^{n_k}; \tag{57}$$
see [19]. It is straightforward to show that
$$|(u, v)_{a_0}| \le \Big( \sup_k \gamma_{k,\min} \Big) \|u\|_{a_0} \, \|v\|_{a_0} \quad \forall u \in H_2,\ \forall v \in H_1,$$
and comparing with (39) gives $\gamma_{\min} \le \sup_k \gamma_{k,\min}$. When the mesh is uniform and $a_0(x)$ is constant on each element, $\gamma_{k,\min}$ does not depend on $h$ or $a_0|_{\square_k}$. To estimate $\gamma_{\min}$, we only need to compute $\gamma_{k,\min}$ for a single internal element (as this is larger than the constant associated with corner/edge elements). We now revisit Example 1 and compute local CBS constants associated with the reference element $\square_{\mathrm{ref}} := [-1, 1]^2$. We choose $H_{k,1}$ to be the local $Q_1$ finite element space whose basis functions are associated with the black markers shown in Fig. 3, and ordered as shown. Then $\dim(H_{k,1}) = 4$ and, if $a_0 = 1$, we have
$$A_k = \frac{1}{6} \begin{pmatrix} 4 & -1 & -2 & -1 \\ -1 & 4 & -1 & -2 \\ -2 & -1 & 4 & -1 \\ -1 & -2 & -1 & 4 \end{pmatrix}. \tag{58}$$
For the four choices of $H_{k,2}$ considered in Example 1 (which all have dimension five), we construct the matrix $M_k$ in (54) and calculate $\gamma_{k,\min}^2$ by solving the eigenvalue problem (57). The ordering of the basis functions of $H_{k,2}$ is as illustrated by the clear markers in Fig. 3.
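The local $Q_1$ stiffness matrix on $\square_{\mathrm{ref}} = [-1, 1]^2$ can be assembled by quadrature in a few lines. The sketch below (our own check, not the paper's code) uses a $2 \times 2$ Gauss rule, which is exact here since the integrands are quadratic per variable; the counter-clockwise corner ordering is an assumption and may differ from the ordering in Fig. 3.

```python
import numpy as np

# Corner coordinates of [-1,1]^2, ordered counter-clockwise (assumed ordering).
corners = [(-1, -1), (1, -1), (1, 1), (-1, 1)]

def grads(xi, eta):
    # Gradients of the bilinear shape functions N_i = (1 + xi_i*xi)(1 + eta_i*eta)/4.
    return np.array([[xi_i * (1 + eta_i * eta) / 4, eta_i * (1 + xi_i * xi) / 4]
                     for xi_i, eta_i in corners])

g = 1 / np.sqrt(3)                      # 2-point Gauss nodes, unit weights
A = np.zeros((4, 4))
for xi in (-g, g):
    for eta in (-g, g):
        G = grads(xi, eta)
        A += G @ G.T                    # sum of grad(N_i) . grad(N_j) at Gauss points

print(A)  # diagonal 2/3, edge neighbours -1/6, opposite corners -1/3
```

The assembled matrix has row sums zero, reflecting the fact that constants lie in the null space of the element stiffness matrix, which is exactly why Corollary 1 must be replaced by Corollary 2 locally.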

Remark 1
If $a_0(x)$ varies in space, we may assume that $a_0(x)$ can be approximated by a function $a_0^h(x)$ that is constant in each element of $T_h$, say $a_0^h|_{\square_k} = \alpha_k$. Then, on each $\square_k$, we have a symmetric and positive semidefinite matrix
$$M_k = \alpha_k \begin{pmatrix} B_k & C_k^T \\ C_k & A_k \end{pmatrix},$$
where $B_k$ is invertible and $B_k$, $C_k$, $A_k$ do not depend on $\alpha_k$. The local eigenvalue problem, equivalent to (57), is
$$\alpha_k \, C_k B_k^{-1} C_k^T \, v = \theta \, \alpha_k A_k \, v,$$
in which $\alpha_k$ cancels, and thus the local CBS constant $\gamma_{k,\min}$ satisfying (56) is independent of $\alpha_k$ and $a_0^h(x)$.

Theoretical Estimates of the CBS Constant
In this section we fix $H_{k,1}$ to be the local $Q_1$ finite element space, so that $A_k$ is given by (58), and assume that the degrees of freedom are numbered as shown in Fig. 3. We also assume that $H_{k,2}$ is chosen so that $\dim(H_{k,2}) = 5$ and the resulting matrices $B_k$ and $C_k$ have a particular structure. Exploiting this structure, and using only linear algebra arguments, we show that the local CBS constant $\gamma_{k,\min}$ can be calculated analytically, without assembling and solving (57). To simplify notation, we drop the subscript $k$.
Suppose that $M \in \mathbb{R}^{9 \times 9}$ has the block form (54), with $A$ given by (58), $B \in \mathbb{R}^{5 \times 5}$ invertible, and
$$C B^{-1} C^T = \beta \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 0 & 1 \end{pmatrix} \tag{60}$$
for some constant $\beta \in \mathbb{R}^+$; then there exists a constant $\gamma \in [0, 1)$ such that (55) holds.

Proof It is sufficient to characterise $N(M)$, the null space of the matrix $M$ in (54). The result then follows from Corollary 2. Let $x^T = (u_1^T, v_2^T) \in \mathbb{R}^9$, for $u_1 \in \mathbb{R}^5$ and $v_2 \in \mathbb{R}^4$, be such that $Mx = 0$. Then $u_1 = -B^{-1} C^T v_2$ and $S v_2 = 0$ for the Schur complement
$$S := A - C B^{-1} C^T,$$
where $\beta \in \mathbb{R}^+$ is the constant in (60).
Under the same assumptions, the local CBS constant satisfies $\gamma_{\min}^2 = 2\beta$.

Proof Recall from (57) that $\gamma_{\min}^2$ is the largest eigenvalue $\theta_{\max}$ satisfying
$$Q u = \theta A u, \quad Q := C B^{-1} C^T. \tag{66}$$
By considering the expression $Qu = 0$, it is easy to show that
$$N(Q) = \mathrm{span}\{(1, 0, 1, 0)^T, (0, 1, 0, 1)^T\}, \tag{67}$$
and so $N(A) \subset N(Q)$. Under the stated assumptions, $Q$ has the form (60), and the set of eigenvalues of $Q$ is $\{2\beta, 2\beta, 0, 0\}$. The basis vectors of $N(Q)$ in (67) are eigenvectors corresponding to the zero eigenvalues. In addition, $P_1 = [-1, -1, 1, 1]^T$ is an eigenvector corresponding to $\theta = 2\beta$. To see this, note that $Q P_1 = 2\beta P_1$. The same is true for $P_2 = [-1, 1, 1, -1]^T$. Furthermore, the vectors $P_1$ and $P_2$ also satisfy $A P_1 = P_1$ and $A P_2 = P_2$ (and clearly do not belong to $N(A)$), and hence are eigenvectors of $A$ with eigenvalue one. Thus
$$Q P_i = 2\beta P_i = 2\beta \, A P_i, \quad i = 1, 2.$$
That is, $P_1$ and $P_2$ are eigenvectors in (66) with $\theta = 2\beta$. If we take $u$ to be a member of $N(Q)$ but not $N(A)$, then (66) is trivially satisfied with $\theta = 0$. Hence, $\gamma_{\min}^2 = \max\{0, 2\beta\} = 2\beta$.
The next results show that if the matrices $B$ and $C$ have certain structures, then $C B^{-1} C^T$ always has the structure (60), and an explicit expression is available for the constant $\beta$ in (60), and hence for the CBS constant.

Lemma 1
If the matrix $C \in \mathbb{R}^{4 \times 5}$ has a fixed sparsity pattern determined by a single constant $\alpha \in \mathbb{R}^+$ (so that, up to signs, every nonzero entry of $C$ equals $\alpha$), and if $B \in \mathbb{R}^{5 \times 5}$ is an invertible bordered matrix of the form
$$B = \begin{pmatrix} \bar{B} & b \\ b^T & \mu \end{pmatrix}, \tag{69}$$
where $\bar{B} \in \mathbb{R}^{4 \times 4}$ is a symmetric circulant matrix, $b \in \mathbb{R}^4$ is a constant vector and $\mu \in \mathbb{R}$ is a constant, then the matrix $C B^{-1} C^T$ has the form (60).
Proof First we show that if $B$ has the form (69) then so does $B^{-1}$. Since $\bar{B}$ is symmetric and circulant, so is its inverse (see [13]). Consequently, $q := \bar{B}^{-1} b \in \mathbb{R}^{4}$ is a constant vector and $q q^T \in \mathbb{R}^{4 \times 4}$ is a constant matrix. This is because $b$ is a constant vector and the row sums of a circulant matrix are equal. Therefore, by the standard formula for the inverse of a bordered matrix,
$$B^{-1} = \begin{pmatrix} \tilde{B} & \tilde{b} \\ \tilde{b}^T & \nu^{-1} \end{pmatrix}, \tag{70}$$
where $\tilde{B} := \bar{B}^{-1} + \nu^{-1} q q^T \in \mathbb{R}^{4 \times 4}$ is a symmetric circulant matrix, bordered by $\tilde{b} := -\nu^{-1} q \in \mathbb{R}^{4}$, and $\nu := \mu - b^T \bar{B}^{-1} b \in \mathbb{R}$ is a constant. Hence, the first column of $\tilde{B}$ has entries $\alpha_1, \alpha_2, \alpha_3, \alpha_2$ for some $\alpha_1, \alpha_2, \alpha_3 \in \mathbb{R}$, and, for the rest of the proof, the remaining elements of $B^{-1}$ are not important. Elementary matrix multiplication with $C$ then shows that $C B^{-1} C^T$ has the form (60). Combining the last two results, we see that to compute the CBS constant we only need to know $\alpha$ (one entry of $C$) and $\alpha_1$ and $\alpha_3$ (two entries of the first column of $\tilde{B}$). We can determine the latter analytically by exploiting the spectral decomposition of circulant matrices. Lemma 2 is standard (for example, see [13]) and we apply it to $\bar{B}$ in (69) in Lemma 3.

Lemma 2
Let $D \in \mathbb{R}^{n \times n}$ be a circulant matrix with first column $d = [d_0, d_1, \ldots, d_{n-1}]^T \in \mathbb{R}^n$. The eigenvalues $\lambda_j$ and eigenvectors $v_j$ of $D$ are
$$\lambda_j = \sum_{k=0}^{n-1} d_k \, \omega_j^k, \quad v_j = \frac{1}{\sqrt{n}} \left[ 1, \omega_j, \omega_j^2, \ldots, \omega_j^{n-1} \right]^T, \quad j = 1, \ldots, n, \tag{72}$$
where $\omega_j = \exp(2\pi i j / n)$ and $i = \sqrt{-1}$.
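Lemma 2 is easy to verify numerically. The sketch below (our own check with an arbitrary example vector, not from the paper) builds a symmetric circulant matrix and compares the formula for the eigenvalues against a direct eigenvalue computation; for a symmetric first column the eigenvalues come out real.

```python
import numpy as np
from scipy.linalg import circulant

# Arbitrary symmetric first column (d[1] == d[3]), so D is symmetric circulant.
d = np.array([2.0, -1.0, 0.5, -1.0])
D = circulant(d)

n = len(d)
# Eigenvalue formula of Lemma 2: lambda_j = sum_k d_k * omega_j^k.
lam = np.array([sum(d[k] * np.exp(2j * np.pi * j * k / n) for k in range(n))
                for j in range(n)])

print(np.sort(lam.real), np.sort(np.linalg.eigvalsh(D)))
```

This is exactly the mechanism exploited in Lemma 3: the eigenvalues of the $4 \times 4$ principal minor $\bar{B}$, and hence the entries of its inverse, are available in closed form.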

Lemma 3 Let the principal minor $\bar{B}$ in (69) of the matrix $B$ be the symmetric circulant matrix with first column $[b_0, b_1, b_2, b_1]^T$. Then, by (72) with $n = 4$, the eigenvalues of $\bar{B}$ are
$$\lambda_1 = b_0 - b_2, \quad \lambda_2 = b_0 - 2b_1 + b_2, \quad \lambda_3 = b_0 - b_2, \quad \lambda_4 = b_0 + 2b_1 + b_2, \tag{74}$$
and the eigenvectors of $\bar{B}$ are given by the columns of the unitary matrix
$$W = \frac{1}{2} \begin{pmatrix} 1 & 1 & 1 & 1 \\ i & -1 & -i & 1 \\ -1 & 1 & -1 & 1 \\ -i & -1 & i & 1 \end{pmatrix}.$$

Proof Since $\bar{B}$ is circulant, its eigenpairs are given by (72). Here, $n = 4$ and we have $\omega_1 = i$, $\omega_2 = -1$, $\omega_3 = -i$ and $\omega_4 = 1$. The decomposition is standard (see [34, Corollary 5.16]).
Combining the above results, gives the following final result.
Combining the above results, the constant $\beta$ in (60) (and hence the CBS constant $\gamma_{\min}^2 = 2\beta$) can be written explicitly in terms of $\alpha$, $\alpha_1$ and $\alpha_3$, where $\alpha_1$ and $\alpha_3$ are elements of the matrix $\tilde{B}$ in (70), which depends on the inverse of $\bar{B}$ in (69). By Lemma 3, $\bar{B}$ admits a unitary eigendecomposition, and since $\bar{B}^{-1}$ is circulant, its entries are known once we specify its first column $\bar{c}$. Now, since $\tilde{B} := \bar{B}^{-1} + \nu^{-1} q q^T$ in Lemma 1, we know that $\alpha_1 = [\bar{c}]_1 + \tau$ and $\alpha_3 = [\bar{c}]_3 + \tau$ for some $\tau \in \mathbb{R}$, and consequently the difference $\alpha_1 - \alpha_3 = [\bar{c}]_1 - [\bar{c}]_3$ can be evaluated using the eigenvalues (74). Since $B$ is symmetric and positive definite, so is $\bar{B}$. Consequently, $\lambda_1 > 0$ and $\beta$ is well defined. The result follows by Theorem 9. The resulting values of $\gamma_{k,\min}^2$ are recorded in Table 3 (to stress that these are local quantities, we reintroduce the subscript $k$). The results match the numerical estimates obtained in Sect. 4.1. With the new approach, the matrices $A$, $B$ and $C$ do not need to be assembled, and no eigenvalue problem needs to be solved.

Numerical Results
We now return to (1a)–(1b) and assess the quality of the energy error estimator $\eta$ in (37), extending the discussion in [8] and [10]. First, we select $X = H_1 \otimes P$ and compute $u_X \in X$ by solving (9). We choose either $H_1 = Q_1(h)$ or $H_1 = Q_2(h)$ on a uniform square partition of $D$, and fix $P$ to be the space of global polynomials of total degree less than or equal to $p$ in $y_1, y_2, \ldots, y_M$. Each parameter $y_m$ is assumed to be the image of a mean zero uniform random variable. Hence, for a given multi-index $\mu \in J_P$, we construct the basis functions in (30) by tensorizing univariate Legendre polynomials. Next, we compute $\eta = \eta(u_X)$ in (37) by solving (35) and (36), choosing $H_2$ and $Q$ so that the conditions in (31) are satisfied. For $H_2$, we consider the spaces from Examples 1 and 2. We choose $Q$ to be the space of polynomials associated with $J_Q := \hat{J}_Q \setminus J_P$, where $\hat{J}_Q$ is the set of multi-indices associated with the space of polynomials of total degree less than or equal to $p + 1$ in $y_1, y_2, \ldots, y_M, y_{M+1}$. By Theorem 6, the effectivity index $\theta_{\mathrm{eff}} := \eta(u_X)/\|u - u_X\|_B$ is then bounded above and below in terms of the constants $\lambda$, $\Lambda$, $\beta$ and $\gamma$.

Test Problem 1
To start, we consider a test problem from [8]. We choose f(x) = 1 and assume that a(x, y) is the parametric form of a second-order random field with mean E[a](x) and a prescribed covariance function. We may then expand a(x, y) using the Karhunen–Loève expansion, where (λ_m, φ_m) are the eigenpairs of the covariance operator. We choose E[a](x) = 1 (the mean), σ = 0.15 (the standard deviation) and l_1 = l_2 = 2 (the correlation lengths). In our first experiment we choose H_1 = Q_1(h) and fix M = 5 in the definition of P. In Table 4 we record θ_eff for varying h with fixed p = 4, and varying p with fixed h = 2^{-3}. We see that H_2 = Q_2(h/2) yields the best error estimator. Interestingly, H_2 = Q_4(h) defines the worst estimator, despite the fact that its associated CBS constant is the smallest (γ²_min ≤ 0.0121). Recall from Theorem 7 that ||e_{Y_1}||_{B_0} and ||e_{Y_2}||_{B_0} provide estimates of the energy error reductions associated with augmenting H_1 with H_2, and P with Q, respectively. When H_2 = Q_4(h), since the CBS constant is small, we know ||e_{Y_1}||_{B_0} is a good estimate. When both ||e_{Y_1}||_{B_0} and ||e_{Y_2}||_{B_0} are much smaller than ||e||_B (which is the case for the stated Q and H_2 = Q_4(h)), the saturation constant β ≈ 1. This causes η to be much smaller than ||e||_B, resulting in a poor effectivity index.
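As an illustration of the truncated Karhunen–Loève construction above, the following sketch approximates the eigenpairs of a covariance operator by eigendecomposing the covariance matrix on a grid (a basic Nyström-type discretization). The 1D exponential kernel, grid size and sampling routine are illustrative assumptions, not the paper's 2D test problem; only σ = 0.15 and the correlation length 2 are taken from the text.

```python
import numpy as np

sigma, ell = 0.15, 2.0                # std. deviation and correlation length
n = 200
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]

# Covariance matrix for C(x, x') = sigma^2 exp(-|x - x'| / ell) on the grid.
C = sigma**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / ell)

# Symmetric eigendecomposition; the factor h approximates the L2 inner product,
# so lam approximates the KL eigenvalues and phi the (scaled) eigenfunctions.
lam, phi = np.linalg.eigh(h * C)
lam, phi = lam[::-1], phi[:, ::-1]    # sort eigenvalues in decreasing order

# Truncated KL expansion: a(x, y) = E[a](x) + sum_m sqrt(lam_m) phi_m(x) y_m.
M = 5
def sample_field(y, mean=1.0):
    """One realization of the truncated field for a parameter vector y."""
    return mean + (phi[:, :M] / np.sqrt(h)) @ (np.sqrt(lam[:M]) * y)

a = sample_field(np.random.uniform(-1.0, 1.0, M))   # mean-zero uniform y_m
```

The sum of the discrete eigenvalues approximates the total variance, so the truncation level M controls how much of the field's variability the parametric approximation can represent.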
We now repeat the experiment with H_1 = Q_2(h). Note that, for a fixed h, the spatial error associated with u_X is smaller than for H_1 = Q_1(h). Results are presented in Table 5. Now, as we vary both h (for fixed p = 4) and p (for fixed h = 2^{-3}), the error ||u_ref − u_X||_B stagnates. The estimated errors behave the same way, but θ_eff is not close to one. There is little benefit in computing a new Galerkin solution by augmenting either H_1 with H_2 (for any of the choices of H_2) or P with Q. The saturation constant is close to one in all cases. However, if introducing more parameters into the approximation space leads to a smaller saturation constant, a better estimate of the error should be obtained by modifying Q to include more parameters.
We fix P as before with M = 5 but now choose Q to be the space of polynomials associated with J_Q := Ĵ_Q \ J_P, where Ĵ_Q is the set of multi-indices associated with polynomials of total degree less than or equal to p + 1 in the first M + 3 parameters. Results are presented in Table 6. The effectivity indices are much improved. It is well known that the eigenvalues λ_m associated with (78) decay very slowly (√λ_m = O(m^{-1}); see [24]). To achieve a small saturation constant, and hence an accurate error estimator, a large number of parameters needs to be incorporated into Q. We now study a problem with faster-decaying coefficients.
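To see why slowly decaying eigenvalues force many parameters into Q, compare the tail energy left after truncating at M terms for two decay rates. The rates below are hypothetical, chosen only to mimic √λ_m = O(m^{-1}) versus a faster decay; they are not the paper's data.

```python
import numpy as np

def tail_fraction(lam, M):
    """Fraction of the total energy sum(lam) lying beyond the first M terms."""
    return lam[M:].sum() / lam.sum()

m = np.arange(1, 10001)       # large index range standing in for the full tail
slow = m**-2.0                # lam_m ~ m^-2, i.e. sqrt(lam_m) ~ 1/m (slow decay)
fast = m**-4.0                # an illustrative faster decay

# With slow decay, five terms still leave over 10% of the energy in the tail,
# while the faster decay leaves well under 1%.
print(tail_fraction(slow, 5), tail_fraction(fast, 5))
```

This is the mechanism behind Table 6: with slow decay, the saturation constant stays close to one unless Q includes many extra parameters, whereas a faster-decaying coefficient needs far fewer.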

Summary and Conclusions
Using classical theory from [1,6] for Galerkin approximation, we provided an alternative derivation of an error estimator from [10] and the associated bound. Our approach highlights the straightforward extension of an error estimation strategy for standard Galerkin FEMs to the stochastic Galerkin setting. Through numerical experiments, we demonstrated that Q must also be carefully selected and tailored to the properties of the diffusion coefficient. When both H_2 and Q are chosen appropriately, the estimator exhibits effectivity indices close to one. Choosing H_2 and Q so that the effectivity index is close to one is not the end of the story. If the estimated error associated with u_X ∈ X = H_1 ⊗ P is too high, we need to decide how to enrich X and compute a new approximation. The error estimate needs to be accurate, but to derive adaptive algorithms using (44)–(45), we should only work with spaces H_2 and Q for which it is straightforward to compute new SGFEM approximations in (H_1 ⊕ H_2) ⊗ P and/or H_1 ⊗ (P ⊕ Q). For example, when H_1 = Q_1(h), choosing H_2 = Q_2(h) yields an accurate error estimate for the current approximation, but does not give a feasible spatial adaptive enrichment strategy. Choosing H_2 = Q_1(h/2) is more natural; fortunately, this space also yields a good error estimator. When H_1 = Q_2(h), H_2 = Q_2^r(h/2) yields an excellent estimator. Although using H_2 = Q_2(h/2) is more natural for adaptivity, we recommend using H_2 = Q_2^r(h/2) to estimate the error. Not only is this the cheapest of all the options considered but, since Q_2(h/2) is richer, the estimated spatial error reduction ||e_{Y_1}||_{B_0} obtained using H_2 = Q_2^r(h/2) is still informative if we wish to assess the benefit of computing a new approximation in (Q_2(h) ⊕ Q_2(h/2)) ⊗ P.