A massively parallel nonoverlapping additive Schwarz method for discontinuous Galerkin discretization of elliptic problems

A second order elliptic problem with discontinuous coefficient in 2-D or 3-D is considered. The problem is discretized by a symmetric weighted interior penalty discontinuous Galerkin finite element method with nonmatching simplicial elements and piecewise linear functions. The resulting discrete problem is solved by a two-level additive Schwarz method with a relatively coarse grid and with local solves restricted to subdomains which can be as small as single element. In this way the method has a potential for a very high level of fine grained parallelism. Condition number estimate depending on the relative sizes of the underlying grids is provided. The rate of convergence of the method is independent of the jumps of the coefficient if its variation is moderate inside coarse grid substructures or on local solvers’ subdomain boundaries. Numerical experiments are reported which confirm theoretical results.


Introduction
In this paper we consider a second order elliptic equation with homogeneous Dirichlet boundary condition, where the diffusion coefficient ρ is a discontinuous function. The problem is discretized by a symmetric weighted interior penalty discontinuous Galerkin (DG) finite element method with simplicial nonmatching elements and piecewise linear functions. Our goal is to design and analyze a two-level nonoverlapping additive Schwarz method (ASM), see e.g. [16], for solving the resulting discrete problem with rate of convergence independent of the jumps of the coefficient and with a potential for a very high level of fine grained parallelism.
Usually, two level ASMs for discretizations on fine mesh of size h are defined by introducing a partitioning of the domain into subdomains of size H > h, where local solvers are applied in parallel. A global coarse problem is then based on the same partitioning. This method has recently been generalized for nonoverlapping domain decomposition methods for DG discretizations, allowing the coarse grid with mesh size H ≤ H to be a refinement of the original partitioning into subdomains where local solvers are applied-see e.g. [1,14].
In the present paper, we take a different approach and consider local solvers' subdomains of size H and a coarse grid of size H such that H ≥ H . Therefore, in an extreme case our local solver subdomain can be as small as a single element of the fine mesh (so then H = h); on the other extreme, it can be set equal to the coarse grid cell, H = H, the usual approach. By allowing small subdomains we substantially increase the level of parallelism of the method. Very small and cheap to solve local systems come in huge quantities, which can be an advantage on new multi-threaded processors. Moreover, small subdomains give more flexibility in assigning them to processors for load balancing in coarse grain parallel processing. In this way, an additional level of domain partitioning gives the user more parameters to fine tune the actual parallel performance, and thus overall efficiency, of the preconditioner for a given hardware architecture. The increased level of parallelism affects the condition number of the preconditioned system in a controlled way which we prove is, roughly speaking, of order O H 2 /h H . Numerical experiments reported in this paper confirm the bound is sharp in terms of mesh sizes.
In particular, if H = h we get the condition number of order O(H 2 / h 2 ) which then gradually improves, as H increases, to well-known O(H/ h) when H = H, see [14]. It is also not surprising that if all partitionings scale in the same way so that H/H and H/ h are kept constant, then the condition number remains bounded independently of the actual values of H, H, h.
The method discussed in the paper makes the preconditioned system condition number independent of the jumps of diffusion coefficient ρ, under the assumption that the distribution of the coefficient satisfies certain assumptions. We identify two such cases: when the variation of the coefficient is mild inside cells of the coarse grid, or when the coefficient is close to a constant on the skeleton of the partitioning into local solvers' subdomains.
The ASM discussed here is a generalization of [11] and, for continuous coefficient, of [1,14]. See also [3] for a similar concept for continuous FE discretization where functions of the coarse space were constant inside substructures on which the original region was partitioned. The authors are unaware of other results concerning the influence of the sizes of local solver subdomains and the coarse grid on the condition number when the former grid is a refinement of the latter, in the context of DG methods for the discretization of elliptic problems with discontinuous coefficient. Other recent works towards domain decomposition preconditioning of DG discretizations of problems with strongly varying coefficients include [2,5,6].
The paper is organized as follows. In Sect. 2, differential and discrete DG problems are formulated. In Sect. 3, ASM for solving the discrete problem is designed and analyzed. Numerical experiments are presented in Sect. 4.
For nonnegative scalars x, y, we shall write x y if there exists a positive constant C, independent of x, y and mesh parameters h, H, H, and of jumps of the diffusion coefficient ρ as well, such that x ≤ Cy. If both x y and y x, we shall write x y.

Differential and discrete DG problems
Let be a bounded open polyhedral domain in R d , d ∈ {2, 3}, with Lipschitz boundary ∂ . We consider the following variational problem for given f ∈ L 2 ( ) and ρ ∈ L ∞ ( ): where We assume that there exist constants α 0 and α 1 such that 0 < α 0 ≤ ρ ≤ α 1 a.e. in so that (1) is well-posed. Without loss of generality we shall additionally suppose that α 0 ≥ 1, which can always be guaranteed by simple scaling. This assumption will simplify the proof in Sect. 3.
In what follows we will analyze a preconditioner for a system of algebraic equations arising from a discretization of (1) with DG finite element method. The corresponding finite element spaces and the discrete problem are introduced below in the following subsection.

Finite element spaces and DG discretization
Let T h denote an affine, shape-regular but not necessarily conforming partition of by triangles in 2-D or tetrahedrons in 3-D, T h = {K 1 , . . . , K N h }. We shall refer to T h as the "fine mesh". The diameter of element K ∈ T h will be denoted by h K (assumed uniformly bounded from above by an absolute constant) and we collect all elements' diameters in a multi-parameter h = (h K 1 , . . . , h K N h ).
By E 0 h we denote the set of all common (internal) faces (edges in 2-D) of elements in T h , so that e ∈ E 0 h iff e = ∂ K i ∩ ∂ K j is of positive measure. We will use symbol E h to denote the set of all faces (edges in 2-D) of fine mesh T h , that is those either in E 0 h or on the boundary ∂ . For e ∈ E h we set h e = diam(e).
We assume that T h is shape-and contact-regular in the sense of [9, Definition 1.38], that is, it admits a matching submesh Th which is shape-regular and such that for any K ∈ T h the ratios of h K to diameters of simplices in Th covering K are uniformly bounded by an absolute constant. In consequence, if e = ∂ K i ∩ ∂ K j ∈ E 0 h , then Moreover, the number of neighboring elements in T h is bounded by an absolute constant.
For p ∈ {0, 1}, we denote by P p the set of polynomials of degree not greater than p. Then we define the finite element space V h in which we will approximate (1), Note that traces of functions from V h are multi-valued on the skeleton E h . We discretize (1) by the symmetric weighted interior penalty DG method, see for example [10,13]: where Here for K ∈ T h and e ∈ E h we use standard notation In what follows we shall assume that ρ is piecewise constant on T h (equal to its average value on each element), thus with ω j = ρ j /(ρ i + ρ j ). The unit normal vector pointing outward K i is denoted by n i . On e which lies on the boundary of and belongs to a face of K i , we set {ρ∇u} ω = ρ i ∇u i , [u] = u i n i and γ = δρ i / h e . For sufficiently large δ the discrete problem (4) is well-defined, according to the following lemma: Lemma 1 ( [10,13]) There exists positive δ 0 such that if δ ≥ δ 0 the bilinear form A h (·, ·) is symmetric, positive definite, and spectrally equivalent to a simplified form In what follows we shall take as granted that δ is a fixed parameter such that δ ≥ δ 0 . The seminorm of a function f from the Sobolev space H s (U ) will be denoted by | f | H s (U ) . For short, the L 2 -norm of f will then be denoted by | f | 0,U or simply | f | U . We also define the broken norm ||u|| 1,h by the identity ||u|| 2 1,h = A h (u, u). Then there holds the following approximation result: 10,13]) Let u * ∈ H 1 0 ( ) be the solution of (1), and let u * h ∈ V h satisfy the discrete problem (4).

Additive Schwarz method with small subdomains
The condition number of the discrete problem (4) can be prohibitively large, affected by both the fine mesh size and by the magnitude of jumps in ρ. Thus, for an iterative solution of (4), some preconditioning is necessary. In this section we consider an ASM whose ingredients are: a large number of local solvers on relatively small nonoverlapping subdomains, and a coarse grid solver defined on a possibly much coarser grid. Let us therefore introduce T H as a partition of into N H disjoint open polyhedral subdomains i , i = 1, . . . , N H , such that¯ = i=1,...,N H¯ i and that each i is a union of certain elements from the fine mesh T h , which we will denote We shall refer to this partition as the "local solvers' grid", because our ASM is going to solve (in parallel) local problems defined on the fine grid restricted to each of these subdomains.
Next, let T H be a division of into N H disjoint open polyhedral regions D n , n = 1, . . . , N H , such that¯ = n=1,...,N HD n and that each D n is a union of certain elements from the local solvers' grid T H . We set H n = diam(D n ) and then H = (H 1 , . . . , H N H ). We shall call this partition the "coarse grid".
We clearly have N H ≤ N H ≤ N h and (inclusions understood in the sense of subsequent refinements of the coarsest partitioning), and max h ≤ max H ≤ max H. Note that in [1,14] the inclusions come in a An example partitioning of into fine, coarse and local solvers' grids is shown in Fig. 1.
For S belonging either to T H or to T H let us introduce the notation referring to the skeleton of the fine mesh restricted only to S: and its boundary and interior parts as well: Finally, let us collect all fine grid faces belonging to the boundaries of elements of T H and T H , respectively, to denote the skeletons of T H and T H : We start the definition of the ASM by introducing a decomposition of V h : where the coarse space is Note that V h is a direct sum of the local spaces. By choosing the lowest order coarse space, we reduce its dimension and, in consequence, the cost of solving the coarse problem.
Since now on, for ϕ defined on , if necessary we shall write ϕ i to denote the restriction of ϕ to i : Observe that this brings a change from the notation of Sect. 2.1.
Using decomposition (6) we define local operators so that on each subdomain one has to solve only a relatively small system of linear Note that on V 0 , the approximate form A h (·, ·) coincides with A h (·, ·) and simplifies to Finally, the preconditioned operator is In order to formulate the condition number estimate result for T , we make two additional assumptions, that the elements of partition T H do not differ too much in shape, as well as those in T H : Theorem 1 In addition to our general assumptions of Sect. 2, suppose that both (A1) and (A2) hold. Then T defined in (8) is symmetric with respect to A h (·, ·) and and coefficient-dependent parameters defined as Therefore, the condition number of the preconditioned operator T is O(β). Before proceeding with the proof of Theorem 1, we state a useful trace inequality result for functions in V h (see also [14], where the estimate of Lemma 3 is proved for a star-like region i ; here we do not need this assumption): Proof Let us fix i = 1, . . . N H and let us denote by Tĥ the fine mesh T h ( i ) affinely mapped ontoˆ . It suffices to prove that for v ∈ Vh, on the reference domainˆ , where for short we denote and then apply the standard scaling argument. By (A2) and the assumptions on T h of Sect. 2.1, Tĥ is also shape-and contact-regular, with relevant constants independent of h, H, H. Thus, Tĥ admits a shape-regular matching submesh Th, so it is sufficient to prove (9) on Th.
Our proof will use technical tools and results from [4]. Let us consider two subspaces of Vh: the space V 1 h of nonconforming P 1 finite elements [7] on Th, andWh-the continuous Lagrange finite element space consisting of piecewise polynomials from P 2 (in 2-D) or P 3 (in 3-D). Let where V( p) is the set of elements in Th sharing node p, and n p is the number of these elements. We will use the following estimates from [4, Eq.   (11) for all v ∈Wh.
For v ∈ Vh we have By trace inequality and the last inequality of (10), Using the last estimate of (11) and then the first from (10) we get Finally, as E(Iv) ∈ H 1 (ˆ ), we can apply standard trace inequality and use (11)  Substituting the above inequalities into (12) we arrive at (9).
and the spectral radius of = ( i j ) N H i, j=1 is bounded by N -the maximum number of neighbors of any element in T H . According to assumption (A2) N is bounded independently of the mesh parameters h and H (and, clearly, independently of other parameters of the problem).
Local stability For all i = 0, 1, . . . , N H with absolute constant ω independent of ρ, h, H and H. This is an obvious consequence of (5); in addition, ω ≥ 1, because on V 0 both forms coincide.

Stable decomposition
We have to prove that there exist C 0 independent of h, H, H, ρ, and a decomposition of u ∈ V h , such that In order to construct the stable decomposition of u ∈ V h , we first define u (0) ∈ V 0 such that on each D n ∈ T H , n = 1, . . . , N H , Note that u (0) is constant on the local solvers' subdomains i , i = 1, . . . , N H , as well.
Next, for i = 1, . . . , N H , we define u (i) ∈ V i , as a zero-extension of the restriction of u − u (0) to i : We obviously have, since u (0) is subdomainwise constant and its jumps cannot occur inside any element of T H , Let us consider subdomain i . Then, it follows that on e ∈ E ∂ h ( i ) such that e = ∂ K I ∩ ∂ K J and K I ∈ T h ( i ), and analogously, if e lies on the boundary of , then From (A2) it follows that we can apply the trace inequality of Lemma 3 to u − u (0) i and obtain Applying Poincaré's inequality for discontinuous finite element functions [4, Theorem 5.1] (see also [8,14]) on the matching submesh of D n and then using the shape-and contact-regularity assumption followed by the scaling argument we get and therefore From the above we can estimate the latter inequality following from Lemma 1. It remains to bound A h (u (0) , u (0) ). We have For any e ∈ E H such that e = ∂ K I ∩ ∂ K J and K I ∈ T h ( i ) and Since the first term has already been estimated by β A h (u, u), we conclude that also which completes the proof.

Remark 1 In (A1) or (A2) we could have assumed the existence of a finite set of several such reference structures, the number of which is another absolute constant independent of h, H , H and the jumps in ρ.
Below we prove that the convergence speed of the method is independent of the magnitude of jumps of ρ, if the variation of coefficient ρ is assumed moderate: -either on the skeleton of the local solvers' partition E H , -or on the coarse partition T H .
Observe that then large jumps of ρ are allowed, respectively, either on islands strictly inside local subdomains or on the boundaries of the coarse partition.
Corollary 1 In addition to the assumptions of Theorem 1 suppose that the variation of the coefficient ρ on the skeleton E H is uniformly bounded by an absolute constant of order 1: Then Proof Since ρ is normalized so that ρ ≥ 1 it follows that 1 ≤ r i and 1 ≤ R n .
so in consequence max n R n 1.

Corollary 2
In addition to the assumptions of Theorem 1 suppose that the variation of the coefficient ρ is uniformly bounded by an absolute constant within elements of the coarse partition T H : Proof By our assumption, for any i = 1, . . . , N H and for n such that i ⊂ D n we have so that r i /r i 1, and for any n = 1, . . . , N H there holds R n , R n , ρ (n) as well.
Remark 2 If we suppose that (15) or (16) is satisfied and that all meshes T h , T H and T H are quasi-uniform, then (with a little abuse of notation) the expression for cond(T ) simplifies to Proof Assumption (A2) is automatically satisfied since i reduces to single element in T h , so the estimate follows directly from Corollary 2.
Theorem 1 also recovers previously known bounds, for constant or piecewise constant coefficient, in the "standard" case when T H = T H (see [1,10,14]), and extends them to the case of coefficient mildly varying over E H :

Corollary 4
In addition to our general assumptions of Sect. 2, assume that T H = T H and (A2) holds together with either (15) or (16). Then Proof The proof follows straightforwardly from Corollaries 1 and 2 and thus is omitted.
For completeness, let us also mention the following: Proof Assumption (16) is automatically fulfilled on a single element, while (A1) and (A2) are trivially satisfied because of our assumptions on the fine grid in Sect. 2. The estimate then follows again from Corollary 2.
Although the condition number in Corollary 5 is independent both from h and the jumps of ρ, the preconditioned operator T is not robust in this particular case: the number of degrees of freedom in the coarse space operator T 0 is unacceptably large, of order N h , i.e. of the same order as the original size of the discrete problem. On the other hand, T 0 is defined only on the space of piecewise constant functions. In the following tables we report the number of Preconditioned Conjugate Gradient iterations for operator T which are required to reduce the initial Euclidean norm of the residual by a factor of 10 6 and (in parentheses) the condition number of T approximation computed from the extreme eigenvalues estimate based on the PCG coefficients, see e.g. [15,Sect. 6.7.3]. We always choose a random vector for the right hand side and zero as the initial guess.

Dependence on h, H, H
First, let us consider the convergence of the "massively parallel" method when T H = T h against "standard" approach, when T H = T H . For the diffusion coefficient we take an elementwise P 0 approximation of a continuous function ρ(x) = x 2 1 + x 2 2 + 1. As it turns out from Table 1, the condition number of the method considered in Corollary 3

Strongly discontinuous coefficient piecewise constant on T H
Here we deal with large discontinuities in ρ under restrictions of Corollary 3. The case of discontinuous coefficient when T H = T H (cf. Corollary 4) has already been treated elsewhere [12], so we restrict ourselves only to the case when T H = T h . Let us then consider ρ with discontinuities aligned with an auxiliary partitioning of into 4 × 4 squares. Precisely, we introduce a red-black checkerboard coloring of this partitioning and set ρ equal to ρ R = 1 in red regions, and the value of ρ B reported in Table 3 in black ones. In this way, our fine and coarse triangulations, with m = 7 and M = 4, are always aligned with the discontinuities, as required in Corollary 3. Table 3 confirms the condition number is independent from ρ B in this case.

Strongly discontinuous coefficient piecewise constant on T h
Next, we investigate elementwise discontinuous coefficient with m = 7 and M = 4 as before, but with ρ = 1 on odd and ρ = ρ B on even-numbered triangles of T h . Table 4 shows that in this case the preconditioner fails (a dash means the method did not converge in 600 iterations) for large jumps in ρ. This confirms the importance of controlled variation of the coefficient inside elements over the partition of the domain. As stated in Corollary 5, in the unfeasible case when T h = T H = T H , the condition number of the preconditioned system is O(1) regardless of h and the jumps of ρ (parameters H and H do not apply here) and this is confirmed in experiments, as reported in Table 5.

Strongly discontinuous coefficient constant on E H
In Corollary 1 we show that in some circumstances one can allow very large variation of ρ even inside local solvers' subdomains. Let us then choose ρ constant equal to 1, except for square islands, where we set ρ = 10 6 . The islands are centrally located in cells of a 2 5 × 2 5 independent division of into squares. The size of every island equals half-length of the cell (see Fig. 2 for an example). In order to make the fine triangulation resolve the discontinuities of ρ well, we restrict ourselves only to m = 8.
As it turns out from Table 6, if local solvers' subdomains entirely cover the islands-that is, if M ≤ 5-then ρ turns constant on E H and the performance of Table 6 Dependence of the number of iterations and the condition number (in parentheses) on H = 2 −M and H = 2 −M when the coefficient is discontinuous with contrast ratio 10 6 on small islands centrally placed on a 2 5 × 2 5 grid, cf. the preconditioner is not sensitive to large jumps in the coefficients inside subdomains, which is in agreement with Corollary 1. Moreover, in the case when there are jumps across E H and at the same time the coarse grid does not resolve the discontinuities well enough, i.e. M > 5 > M, the condition number becomes sensitive to the coefficient jumps. When M ≥ 5, we are back to the case when the coefficient is constant on T H (i.e. the case of Corollary 2) and results from Table 6 reflect this.

Conclusions
A nonoverlapping ASM preconditioner for symmetric interior penalty DG discretization of second order elliptic PDE with discontinuous coefficient on a nonmatching mesh T h has been presented. A very large number of small local problems defined by subdomains introduced by a domain partition T H is solved in parallel, together with one coarse problem of moderate size on a coarser partition T H such that T H ⊆ T H ⊆ T h in the sense of subsequent refinements. Two cases of arrangement of the coefficient discontinuity relative to partitionings T H or T H have been identified, for which the condition number of the resulting system is O(H 2 / h H) independently of the jumps of the coefficient. This property allows one to design a massively parallel method with rate of convergence independent of the jumps of the coefficient. Our result also shows that the decrease of local solvers' subdomains size H (i.e. the increase of parallelism) affects the condition number only linearly, while it turns out highly important to keep the diameter H of the coarse grid as small as possible, due to its quadratic influence on the condition number.