A Stable Domain Decomposition Technique for Advection–Diffusion Problems

The use of implicit methods for numerical time integration typically generates very large systems of equations, often too large to fit in memory. To address this it is necessary to investigate ways to reduce the sizes of the involved linear systems. We describe a domain decomposition approach for the advection–diffusion equation, based on the Summation-by-Parts technique in both time and space. The domain is partitioned into non-overlapping subdomains. A linear system consisting only of interface components is isolated by solving independent subdomain-sized problems. The full solution is then computed in terms of the interface components. The Summation-by-Parts technique provides a solid theoretical framework in which we can mimic the continuous energy method, allowing us to prove both stability and invertibility of the scheme. In a numerical study we show that single-domain implementations of Summation-by-Parts based time integration can be improved upon significantly. Using our proposed method we are able to compute solutions for grid resolutions that cannot be handled efficiently using a single-domain formulation. An order of magnitude speed-up is observed, both compared to a single-domain formulation and to explicit Runge–Kutta time integration.


Introduction
The most common domain decomposition procedures involve formulating a general class of partial differential equations on a single domain, followed by an equivalent multidomain formulation. The multidomain formulation is used to construct an iterative scheme which can be employed using different discretization methods. Notable contributors using this technique B Oskar Ålund oskar.alund@liu.se include [14] and [11]. Various approaches exist depending on if the subdomains overlap [3] or not [6], but as a rule, the methods are iterative. There are exceptions, such as the finite difference based domain decomposition algorithm in [2], and the explicit-implicit domain decomposition methods in [15].
Our approach is similar to the one used in [2] in the following ways: It is non-iterative and uses non-overlapping subdomains. Subdomain intercommunication is limited to the problem of computing interface components, whence interior and boundary points may be computed in parallel. Key differences arise in the treatment of boundary and initial conditions, which in our schemes is done weakly through penalty terms. Also, our time integration is fully implicit, whereas [2] uses explicit time-stepping for the interface components.
Since our schemes are formulated in terms of general discrete differential operators known as Summation-by-Parts (SBP) operators, we gain most of the convenience commonly associated with them. For example, it is trivial to adjust the order of the derivative approximations in our schemes by simply switching the operators. Furthermore, the theoretical properties of SBP operators-augmented with Simultaneous Approximation Terms (SATs) for weakly enforcing boundary conditions-provide a general and straightforward way to prove stability for a multitude of discretized problems by mimicking continuous energy estimates [1,9].
Traditionally, the SBP-SAT technique has been used in space to formulate high-order semidiscrete schemes. Such schemes typically generate a linear system of ordinary differential equations, which is integrated in time using explicit methods. The groundwork for employing SBP-SAT also as a method of time integration was laid in [10]. However, naive usage of SBP in time produces schemes that, while provably stable and high order accurate, lead to large systems which are difficult to solve efficiently in multiple dimensions. For a comprehensive review of the SBP-SAT technique, see [13].
This article is an initial attempt at combining SBP in time and domain decomposition in order to address this efficiency problem. We consider provably stable SBP based domain decomposition methods for a two-dimensional advection-diffusion problem, where local solutions are coupled at the subdomain interfaces using SATs. The coupling procedure follows the ideas in [1], with adjustments to account for the use of SBP in time [7,10]. Our scheme involves isolating a linear system consisting only of interface components by solving independent, subdomain sized systems. This allows us to solve for the interface components separately, which are used to build the full solution.
The scheme is proved stable using established SBP-SAT procedures. By using the spectral properties of the temporal and spatial operators we are also able prove that our scheme is invertible, ensuring unique and convergent solutions.
In Sect. 2 we outline the continuous problem to be solved. Section 3 introduces SBP operators in multiple dimensions. These are used in Sects. 4 and 5 to formulate stable discretizations of the continuous problem. A system reduction algorithm based on the construction of an interface system is described in Sect. 6. The procedure is further justified by proofs of invertibility in Sect. 7. Section 8 contains an outline of the method for an arbitrary number of subdomains, together with an example illustrating how the system sizes shrink compared to a single domain scheme. A convergence and efficiency study based on a Matlab implementation of the scheme is presented in Sect. 9. Section 10 contains a brief summary of our work and future research directions.

The Advection-Diffusion Problem
Let a = (a 1 , a 2 ) ∈ R 2 , > 0, T > 0 and consider the following advection-diffusion problem on a rectangular domain Ω: Here n is the outward pointing unit normal of ∂Ω; Γ in = ∂Ω ∩ {n · a < 0} is the inflow part of the boundary, and Γ out = ∂Ω ∩ {n · a ≥ 0} is the outflow part of the boundary (see Fig. 1).
It can be shown that this problem is well-posed. Furthermore, the single domain problem (1) is equivalent to the following multidomain (see Fig. 2 In the coming sections we will propose a stable and parallelizable discretization of problem (2) based on the SBP-SAT technique. an equidistant discretization x k = k N x of the unit interval, an SBP operator is a matrix D x such that if we form the vectors u and u x by evaluating u and u x at the grid points, then D x u ≈ u x . Furthermore, there is a diagonal, positive definite matrix P x such that if u, v are smooth functions, then Equation (3) is a discrete analogue of the integration by parts formula SBP operators are classified by their order of accuracy. An SBP operator D x is called an SBP( p, r ) operator if its order of accuracy is p in the interior and r at the boundary. For details, see [13].
Next we consider smooth, time-dependent functions of two spatial variables. Let Ω = [0, 1] × [0, 1] and u : [0, T ] × Ω → R be such a function. The temporal and spatial intervals are discretized using equidistant grids t i = i T/N t , x j = j/N x , y k = k/N y , and we define the three-dimensional field U = (u i jk ), where u i jk = u(t i , x j , y k ). Furthermore, let D t , D x and D y be SBP operators in each dimension. To be able to operate on dimensions separately using matrix-vector multiplications, we form the vector Furthermore we define the discrete partial differential operators Here I t , I x , and I y are identity matrices of sizes corresponding to the discretization. With this structure it follows that The quadrature matrices P t , P x , and P y can be combined to form various integral approximations. Three types of integration are of particular importance: Integration over the entire domain [0, T ] × Ω; integration over the spatial domain at particular times; and integration at the spatial boundaries during all times. The quadrature matrix for integration over the entire domain is defined as P = P t ⊗ P x ⊗ P y and u, v P := u Pv ≈ To integrate over the spatial domain at a particular time t i we let E i t = e i e i , where e i is the standard basis vector in R N t +1 (this matrix has a 1 at position (i, i) and zeros everywhere else). The quadrature matrix for integration over the spatial domain at time t i is defined as P i Ω = E i t ⊗ P x ⊗ P y , and we have We define quadrature matrices for the spatial boundaries in a similar manner. Let E j x = e j e j , E k y = e k e k , and The remaining boundary quadratures are defined analogously, with the subscripts w, e, s, n denoting the west, east, south and north boundary respectively (see Fig. 3). The SBP property is inherited from the one-dimensional operators in the sense that where the corresponding identities in the continuous setting are

Single Domain Discretization
It is natural to first discuss a single domain scheme, since the handling of the boundary and initial conditions will carry over to the two-domain case. We restrict problem (1) to the unit square Ω = [0, 1] 2 and discretize it. A bound on the energy of the solution to problem (1) with homogenous boundary data can be found by using the energy method: We multiply the differential equation by 2u and integrate over the spatial domain. Integration by parts then yields ∂ ∂t The energy rate above is controlled by the boundary conditions. Indeed, by inserting the homogeneous boundary conditions we get Integrating in time results in a bound in terms of initial data, The bound (5) implies that Our goal is now to construct a discrete scheme which controls indefinite terms in a similar manner. Consider first the linear system If u solves problem (1), then (6) holds approximately. What we want, however, is the converse-a system with a unique solution u which approximates the solution to (1). In order to achieve this we introduce penalty terms to the right-hand side which impose the boundary and initial conditions weakly. The necessary form of these penalty terms can be derived by studying the terms that arise from applying the discrete energy method to the left-hand side in (6). That is, we perform the discrete analogue of multiplying the differential equation by 2u and integrating in space and time-we multiply (6) by 2u P: 2 u, D t u P + 2a 1 u, D x u P + 2a 2 u, D y u P − 2 ( u, D 2 x u P + u, D 2 y u P ) = 0 . Each term can be rewritten using the summation by parts properties (4): The terms in the equations above end up being dissipative depending on the signs of a 1 and a 2 . Assume for simplicity that a 1 , a 2 > 0. By combining (8)- (11), each of the four boundaries will have an associated indefinite boundary term that we must control with corresponding penalty terms. The indefinite terms are The corresponding penalty terms are constructed such that they approximate the boundary conditions (and hence are sufficiently small), and when multiplied by 2u P, they eliminate the indefinite terms above. We choose Here g w , h e , g s and h n is the data injected into the boundary grid points. Note that when multiplied by 2u P, the penalty terms turn into boundary integrals.
For example, with homogeneous boundary data, the east penalty becomes and eliminate the east indefinite term. Finally we must enforce the initial condition u t=0 = f . This is done with the penalty term which will control the term u 2 P 0 Ω from Eq. (7). Multiplying (12) by 2u P, adding u 2 P 0 Ω and completing the square yields By defining a discrete gradient and Laplacian we can write the full scheme in a complete, but compact form. Let With this notation, the scheme can be written as It can be shown that the scheme above yields the following bound when using homogeneous boundary data That is, the energy of the numerical solution at the final time and the integral of the gradients is bounded by the energy of the initial data. We have therefore proved the following proposition. (13) is stable. (14) is a discrete analogue of the continuous bound (5).

Remark 2
The scheme (13) is implicit in time. It is generally not prudent to solve up to time T using a single system (it will be too large)-instead we solve for a small number of time points successively, until we reach T . The initial data for each time-slab is given by the last solution from the previous time-slab. The procedure (multi-block in time) is explained in detail in [7].

Two-Domain Discretization
We partition the domain Ω = [0, 2] × [0, 1] into a left subdomain Ω L and a right subdomain Ω R , each discretized by a uniform grid (see Fig. 4). We associate to Ω L a numerical solution u and to Ω R a numerical solution v. The problem (2) is for the most part discretized as in the previous section. The imposition of the boundary and initial conditions is handled as in (13), and the conditions u = v and ∂u/∂n = ∂v/∂n at the interface are imposed in a similar weak manner. The combined scheme can be written Here the "External Penalties" terms contain the penalty terms described in the previous section to enforce the boundary and initial conditions. The matrices E we and E ew are reordering operators, moving components between the west and east boundary. This is necessary because the interface components of u (i.e. the east boundary of Ω L ) do not appear in the same positions as the interface components of v (i.e. the west boundary of Ω R ). More precisely: Let e j be the standard basis in R N x +1 and E N x 0 Note that because the subdomains Ω L and Ω R are discretized with uniform grids and the same number of grid points, we can use the same SBP operators and quadratures on both subdomains. This is done only for simplicity, and is not necessary. To simplify the stability proof of Proposition 3 the quadratures used for integration at the interface should match.

Remark 3
The simplifying requirement that the grid points must match at the interface can be relaxed by using so called SBP preserving interpolation operators, see [8].
The right-hand side in (15) is a stable coupling for appropriate choices of σ I L , σ V L , σ I R , σ V R . Indeed, by arguments similar to those in [1], the following proposition can be proved.

Proposition 3 A positive number ξ exists such that if
then the scheme (15) is stable.
This is proved in [1] for the semi-discrete case, and is here extended to the fully discrete case. The proof is given in "Appendix A".

System Reduction
Our goal is now to split the problem (15) into two smaller, independent subproblems. To this end we rewrite (15) as Here L = σ I L P −1 P e E we +σ V L P −1 P e E we D x and R = σ I R P −1 P w E ew +σ V R P −1 P w E ew D x . The matrix A L is the sum of all the factors in front of u (including the external penalty terms) in the top equation of (15), and the matrix A R is the sum of all the factors in front of v in the bottom equation of (15). The right-hand side b L and b R contains data from the initial and boundary conditions. Parallelization opportunities can now be illuminated by a few key observations. First, the full system (15) is equivalent to This formulation has the desirable property of being easily reduced to involve only interface components, as we shall soon demonstrate. Three major questions must be answered. Is it possible to construct (18), i.e. are the involved matrices invertible? How do we solve for the interface components? And how do we build the full solution once the interface components are known? Let us discuss these questions in order. To build the system (18) we must compute the products A −1 L L and A −1 R R (the invertibility of A L and A R is shown in Sect. 7). Due to the sparse structure of L and R this is equivalent to solving 2k N t N y independent linear systems of subdomain size. The parameter k depends on the size of the boundary stencil of the SBP operator (using a standard second order operator implies k = 2, fourth order implies k = 4 etc.). The right-hand side in (18) is in general time dependent and must be computed for each implicit time-integration step.
Once the matrix on the left-hand side of (18) and the vector on the right-hand side is known we can reduce the system to involve only interface components. Let us study a small example to illustrate the procedure. The left-hand side matrix in (18) is a block matrix with identities on the diagonal. The off-diagonal blocks have nonzero columns in positions corresponding to the interface components. This will allow us to eliminate rows and columns corresponding to non-interface components. In the example below we can think of u 0 and v 2 as non-interface components, and u 1 , u 2 , v 0 , v 1 as interface components to be isolated from the full system. The reduction is schematically depicted below. ⎡ Finally, once (21) has been solved for the interface components, the remaining unknowns u 0 and v 2 can be computed by using the top and bottom rows in (19). While the system above is too small to be the result of a real discretization, both its structure and the elimination procedure is analogous in realistic cases.
In conclusion, the following solution algorithm for (15) can be set up by precomputing the products A −1 L L and A −1 R R : Build and solve the interface system. 3. Build the full solution from the interface components. 4. Repeat using previous solution as initial data until time T is reached.
Remark 4 This type of algorithm is beneficial also for a sequential code because of the nonlinear growth of the memory cost for solving linear systems. That is, solving a single-domain discretization with a given grid size may be impossible due to memory constraints, while the multidomain scheme is solvable due to the reduced system sizes.
Remark 5 For linear problems with time-independent coefficients, the matrices A L and A R can be LU-decomposed ahead of simulation to increase efficiency. For non-linear problems, or problems with time dependent coefficients, the matrices A L and A R will change with time, and must be LU-decomposed in each time slab.

Invertibility
The procedure described in the previous section requires that the subsystems are invertible, and that the full system (17) has a unique solution. To prove that this is the case we study the spectral properties of the spatial discretization and the temporal discretization separately. The overarching idea is to 1. Establish that the discrete temporal and spatial differential operators commute. 2. Show that the eigenvalues of the temporal operator have strictly positive real parts. 3. Show that the eigenvalues of the spatial operator have non-negative real parts. 4. Note that the eigenvalues of the sum of commuting operators are the sums of their respective eigenvalues and conclude therefore that zero is not an eigenvalue of the full operator, implying that it is invertible.
Point 2 is proven for the second order case, but remains an open question for high order operators. However, extensive numerical studies corroborate this hypothesis (see [7]). We consider Point 2 in the form of a conjecture.

Conjecture 1 The eigenvalues of the matrixD
Point 1 is a simple consequence of properties of the Kronecker product. The discrete differential operator A L in (17) is the sum of a temporal partD t ⊗ I x ⊗ I y and a spatial part I t ⊗D x ⊗ I y + I t ⊗ I x ⊗D y (here the tilde symbols indicate the sum of an SBP operator and penalty terms). Clearly the temporal part commutes with the spatial part. The same is true for A R .
Next we study the spectrum of A L and A R by considering the semi-discrete version of (17). Here A L and A R are the same as A L and A R , sans the temporal discretization-i.e. A L =D x ⊗ I y + I x ⊗D y and A R =D x ⊗ I y + I x ⊗D y (here again the tilde and hat symbols indicate that penalty terms have been added to the SBP operators). Similarly Σ L and Σ R are as in Sect. 6, sans the temporal discretization. The semi-discrete system (22) is stable under the conditions of Proposition 3 (the proof is essentially the same). This allows us to prove the following lemma.

Lemma 1 Under the stability conditions of Proposition 3, the eigenvalues of the matrices A L and A R have non-negative real parts.
Proof We have established that the semidiscrete system (22) is stable, or, equivalently, with P = P x ⊗ P y , for all u, v. It follows that P A L + A L P and P A R + A R P are positive semi-definite. To see this, assume that P A L + A L P is not positive semi-definite. Then there is a vectorũ such thatũ (P A L + A L P)ũ < 0. But this contradicts (23) if we set u =ũ and v = 0. Hence P A L + A L P is positive semi-definite, and by a similar argument, so is P A R + A R P. Furthermore, if (λ, z) is an eigenpair of A L , then By adding the conjugate transpose of the rightmost equation above it follows that Hence the eigenvalues of A L (and A R ) have non-negative real parts.
Using Lemma 1 it is straightforward to prove invertibility of A L and A R .

Proposition 4 The stability conditions of Proposition 3 imply that the matrices A L and A R are invertible.
Proof We prove invertibility of A L only-the proof for A R is the same. From the discussion above we know that the temporal termD t ⊗ I x ⊗ I y and the spatial term I t ⊗D x ⊗ I y + I t ⊗ I x ⊗D y of A L commute, and that the eigenvalues of the temporal term have strictly positive real parts. Furthermore, from Lemma 1, the eigenvalues of the spatial part has non-negative real parts. It follows that the eigenvalues of A L -being sums of the eigenvalues of the temporal and spatial parts-have strictly positive real parts (see [4, p. 117]). Then, since all its eigenvalues are nonzero, A L is invertible.
It follows in a similar manner that the full system (17) has a unique solution.

Proposition 5 The stability conditions of Proposition 3 imply that the system (17) has a unique solution.
Proof Again we split the matrix into a temporal and a spatial part: The matrix S is the blockwise Kronecker product of I t and the matrix in (22). Since (22) is stable, it follows that the eigenvalues of S have non-negative real parts [5, p. 178]. Furthermore, by Conjecture 1, the eigenvalues of T have strictly positive real parts. But then, since T and S commute (this follows from the Kronecker product structure), the eigenvalues of the sum T + S have strictly positive real parts. It follows that the system (17) has a unique solution.
Finally, the invertibility of the interface system is an immediate consequence of Propositions 4 and 5.

Proposition 6
The stability conditions of Proposition 3 imply that the interface system described in Sect. 6 has a unique solution.
Proof The equations of the interface system is a subset of the equations of the full system (18). By Propositions 4 and 5, the system (18) has a unique solution. Hence, the interface system has a unique solution.

N-Domain Discretization
The procedure for two domains discussed above can be straightforwardly extended to an arbitrary number of subdomains, leading to systems defined by matrices of the form ⎡ where i j is nonzero if and only if the subdomains i and j share an interface. An interface system is then constructed as in the two-domain case by solving linear systems of subdomain size.

Numerical Experiments
Our domain decomposition scheme is readily extended to curvilinear blocks. To verify that our code produces solutions that converge with design order of accuracy we consider (1) with a = (1, 1) and = 0.01, posed on the domain Ω shown in Fig. 5. Data is given by the manufactured solution u = cos(− 2.5π x + 2.1π y + t) .
We compute approximate solutions at time T = 1 for increasingly fine grids and for different order SBP operators in space. The equation is integrated in time using an SBP(2, 1) operator with 3 points per temporal block and Δt small enough to not influence the spatial error. Convergence rates and L 2 errors in space are shown in Table 1, verifying that the schemes converge with the correct order.
We investigate the efficiency of our method by comparing against both explicit Runge-Kutta time integration and the single-domain (SD) formulation described in Sect. 4. The square domain Ω = (0, 1) 2 is partitioned into 9 blocks as in Fig. 6, and we again set a = (1, 1), = 0.01, and use data from (24). The multidomain formulation described in Sect. 5 is easily made semi-discrete by only discretizing in space, leaving the time dimension  The format SBP(2 p, p) means that the operator is of order 2 p for interior nodes and p for boundary nodes, which yields a global order of p + 1 [12]. The N -column shows the resolution (N × N ) of the subdomain grids continuous. This yields a system of ODEs which we solve using the Matlab routine ode45a Runge-Kutta based explicit integrator with adaptive step size. Both our implicit methods Using the 9 subdomain grid structure in Fig. 6 gives a total spatial resolution of (3N −2)× (3N − 2) (because the interface nodes are shared). Hence, for the single-domain algorithm: -Discretize the domain Ω = (0, 1) 2 by a (3N − 2) × (3N − 2) grid.
-Compute the solution using implicit integration as described in Sect. 4.
Record the execution time and spatial error at the final time.
The execution times are plotted against the spatial errors at time T = 1 in Fig. 7. Our implicit domain decomposition based integrator is an order of magnitude faster than both the explicit integrator and the single-domain implicit integrator (and several orders of magnitude faster for finer grids). As the grid resolution increases, the implicit single-domain algorithm becomes inefficient due to the size of the system that must be solved in each time step. This makes the single-domain algorithm memory intensive, and for fine grids the method breaks down due to memory limitations (this is why the single-domain algorithm has fewer measurement points in Fig. 7; for high resolutions we simply run out of memory). The explicit solver, while not very memory intensive, requires very small time steps for stability, making it computationally expensive.

Remark 6
The above comparison is between Matlab implementations running on a single desktop machine. The bulk of the computations consist of matrix-vector addition and multiplication, and solving linear systems. These are all operations which are heavily parallelized in Matlab. Large scale parallelization using computer clusters will be done in a future paper. Execution times for the domain decomposition scheme can be optimized by appropriately choosing the number of subdomains. The size of the subproblems increase with grid refinement, ultimately resulting in systems that cannot be solved efficiently. By increasing the number of subdomains, we reduce the sizes of the subsystems, but increase the size of the interface system. Hence there is an optimal number of subdomains for producing solutions at a given accuracy in the least amount of time. To illustrate this effect we compute solutions on the unit square, with increasing number of subdomains. More precisely, a reference accuracy is set by computing a solution at time T = 1 using the single-domain algorithm [with Δt = 0.05 and data from (24)] on a 48 × 48 grid. Next, solutions are computed using M 2 subdomains, where M = 1, 2, 3, 4, 5. The execution time needed to achieve equal or higher accuracy than the reference accuracy is recorded. This is typically achieved by using grid size ≈ (48/M) × (48/M) for each subdomain. In this case the optimal execution time is reached at 3 2 blocks. The results are shown in Fig. 8.
A basic heuristic for determining the optimal number of subdomains can be constructed by balancing the sizes of the interface and subdomain problems. This is done by deriving expressions for the number of interface and subdomain nodes in terms of the total resolution and the number of subdomains. In our unit square example, if the total resolution is N × N and we use M 2 subdomains, then each subdomain comprises (N /M) 2 nodes, while the interfaces comprise 2(M − 1)N nodes in total. By letting these quantities be equal we can solve for M: In general, the solution will not be an integer, so the best we can do is to round off to the nearest integer. For N = 48, the solution to Eq. (25) is M ≈ 3.26, which indicates that 3 2 = 9 subdomains should be optimal, which is what we find in our computations as well.  Fig. 9 Temporal resolution and execution times needed to achieve optimal spatial accuracy at time T = 1, using the grid shown in Fig. 6 and data from (26)

Fig. 10 A multiblock grid
For problems with high temporal variation and low spatial variation we can benefit from raising the order of accuracy in time. We will illustrate this with data from the manufactured solution u = cos(− 0.5π x + 0.1π y + 2πt) .
We compute solutions at time T = 1 for increasingly fine temporal resolutions using both SBP(2, 1) and SBP(4, 2) operators in time (and SBP(4, 2) in space). Naturally the SBP(2, 1) operator requires smaller time steps than the SBP(4, 2) operator in order to reach the optimal spatial error for the chosen grid. The downside of raising the order of accuracy in time is that it increases the system size (since the stencil is larger we need more time points per implicit time step). However in this case the harsher time step requirements of the SBP(2, 1) operator incurs higher computational cost than the increased system size of the SBP(4, 2) operator, resulting in slower execution times. The results can be seen in Fig. 9.
Finally we illustrate our procedure by an example. We compute the solution with homogeneous boundary and forcing data, Gaussian initial data, = 0.01, a = (0, 0.3), Δx = 1/24, Δt = 1/20, T = 10. The simulation uses the grid shown in Fig. 10 with 4th order operators in space and a 2nd order operator in time. Color plots of the solution at different times are shown in Fig. 11. The solution slowly advects to the right and diffuses, without oscillations.

Fig. 11
A solution with homogeneous boundary and forcing data, and Gaussian initial data

Conclusions
We have formulated an efficient, fully discrete and provably stable domain decomposition scheme for the advection-diffusion equation using SBP-SAT in space and time. The singledomain problem is reduced to a set of subdomain-sized problems along with a linear system comprising the interface components. Using the stability of the scheme together with the spectral properties of the SBP operators we proved that the scheme is invertible, i.e. for any given set of data, the scheme will produce a unique convergent solution.
By numerical experiments we showed significant efficiency gain compared to both implicit single-domain and explicit multi-block solvers. Due to the stiffness of the equation, explicit time integration is crippled by the small time steps required for stability. The implicit singledomain solver does not require small time steps, but becomes inefficient with grid refinement as the size of the linear system grows.
The ideas presented here provide both theoretical and practical paths for further research into the connection between SBP-SAT based discretizations and domain decomposition. In future papers we intend to elaborate on the method used on non-trivial domains using curvilinear grids. The viability of the method should also be further explored through parallelized implementations and research related to applications using variable coefficient as well as non-linear problems.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix A
The proof of Proposition 3 is similar to the proof of Theorem 3.1 in [1]. The idea is to apply the energy method to (15)-i.e., multiply the top equation by 2u P, and the bottom equation by 2v P-and choose σ I L , σ V L , σ I R , σ V R such that the interface contribution to the energy of the solution becomes non-positive. Recall from Sect. 4 the terms (8)- (11). All of these terms will appear for both u and v, but in this section we are concerned only with the interface terms (and the dissipative terms 2 D x u 2 P and 2 D x v 2 P as we will see shortly). Omitting the non-interface terms, we get u 2 We would like to rewrite the above expression as a quadratic form in u, v, D x u and D x v. In order to do this we will assume that P e = E we P w E ew (i.e. the interface quadrature in the left domain is the same as the interface quadrature in the right domain) so that the inner products above can be replaced by a common inner product on the interface ·, · P I . Note also that we have the following inequalities for the dissipative terms 2 D x u 2 P and 2 D x v 2 P : This is best understood by rewriting the numerical integral · 2 P using triple indices. For example, if w = D x u, then 2 w 2 P = 2 i jk α i β j γ k w 2 i jk , where w i jk ≈ w(t i , x j , y k ) and α i = (P t ) ii , β j = (P x ) j j , γ k = (P y ) kk . The right-hand side above is then simply the sum we get when we fix j = N x .
With this in mind we can prove Proposition 3.
Proof Using (28) and (27) we get where w = (u v (D x u) (D x v) ) and The matrix B is the same as in the proof of Theorem 3.1 in [1], where it is shown to be negative semi-definite.