1 Introduction

Consider the time-dependent partial differential equation (1a) below, where \({\mathcal {L}}\) represents a linear differential operator and \(f(x)\) is a forcing function. We assume that a suitable initial condition and (for the moment homogeneous) boundary conditions are given such that the problem is well-posed. Applying the method of lines, that is, discretizing first in space while keeping time continuous, yields a system of ordinary differential equations (1b), where we refer to \(L\) as the discretization matrix.

$$\begin{aligned} u_t+{\mathcal {L}}u&=f,&t\ge 0,\quad x\in [0,\ell ], \end{aligned}$$
(1a)
$$\begin{aligned} {\mathbf {v}}_t+L{\mathbf {v}}&=\mathbf {f},&t\ge 0. \end{aligned}$$
(1b)

We first look at the scalar advection equation and thereafter at the heat equation, both in one spatial dimension. Thus \(L\) approximates either the first or the second derivative operator, including boundary treatments.

In this paper, \(L\) is obtained using the SBP-SAT finite difference method. This class of finite difference method is based on difference operators fulfilling summation-by-parts (SBP) properties, and is modified by the penalty technique simultaneous approximation term (SAT) for treating the boundary conditions. The SBP operators were first developed for first derivatives [21, 29] and then later for second derivatives [7, 25] and are designed to facilitate the derivation of energy estimates. A means to impose boundary conditions without destroying these properties is to use SAT [6]. The SATs included in \(L\) contain free parameters. We follow the common practice of determining these parameters using the energy method, such that (1b) is guaranteed to be time-stable. Thereafter, any remaining degrees of freedom in the SATs can be used to make the scheme dual consistent. Dual consistency is advantageous when computing functionals of the solution, since the order of accuracy of functionals from dual consistent schemes can be higher compared to those from non-dual consistent schemes [18]. For more details about SBP-SAT, see [12, 31].

Thanks to the SBP-SAT properties, the discretization matrix can be factorized as \(L=H^{-1}K\), where \(H\) is a symmetric, positive definite matrix that has the role of a quadrature rule, see [19]. Now consider the steady version of (1a), \({\mathcal {L}}u=f\). Its solution \( u(x)\) may be represented as in (2a) below, where \({\mathcal {G}}\) is the Green’s function. The steady version of (1b) is \(L{\mathbf {v}}=\mathbf {f}\). Solving for \({\mathbf {v}}\) yields (2b).

$$\begin{aligned} u(x)&=\int _0^{ \ell }{\mathcal {G}}(x,y)f(y)\,\mathrm {d} y, \end{aligned}$$
(2a)
$$\begin{aligned} {\mathbf {v}}&=K^{-1}H\mathbf {f}. \end{aligned}$$
(2b)

With \(H\)’s role as a quadrature rule in mind, we can see a clear similarity between (2a) and (2b): Since \(\mathbf {f}\) approximates \(f\) and the multiplication by \(H\) approximates the integration, we realize that \(K^{-1}\) resembles the Green’s function \({\mathcal {G}}\). It makes sense to refer to \(K^{-1}\) as a discrete Green’s function.

A finite difference analogue of the Green’s function was introduced already in the fundamental article [9]. Thereafter, discrete Green’s functions appear sporadically in the literature, see for example [8, 10] and references therein. E.g. in [4] (and correspondingly in [9] for two-dimensional problems) the finite formula approximating (2a) is scaled with the spatial mesh size h, which then corresponds closely to (2b). However, since traditional finite difference stencils usually do not have an assigned quadrature rule in the same sense as the SBP operators, the term “discrete Green’s functions” often refers to \(L^{-1}\) rather than to \(K^{-1}\), for example in [5, 8, 28].

In the above-mentioned articles, the standard way of enforcing boundary conditions, injection, has been used instead of SAT (for descriptions of these two boundary methods, see for example [31]). In [14], the first and second derivatives were approximated using an SBP-SAT finite volume method, and the inverses analogous to \(K^{-1}\) were derived and used for analysing errors. Here, we also derive formulas for \(K^{-1}\) corresponding to the first and second derivatives. As an extension of the results in [14], however, our formulas hold for arbitrary orders of accuracy, and in the second derivative case we consider general Robin boundary conditions instead of only Dirichlet boundary conditions.

The inverses are full matrices and are therefore probably not competitive for solving systems \(L{\mathbf {v}}=\mathbf {f}\) directly, compared to fast solvers for banded matrices. It is, however, often advisable to use preconditioning to improve the convergence of iterative methods [16]. A preconditioning matrix \(P\) should ideally approximate the inverse of \(L\) in some sense, and knowledge about the structure of the inverses could, speculatively, be used when designing preconditioning matrices. If \(P\) is a sparse approximate inverse, the computations are cheap, but preconditioners \(P\) may also be essentially dense matrices, as for example the fundamental solution preconditioners considered in [5].

The paper is organized as follows: in Sect. 2, we look at the semi-discrete scheme approximating the advection equation. The matrix \(K\) associated with \(\frac{\partial }{\partial x}\) is denoted \({\widetilde{Q}}\), and its inverse is presented in Theorem 2.1. In Sect. 3, we consider the heat equation, thus approximating \(\frac{\partial ^2}{\partial x^2}\). The related matrix \(K\), denoted \({\widetilde{A}}\), is inverted in Theorem 3.1. The SAT parameters are chosen to give stability and dual consistency, and additionally it is of interest to know if some choices of SAT parameters result in a singular discretization matrix \(L\). In the second derivative case, it turns out that an energy stable scheme can actually have a singular \(L\) if the scheme is also dual consistent. Some relations between stability, dual consistency and a singular discretization matrix are discussed in Sect. 3.3. We also discuss the relations between two different ways of showing energy stability, in Sect. 3.4. The paper is summarized in Sect. 4.

2 The First Derivative

Consider the scalar advection equation with a Dirichlet boundary condition at the inflow boundary, that is

$$\begin{aligned} u_t+u_x&=f(x,t),&x\in (0,\ell ),\\ u(0,t)&=g_\mathrm{L}(t), \end{aligned}$$
(3)

valid for \(t\ge 0\), with initial condition \(u(x,0)=u_0(x)\). The forcing function \(f(x,t)\), the initial data \(u_0(x)\) and the boundary data \(g_\mathrm{L}(t)\) are known functions.

We call (3) well-posed if it has a unique solution and is stable (can be bounded by data). Techniques for showing existence and uniqueness can be found in for example [17, 20]. We focus on showing stability, since we will derive a corresponding stable discrete problem later. We use the energy method, and multiply the partial differential equation in (3) by \(u\), and integrate over the spatial domain. Thereafter, we use integration by parts and apply the boundary condition. For simplicity, we consider the homogeneous case, that is with the data \(f=0\) and \(g_\mathrm{L}=0\). This yields

$$\begin{aligned} \frac{\mathrm {d} }{\mathrm {d} t}\Vert u\Vert ^2 =-u(\ell ,t)^2 \end{aligned}$$

where \(\Vert u\Vert ^2=\int _0^{\ell }u^2\,\mathrm {d} x\) and where we have used that \((u^2)_t=2uu_t\). In the homogeneous case, the growth rate thus becomes \(\frac{\mathrm {d} }{\mathrm {d} t}\Vert u\Vert ^2\le 0\). Integrating this in time yields the energy estimate \(\Vert u\Vert ^2\le \Vert u_0\Vert ^2\) and the solution is thus bounded. Since (3) is a one-dimensional hyperbolic problem it is also possible to show strong well-posedness, i.e., that \(\Vert u\Vert \) is bounded by the data \(f\), \(g_\mathrm{L}\) and \(u_0\). See [17, 20] for different definitions of well-posedness.
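For reference, the intermediate steps of the energy method can be written out as below. This is a sketch under the assumption that the homogeneous version of (3) reads \(u_t+u_x=0\) with \(u(0,t)=0\), which is the form consistent with the steady problem \(u_x=f\), \(u(0)=g_\mathrm{L}\) in Remark 2.3.

$$\begin{aligned} \frac{\mathrm {d} }{\mathrm {d} t}\Vert u\Vert ^2&=2\int _0^{\ell }uu_t\,\mathrm {d} x=-2\int _0^{\ell }uu_x\,\mathrm {d} x=-\int _0^{\ell }(u^2)_x\,\mathrm {d} x\\&=u(0,t)^2-u(\ell ,t)^2=-u(\ell ,t)^2. \end{aligned}$$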

2.1 The Semi-discrete Scheme

We first discretize in space, on the interval \(x\in [0, \ell ]\), using \(n+1\) equidistant grid points \(x_i=ih\), where \(h= \ell /n\) and \(i=0,1,\ldots ,n\). Using the SBP-SAT finite difference method, we obtain a semi-discrete scheme approximating (3) as

$$\begin{aligned} \begin{aligned} {\mathbf {v}}_t+D_1{\mathbf {v}}=\mathbf {f}&+H^{-1} \sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}\left( {\mathbf {e}}_\mathrm{L}^{\mathsf {T}}{\mathbf {v}}-g_\mathrm{L}\right) , \end{aligned} \end{aligned}$$
(4)

where \({\mathbf {v}}(t)=[v_0, v_1, \ldots , v_n]^{\mathsf {T}}\) is the approximation of the continuous solution \(u(x,t)\), and where \(\mathbf {f}=[f(x_0,t), f(x_1,t), \ldots , f(x_n,t)]^{\mathsf {T}}\) is the restriction of \(f(x,t) \) to the grid. In the same way, we let the initial data be \({\mathbf {v}}(0)=[u_0(x_0), u_0(x_1), \ldots , u_0(x_n)]^{\mathsf {T}}\). The matrix \(D_1\) approximates the first derivative operator \(\partial /\partial x\), and fulfills the SBP-properties [21, 29]

$$\begin{aligned} D_1=H^{-1}Q,&H=H^{\mathsf {T}}>0,&Q+Q^{\mathsf {T}}={\mathbf {e}}_\mathrm{R}{\mathbf {e}}_\mathrm{R}^{\mathsf {T}}-{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}\end{aligned}$$
(5)

where \({\mathbf {e}}_\mathrm{L}=[1, 0, \ldots , 0]^{\mathsf {T}}\) and \({\mathbf {e}}_\mathrm{R}=[ 0, \ldots , 0, 1]^{\mathsf {T}}\). By the notation >, we mean that the matrix \(H\) is positive definite. As mentioned in the introduction, \(H\) has the role of a quadrature rule and \(\Vert {\mathbf {v}}\Vert _{H}^2\equiv {\mathbf {v}}^{\mathsf {T}}H{\mathbf {v}}\) approximates the \(L^2\)-norm of \(u(x,t)\), see [19]. The scalar \(\sigma _\mathrm{L}\) determines the strength of the SAT, and will be chosen below such that the scheme (4) is energy stable and dual consistent.

2.1.1 Stability and Dual Consistency

To show energy stability, we multiply (4) by \({\mathbf {v}}^{\mathsf {T}}H\) from the left and use the relations (5). We thereafter add the transpose, and we consider \(\mathbf {f}={\mathbf {0}}\) and \(g_\mathrm{L}=0\), just as in the continuous case. This yields

$$\begin{aligned} \frac{\mathrm {d} }{\mathrm {d} t}\Vert {\mathbf {v}}\Vert ^2_H=-v_n^2+(1+2 \sigma _\mathrm{L}) v_0^2, \end{aligned}$$

where \(v_0={\mathbf {e}}_\mathrm{L}^{\mathsf {T}}{\mathbf {v}}\) and \(v_n={\mathbf {e}}_\mathrm{R}^{\mathsf {T}}{\mathbf {v}}\). We need \(\frac{\mathrm {d} }{\mathrm {d} t}\Vert {\mathbf {v}}\Vert ^2_H\le 0\), which is guaranteed if \( \sigma _\mathrm{L}\le -1/2\). For a dual consistent scheme, we need \(\sigma _\mathrm{L}=-1\), see [3, 18].
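The SBP property (5) and the resulting energy rate can be checked numerically. The sketch below assumes the classical (2,1) order accurate operator with a diagonal \(H\) (the trapezoidal rule); the helper names `sbp_d1_second_order` and `energy_rate` are illustrative and not taken from the paper.

```python
import numpy as np

def sbp_d1_second_order(n, length=1.0):
    """Assumed classical (2,1) order accurate SBP pair (H, Q) on n+1 points."""
    h = length / n
    H = h * np.eye(n + 1)
    H[0, 0] = H[n, n] = h / 2            # diagonal quadrature (trapezoidal rule)
    Q = 0.5 * (np.eye(n + 1, k=1) - np.eye(n + 1, k=-1))
    Q[0, 0], Q[n, n] = -0.5, 0.5         # boundary closures
    return H, Q

n = 8
H, Q = sbp_d1_second_order(n)
e_L = np.eye(n + 1)[:, 0]
e_R = np.eye(n + 1)[:, -1]

# SBP property (5): Q + Q^T = e_R e_R^T - e_L e_L^T
sbp_ok = np.allclose(Q + Q.T, np.outer(e_R, e_R) - np.outer(e_L, e_L))

def energy_rate(v, sigma_L):
    """d/dt ||v||_H^2 for (4) with f = 0 and g_L = 0."""
    Q_tilde = Q - sigma_L * np.outer(e_L, e_L)
    return -v @ (Q_tilde + Q_tilde.T) @ v

rng = np.random.default_rng(0)
v = rng.standard_normal(n + 1)
rate = energy_rate(v, sigma_L=-1.0)
# Closed form from the text: -v_n^2 + (1 + 2 sigma_L) v_0^2
expected = -v[-1] ** 2 + (1 + 2 * (-1.0)) * v[0] ** 2
```

With \(\sigma _\mathrm{L}=-1/2\) the boundary term at \(x=0\) vanishes and only \(-v_n^2\) remains, which is the limiting case of the stability condition.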

2.2 The Inverse of the Discretization Matrix

We first rewrite (4) as

$$\begin{aligned} {\mathbf {v}}_t+ H^{-1}{\widetilde{Q}}{\mathbf {v}}&=\widetilde{\mathbf{f}}, \end{aligned}$$
(6)

where

$$\begin{aligned} {\widetilde{Q}}=Q- \sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}},&\widetilde{\mathbf{f}}=\mathbf {f}-H^{-1} \sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}g_\mathrm{L}. \end{aligned}$$
(7)

We identify \({\widetilde{Q}}\) as the first derivative version of \(K\) discussed in the introduction. The second order accurate version of \({\widetilde{Q}}\) was inverted in [14]. Inspired by those results, we make a similar ansatz and derive \({\widetilde{Q}}^{-1}\) for arbitrary order of accuracy. The result is given in Theorem 2.1.

Theorem 2.1

Consider the \((n+1) \times (n+1) \)-matrices Q from (5) and \({\widetilde{Q}}\) found in (7). The structures of Q and \({\widetilde{Q}}\) are

(8)

where \(\vec {q}\) is an \(n\times 1\)-vector and the lower-right block of \(Q\) is an \(n\times n\)-matrix. It is assumed that this block is invertible. The inverse of \({\widetilde{Q}}\) is

$$\begin{aligned} {\widetilde{Q}}^{-1}=G_1-\frac{1}{\sigma _\mathrm{L}}{\mathbf {1}}{\mathbf {b}}^{{\mathsf {T}}}, \end{aligned}$$
(9)

where

(10)

Proof of Theorem 2.1

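We aim to show that \({\widetilde{Q}}{\widetilde{Q}}^{-1}=I\), where I is the \((n+1)\times (n+1)\) identity matrix. Using \({\widetilde{Q}}\) from (7) and \({\widetilde{Q}}^{-1}\) from (9), we compute

$$\begin{aligned} {\widetilde{Q}}{\widetilde{Q}}^{-1}=QG_1-\frac{1}{\sigma _\mathrm{L}}Q{\mathbf {1}}{\mathbf {b}}^{\mathsf {T}}-\sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}G_1+{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}{\mathbf {1}}{\mathbf {b}}^{\mathsf {T}}. \end{aligned}$$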

Note that \(D_1{\mathbf {1}}={\mathbf {0}}\), since \(D_1\) in (5) is a consistent difference operator. Hence, \(Q{\mathbf {1}}={\mathbf {0}}\). Furthermore, \({\mathbf {e}}_\mathrm{L}^{\mathsf {T}}G_1={\mathbf {0}}^{\mathsf {T}}\) since the first row of \(G_1\) consists of zeros. These relations, the fact that \({\mathbf {e}}_\mathrm{L}^{\mathsf {T}}{\mathbf {1}}=1\) and the structures of the components in (8) and (10) yield

$$\begin{aligned} {\widetilde{Q}}{\widetilde{Q}}^{-1}=QG_1+{\mathbf {e}}_\mathrm{L}{\mathbf {b}}^{\mathsf {T}}=I, \end{aligned}$$

where the lower-right block of \(QG_1\) is the \(n\times n\) identity matrix. \(\square \)
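The statement of Theorem 2.1 can be checked numerically. The sketch below assumes the second order accurate \(Q\) and takes \(G_1\) to be zero apart from the inverse of the lower-right \(n\times n\) block of \(Q\) (consistent with its first row and column being zero). The vector \({\mathbf {b}}\) is not copied from (10); it is recovered from the identity \(QG_1+{\mathbf {e}}_\mathrm{L}{\mathbf {b}}^{\mathsf {T}}=I\), which is an assumption of this example.

```python
import numpy as np

n, sigma_L = 6, -1.0

# Assumed second order accurate Q (interior central differences)
Q = 0.5 * (np.eye(n + 1, k=1) - np.eye(n + 1, k=-1))
Q[0, 0], Q[n, n] = -0.5, 0.5

e_L = np.eye(n + 1)[:, 0]
ones = np.ones(n + 1)
Q_tilde = Q - sigma_L * np.outer(e_L, e_L)          # definition (7)

# G_1: zero first row and column, inverse of the lower-right block elsewhere
G1 = np.zeros((n + 1, n + 1))
G1[1:, 1:] = np.linalg.inv(Q[1:, 1:])

# b recovered from Q G_1 + e_L b^T = I, i.e. b^T = e_L^T (I - Q G_1)
b = (np.eye(n + 1) - Q @ G1).T @ e_L

Q_tilde_inv = G1 - np.outer(ones, b) / sigma_L      # formula (9)
residual = np.max(np.abs(Q_tilde @ Q_tilde_inv - np.eye(n + 1)))
```

The same check can be repeated for other values of \(\sigma _\mathrm{L}\ne 0\), since only the rank-one term in (9) depends on the penalty parameter.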

Corollary 2.2

The structure of \({\widetilde{Q}}^{-1}\) in (9) implies that \({\widetilde{Q}}\) is singular only if \(\sigma _\mathrm{L}=0\).

The existence of \(G_1\) and \({\mathbf {b}}\) in (10), and consequently the validity of Theorem 2.1 and Corollary 2.2, rely on the assumption that the lower-right \(n\times n\) block of \(Q\) is invertible. In the (2,1) order accurate case (where by the notation “(2,1) order accurate” we refer to a matrix \(D_1\) which has second order of accuracy in the interior finite difference stencil and first order of accuracy at the boundaries), the inverse of this block is derived and presented in “Appendix A.1”, which directly proves its existence. The same is done for the operator presented in “Section A.2” of Appendix. Higher order operators, on the other hand, have free parameters. For example, for the diagonal norm (6,3) order accurate version of \(D_1\) described in [29], \(x_1\) is a free parameter. In this case, \({\widetilde{Q}}\) is invertible for commonly used parameter values \(x_1\), see [27]. The invertibility of \({\widetilde{Q}}\) is also addressed for general SBP operators in [22], where it is shown that \({\widetilde{Q}}\) (with \(\sigma _\mathrm{L}=-1\)) is invertible if and only if \({\mathbf {1}}\) spans the nullspace of \(D_1\).

The discussion above is focused on “classical FD-SBP operators”, constructed around centred finite difference approximations with diagonal matrices \(H\). However, Theorem 2.1 only requires consistency (such that \(Q{\mathbf {1}}={\mathbf {0}}\)) and that the SAT makes \({\widetilde{Q}}=Q- \sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}\). Thus it holds for a more general class of SBP operators where the boundary nodes are included in the operator (compare Definition 1 in [11]), as long as the corresponding lower-right block is invertible. Moreover, in Theorem 2.1 it is implied that \(Q+Q^{\mathsf {T}}={\mathbf {e}}_\mathrm{R}{\mathbf {e}}_\mathrm{R}^{\mathsf {T}}-{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}\), but this is not crucial for the proof and the result applies also for e.g. upwind operators.

Remark 2.3

For the steady version of (3), that is \(u_{x}=f\) with \(u(0)=g_\mathrm{L}\), we have

$$\begin{aligned} u(x)&=g_\mathrm{L}+\int _0^{ \ell }{\mathcal {G}}(x,y)f(y)\,\mathrm {d} y,&{\mathcal {G}}(x,y)=\left\{ \begin{array}{ll}1,&{}\quad y<x,\\ 0,&{}\quad x\le y,\end{array}\right. \end{aligned}$$

where \({\mathcal {G}}\) is a Green’s function. Starting from \({\mathbf {v}}={\widetilde{Q}}^{-1}H\widetilde{\mathbf{f}}\), using (7) and (9) as well as the relations \({\mathbf {b}}^{\mathsf {T}}{\mathbf {e}}_\mathrm{L}=1\) and \(G_1{\mathbf {e}}_\mathrm{L}={\mathbf {0}}\) deduced from (10), we obtain

$$\begin{aligned} {\mathbf {v}}&=g_\mathrm{L}{\mathbf {1}}+{\widetilde{Q}}^{-1}H\mathbf {f}. \end{aligned}$$

Recall from the introduction that \(K^{-1}={\widetilde{Q}}^{-1}\) resembles \({\mathcal {G}}\). E.g. the version of \({\widetilde{Q}}^{-1}\) found in (34) in “Section A.1” of Appendix (which corresponds to the second order accurate operator) is

$$\begin{aligned} \left( {\widetilde{Q}}^{-1}\right) _{i,j}=\left\{ \begin{array}{ll} 1-(1+1/\sigma _\mathrm{L})(-1)^j,&{}\quad 0\le j\le i\le n,\\ (-1)^{i+j}-(1+1/\sigma _\mathrm{L})(-1)^j,&{}\quad 0\le i \le j\le n. \end{array}\right. \end{aligned}$$

The dual consistent choice \(\sigma _\mathrm{L}=-1\) is optimal in the sense that it cancels the oscillations such that \(({\widetilde{Q}}^{-1})_{i,j}=1\) for \(j\le i\); however, \(({\widetilde{Q}}^{-1})_{i,j}=(-1)^{i+j}\ne 0\) for \(i\le j\). If we instead let \(\sigma _\mathrm{L}\rightarrow -\infty \), which can be interpreted as mimicking the injection treatment, we obtain \({\widetilde{Q}}^{-1}=G_1\). By writing the numerical solution as \({\mathbf {v}}={\mathbf {1}}\left( g_\mathrm{L}-\frac{1}{\sigma _\mathrm{L}}{\mathbf {b}}^{\mathsf {T}}H\mathbf {f}\right) +G_1H\mathbf {f}\), we see that the constant level of the solution varies when \(\sigma _\mathrm{L}\) is tuned. In particular, \({\mathbf {e}}_\mathrm{L}^{\mathsf {T}}{\mathbf {v}}\rightarrow g_\mathrm{L}\) as \(\sigma _\mathrm{L}\rightarrow -\infty \).
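The explicit second order formula in this remark can be compared against a numerically computed inverse, and the relation \({\mathbf {v}}=g_\mathrm{L}{\mathbf {1}}+{\widetilde{Q}}^{-1}H\mathbf {f}\) can be tried on a simple right-hand side. The sketch below assumes the standard second order accurate \(Q\) with trapezoidal \(H\); the function names and the test data \(f=1\), \(g_\mathrm{L}=2\) are illustrative choices.

```python
import numpy as np

def q_tilde_second_order(n, sigma_L):
    """Assumed second order accurate Q with the SAT added as in (7)."""
    Q = 0.5 * (np.eye(n + 1, k=1) - np.eye(n + 1, k=-1))
    Q[0, 0], Q[n, n] = -0.5, 0.5
    Q[0, 0] -= sigma_L
    return Q

def q_tilde_inv_formula(n, sigma_L):
    """Entry-wise formula for the inverse quoted in Remark 2.3."""
    i, j = np.indices((n + 1, n + 1))
    osc = (1 + 1 / sigma_L) * (-1.0) ** j            # oscillating part
    return np.where(j <= i, 1.0 - osc, (-1.0) ** (i + j) - osc)

n, length, g_L = 10, 1.0, 2.0
h = length / n
x = np.linspace(0.0, length, n + 1)
H = h * np.eye(n + 1)
H[0, 0] = H[n, n] = h / 2

# Largest deviation between the formula and a computed inverse
max_err = max(
    np.max(np.abs(q_tilde_inv_formula(n, s)
                  - np.linalg.inv(q_tilde_second_order(n, s))))
    for s in (-1.0, -0.5, -3.0)
)

# Steady problem u_x = 1, u(0) = g_L, whose exact solution is g_L + x
f = np.ones(n + 1)
v = g_L * np.ones(n + 1) + q_tilde_inv_formula(n, -1.0) @ H @ f
```

Because the operator differentiates linear functions exactly, the discrete solution for \(f=1\) coincides with \(g_\mathrm{L}+x\) at the grid points, even though the upper triangle of \({\widetilde{Q}}^{-1}\) does not vanish.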

2.2.1 Interface SATs

The SBP-SAT methodology is well suited for dividing the computational domain into subdomains, coupled by interfaces [7]. As an example, we discretize (3) again, using two subdomains with the unknowns \({\mathbf {v}}_{\mathrm{A},\mathrm{B}}\), coupled such that \({\mathbf {e}}_\mathrm{R}^{\mathsf {T}}{\mathbf {v}}_\mathrm{A}\approx {\mathbf {e}}_\mathrm{L}^{\mathsf {T}}{\mathbf {v}}_\mathrm{B}\) at the interface. Modifying (4) to this two-subdomain system yields

$$\begin{aligned} \frac{\mathrm {d} }{\mathrm {d} t}V+\mathbb {H}^{-1}\widetilde{{\mathbb {Q}}}V={\widetilde{F}}, \end{aligned}$$

with

where all quantities with subindex A belong to the left subdomain and the ones marked with B to the right subdomain. The same vectors \({\mathbf {e}}_\mathrm{L,R}\) are used in both domains implying that they have the same number of grid points, but that is merely for ease of presentation. In particular, \({\widetilde{Q}}_\mathrm{A}=Q_\mathrm{A}- \sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}\) and \(\widetilde{\mathbf{f}}_\mathrm{A}=\mathbf {f}_\mathrm{A}-H_\mathrm{A}^{-1} \sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}g_\mathrm{L}\) are modified to impose the boundary condition, and \(\mu _{\mathrm{A},\mathrm{B}}\) are the penalty parameters at the interface. For \(\mu _{\mathrm{A}}-\mu _{\mathrm{B}}=1\) with \(\mu _{\mathrm{A}}+\mu _{\mathrm{B}}\le 0\), the scheme is conservative, dual consistent and stable.

Assume \(Q_{\mathrm{A},\mathrm{B}}{\mathbf {1}}={\mathbf {0}}\), and let \({\widetilde{Q}}_\mathrm{A}^{-1}=G_\mathrm{A}-\frac{1}{\sigma _\mathrm{L}}{\mathbf {1}}{\mathbf {b}}_\mathrm{A}^{\mathsf {T}}\), and \((Q_\mathrm{B}-\mu _\mathrm{B}{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}})^{-1}=G_\mathrm{B}-\frac{1}{\mu _\mathrm{B}}{\mathbf {1}}{\mathbf {b}}_\mathrm{B}^{\mathsf {T}}\). Then \({\mathbb {Q}}\mathbb {1}=0\), where \({\mathbb {Q}}=\widetilde{{\mathbb {Q}}}(\sigma _\mathrm{L}=0)\) and where \(\mathbb {1}\) is given below. In this case Theorem 2.1 applies and the inverse of \(\widetilde{{\mathbb {Q}}}\) has the form \(\widetilde{{\mathbb {Q}}}^{-1}={\mathbb {G}}-\frac{1}{\sigma _\mathrm{L}}\mathbb {1}\mathbb {b}^{\mathsf {T}}\), where

are obtained using the formula for the inverse of block matrices together with the relations \({\mathbf {e}}_\mathrm{L}^{\mathsf {T}}G_\mathrm{B}={\mathbf {0}}^{\mathsf {T}}\), \(G_\mathrm{B}{\mathbf {e}}_\mathrm{L}={\mathbf {0}}\), \({\mathbf {b}}_\mathrm{B}^{\mathsf {T}}{\mathbf {e}}_\mathrm{L}=1\) and \({\mathbf {e}}_\mathrm{L,R}^{\mathsf {T}}{\mathbf {1}}=1\).

As in the single domain case, \(\widetilde{{\mathbb {Q}}}^{-1}\) can be interpreted as a discrete Green’s function. In particular, we note an interesting behaviour when \(\mu _{\mathrm{A}}=0\) and \(\mu _{\mathrm{B}}=-1\), i.e. a fully up-wind coupling. Then \(\widetilde{{\mathbb {Q}}}\) is block-triangular, which leads to

We see that with the up-wind coupling, the continuous feature of having \({\mathcal {G}}(x,y)=0\) for \(x\le y\) from Remark 2.3 is mimicked at least on block-matrix level.

3 The Second Derivative

Now consider the scalar heat equation with Robin boundary conditions, that is

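$$\begin{aligned} u_t-u_{xx}&=f(x,t),&x\in (0,\ell ),\\ \alpha _\mathrm{L}u-\beta _\mathrm{L}u_x&=g_\mathrm{L}(t),&x=0,\\ \alpha _\mathrm{R}u+\beta _\mathrm{R}u_x&=g_\mathrm{R}(t),&x=\ell , \end{aligned}$$
(11)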

valid for \(t\ge 0\), with initial condition \(u(x,0)=u_0(x)\). The forcing function \(f(x,t)\), the initial data \(u_0(x)\) and the boundary data \(g_\mathrm{L,R}(t)\) are known functions.

We multiply the partial differential equation in (11) by \(u\) and integrate the result over the spatial domain, with the data set to \(f=0\) and \(g_\mathrm{L,R}=0\). Integration by parts and the boundary conditions then yield

$$\begin{aligned} \frac{\mathrm {d} }{\mathrm {d} t}\Vert u\Vert ^2+2\Vert u_x\Vert ^2=-2 \frac{\beta _\mathrm{R}}{\alpha _\mathrm{R}}u_{x}( \ell ,t)^2-2 \frac{\beta _\mathrm{L}}{\alpha _\mathrm{L}}u_{x}(0,t)^2. \end{aligned}$$

For the right-hand side to be non-positive, such that the energy does not grow, we need \(\alpha _\mathrm{L,R}\beta _\mathrm{L,R}\ge 0\).

3.1 The Semi-discrete Scheme

Using the SBP-SAT finite difference method, we obtain a scheme approximating (11) as

$$\begin{aligned} \begin{aligned} {\mathbf {v}}_t-D_2{\mathbf {v}}=\mathbf {f}&+H^{-1}( \sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}-\tau _\mathrm{L}{\mathbf {d}}_\mathrm{L}) \left( \alpha _\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}{\mathbf {v}}-\beta _\mathrm{L}{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}{\mathbf {v}}-g_\mathrm{L}\right) \\ {}&+H^{-1}(\sigma _\mathrm{R}{\mathbf {e}}_\mathrm{R}+\tau _\mathrm{R}{\mathbf {d}}_\mathrm{R}) \left( \alpha _\mathrm{R}{\mathbf {e}}_\mathrm{R}^{\mathsf {T}}{\mathbf {v}}+\beta _\mathrm{R}{\mathbf {d}}_\mathrm{R}^{\mathsf {T}}{\mathbf {v}}-g_\mathrm{R}\right) , \end{aligned} \end{aligned}$$
(12)

where \({\mathbf {v}}\), \(\mathbf {f}\), \(H\) and \({\mathbf {e}}_\mathrm{L,R}\) are described as in Sect. 2.1. The matrix \(D_2\) approximates the second derivative operator, and fulfills the SBP-properties

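$$\begin{aligned} D_2=H^{-1}\left( -A+{\mathbf {e}}_\mathrm{R}{\mathbf {d}}_\mathrm{R}^{\mathsf {T}}-{\mathbf {e}}_\mathrm{L}{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}\right) ,&A=A^{\mathsf {T}}\ge 0. \end{aligned}$$
(13)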

The vectors \({\mathbf {d}}_\mathrm{L}\) and \({\mathbf {d}}_\mathrm{R}\) are consistent finite difference stencils approximating the first derivative, see [7]. Two common categories of \(D_2\) operators are wide-stencil and narrow-stencil operators. Wide-stencil operators can be factorized as \(D_2= D_1^2\), and the term “narrow” describes finite difference schemes with a minimal stencil width [26].

The penalty parameters \(\sigma _\mathrm{L,R}\) and \(\tau _\mathrm{L,R}\) in (12) are scalars that will be further specified and discussed in the next sections. Now, we use (13) to rewrite (12) as

$$\begin{aligned} {\mathbf {v}}_t+ H^{-1}{\widetilde{A}}{\mathbf {v}}&=\widetilde{\mathbf{f}}, \end{aligned}$$
(14)

where

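$$\begin{aligned} {\widetilde{A}}=A-{\mathbf {e}}_\mathrm{R}{\mathbf {d}}_\mathrm{R}^{\mathsf {T}}+{\mathbf {e}}_\mathrm{L}{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}&-( \sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}-\tau _\mathrm{L}{\mathbf {d}}_\mathrm{L}) \left( \alpha _\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}-\beta _\mathrm{L}{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}\right) \\&-(\sigma _\mathrm{R}{\mathbf {e}}_\mathrm{R}+\tau _\mathrm{R}{\mathbf {d}}_\mathrm{R}) \left( \alpha _\mathrm{R}{\mathbf {e}}_\mathrm{R}^{\mathsf {T}}+\beta _\mathrm{R}{\mathbf {d}}_\mathrm{R}^{\mathsf {T}}\right) \end{aligned}$$
(15)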

and where \(\widetilde{\mathbf{f}}=\mathbf {f}-H^{-1}( \sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}-\tau _\mathrm{L}{\mathbf {d}}_\mathrm{L})g_\mathrm{L}-H^{-1}(\sigma _\mathrm{R}{\mathbf {e}}_\mathrm{R}+\tau _\mathrm{R}{\mathbf {d}}_\mathrm{R}) g_\mathrm{R}\). We identify \({\widetilde{A}}\) as the second derivative version of the matrix \(K\) from the introduction.

3.1.1 Stability

To show energy stability, we multiply (12) by \({\mathbf {v}}^{\mathsf {T}}H\) from the left and use the relations (13). We thereafter add the transpose, and let \(\mathbf {f}={\mathbf {0}}\) and \(g_\mathrm{L,R}=0\). This yields

$$\begin{aligned} \begin{aligned} \frac{\mathrm {d} }{\mathrm {d} t}\Vert {\mathbf {v}}\Vert ^2_H+2{\mathbf {v}}^{\mathsf {T}}A{\mathbf {v}}&=2{\mathbf {v}}^{\mathsf {T}}({\mathbf {e}}_\mathrm{R}{\mathbf {d}}_\mathrm{R}^{\mathsf {T}}-{\mathbf {e}}_\mathrm{L}{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}){\mathbf {v}}\\&\quad +\,2{\mathbf {v}}^{\mathsf {T}}( \sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}-\tau _\mathrm{L}{\mathbf {d}}_\mathrm{L}) \left( \alpha _\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}{\mathbf {v}}-\beta _\mathrm{L}{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}{\mathbf {v}}\right) \\&\quad +\,2{\mathbf {v}}^{\mathsf {T}}(\sigma _\mathrm{R}{\mathbf {e}}_\mathrm{R}+\tau _\mathrm{R}{\mathbf {d}}_\mathrm{R})\left( \alpha _\mathrm{R}{\mathbf {e}}_\mathrm{R}^{\mathsf {T}}{\mathbf {v}}+\beta _\mathrm{R}{\mathbf {d}}_\mathrm{R}^{\mathsf {T}}{\mathbf {v}}\right) ,\end{aligned} \end{aligned}$$
(16)

where we need to show that \(\frac{\mathrm {d} }{\mathrm {d} t}\Vert {\mathbf {v}}\Vert ^2_H\le 0\). We will determine the stability limits of \(\sigma _\mathrm{L,R}\) and \(\tau _\mathrm{L,R}\) using a procedure sometimes called the borrowing technique [1, 2, 7, 15, 24, 30, 32]. The idea is to “borrow” a maximum amount \(\gamma \) of “positivity” from A, more precisely as

$$\begin{aligned} A={\tilde{A}}_\gamma +h\gamma ({\mathbf {d}}_\mathrm{L}{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}+{\mathbf {d}}_\mathrm{R}{\mathbf {d}}_\mathrm{R}^{\mathsf {T}}),&{\tilde{A}}_\gamma \ge 0,\quad \gamma >0. \end{aligned}$$
(17)

Inserting the relation in (17) into (16), we obtain

$$\begin{aligned} \frac{\mathrm {d} }{\mathrm {d} t}\Vert {\mathbf {v}}\Vert ^2_H+2{\mathbf {v}}^{\mathsf {T}}{\tilde{A}}_\gamma {\mathbf {v}}&=\left[ \begin{array}{c}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}{\mathbf {v}}\\ -{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}{\mathbf {v}}\end{array}\right] ^{\mathsf {T}}\left[ \begin{array}{cc} 2\sigma _\mathrm{L}\alpha _\mathrm{L}&{}1+\sigma _\mathrm{L}\beta _\mathrm{L}+\tau _\mathrm{L}\alpha _\mathrm{L}\\ 1+\sigma _\mathrm{L}\beta _\mathrm{L}+\tau _\mathrm{L}\alpha _\mathrm{L}&{}2\tau _\mathrm{L}\beta _\mathrm{L}-2h\gamma \end{array}\right] \left[ \begin{array}{c}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}{\mathbf {v}}\\ -{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}{\mathbf {v}}\end{array}\right] \\&\quad +\,\left[ \begin{array}{c}{\mathbf {e}}_\mathrm{R}^{\mathsf {T}}{\mathbf {v}}\\ {\mathbf {d}}_\mathrm{R}^{\mathsf {T}}{\mathbf {v}}\end{array}\right] ^{\mathsf {T}}\left[ \begin{array}{cc} 2\sigma _\mathrm{R}\alpha _\mathrm{R}&{}1+\sigma _\mathrm{R}\beta _\mathrm{R}+\tau _\mathrm{R}\alpha _\mathrm{R}\\ 1+\sigma _\mathrm{R}\beta _\mathrm{R}+\tau _\mathrm{R}\alpha _\mathrm{R}&{}2\tau _\mathrm{R}\beta _\mathrm{R}-2h\gamma \end{array}\right] \left[ \begin{array}{c}{\mathbf {e}}_\mathrm{R}^{\mathsf {T}}{\mathbf {v}}\\ {\mathbf {d}}_\mathrm{R}^{\mathsf {T}}{\mathbf {v}}\end{array}\right] . \end{aligned}$$

For stability, we need both the matrices in the two quadratic forms above to be negative semi-definite. This is fulfilled if

$$\begin{aligned} \begin{aligned} 2\sigma _\mathrm{L,R}\alpha _\mathrm{L,R}&\le 0\\ 2(\tau _\mathrm{L,R}\beta _\mathrm{L,R}-h\gamma )&\le 0\\ (1+\tau _\mathrm{L,R}\alpha _\mathrm{L,R}+\sigma _\mathrm{L,R}\beta _\mathrm{L,R})^2&\le 4\sigma _\mathrm{L,R}\alpha _\mathrm{L,R}(\tau _\mathrm{L,R}\beta _\mathrm{L,R}-h\gamma ).\end{aligned} \end{aligned}$$
(18)

3.1.2 Dual Consistency

To make the scheme (12) dual consistent we first note that the operator \(\partial ^2/\partial x^2\) (including boundary conditions) is a symmetric operator and that the matrix \({\widetilde{A}}\) must be symmetric to mimic this. From (15) it is clear that \({\widetilde{A}}\) is symmetric if \(1+\sigma _\mathrm{L,R}\beta _\mathrm{L,R}=\tau _\mathrm{L,R}\alpha _\mathrm{L,R}\). Let

$$\begin{aligned} \delta _\mathrm{L}\equiv 1 +\sigma _\mathrm{L}\beta _\mathrm{L}-\tau _\mathrm{L}\alpha _\mathrm{L},&\delta _\mathrm{R}\equiv 1+ \sigma _\mathrm{R}\beta _\mathrm{R}- \tau _\mathrm{R}\alpha _\mathrm{R}, \end{aligned}$$
(19)

where \(\delta _\mathrm{L,R}=0\) for dual consistent choices of penalty parameters. The relations in (19), with \(\delta _\mathrm{L,R}=0\), can also be derived from the penalty parameters of the scalar problem in [13]. For a background and more thorough descriptions of dual consistency, see [18].

Note that now, using the dual consistency parameters \(\delta _\mathrm{L,R}\) defined in (19), the three stability requirements in (18) can be reformulated as

$$\begin{aligned} \sigma _\mathrm{L,R}\alpha _\mathrm{L,R}\le 0,&\tau _\mathrm{L,R}\beta _\mathrm{L,R}\le h\gamma ,&\delta _\mathrm{L,R}^2\le -4 \alpha _\mathrm{L,R}(\sigma _\mathrm{L,R}h\gamma +\tau _\mathrm{L,R}). \end{aligned}$$
(20)
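The requirements (20) can be compared with a direct eigenvalue test of the \(2\times 2\) boundary matrix from the energy rate in Sect. 3.1.1. The sketch below does this for a Dirichlet condition with a dual consistent penalty. The function names are illustrative, and the value \(\gamma =0.4\) is a hypothetical placeholder, since the true borrowing constant depends on the operator.

```python
import numpy as np

def boundary_matrix(sigma, tau, alpha, beta, h, gamma):
    """Symmetric 2x2 matrix of one boundary quadratic form in Sect. 3.1.1."""
    off = 1 + sigma * beta + tau * alpha
    return np.array([[2 * sigma * alpha, off],
                     [off, 2 * tau * beta - 2 * h * gamma]])

def satisfies_20(sigma, tau, alpha, beta, h, gamma):
    """The three stability requirements in the reformulated form (20)."""
    delta = 1 + sigma * beta - tau * alpha
    return (sigma * alpha <= 0
            and tau * beta <= h * gamma
            and delta ** 2 <= -4 * alpha * (sigma * h * gamma + tau))

def is_negative_semidefinite(M, tol=1e-12):
    """Eigenvalue test for a symmetric matrix."""
    return bool(np.all(np.linalg.eigvalsh(M) <= tol))

h, gamma = 0.1, 0.4              # gamma is a hypothetical value
alpha, beta, tau = 1.0, 0.0, 1.0 # Dirichlet, dual consistent (delta = 0)

stable = (-50.0, tau, alpha, beta, h, gamma)     # sigma <= -1/(h*gamma) = -25
unstable = (-10.0, tau, alpha, beta, h, gamma)   # sigma too close to zero

stable_by_20 = satisfies_20(*stable)
stable_by_eig = is_negative_semidefinite(boundary_matrix(*stable))
unstable_by_20 = satisfies_20(*unstable)
unstable_by_eig = is_negative_semidefinite(boundary_matrix(*unstable))
```

For this Dirichlet case the third requirement in (20) reduces to \(\sigma \le -1/(h\gamma )\), so the penalty strength must scale with the inverse mesh size.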

3.2 The Inverse of the Discretization Matrix

We consider the steady version of (14), that is \(H^{-1}{\widetilde{A}}{\mathbf {v}}=\widetilde{\mathbf{f}}\), which has a unique solution \({\mathbf {v}}={\widetilde{A}}^{-1}H\widetilde{\mathbf{f}}\), if \({\widetilde{A}}^{-1}\) exists. We derive this inverse and present the result in Theorem 3.1.

Theorem 3.1

Consider \({\widetilde{A}}\) in (15), which depends on A and \({\mathbf {d}}_\mathrm{L,R}\) in (13) and on the boundary related scalars \(\sigma _\mathrm{L,R}\), \(\tau _\mathrm{L,R}\), \(\alpha _\mathrm{L,R}\) and \(\beta _\mathrm{L,R}\). Let the parts of A be denoted as follows,

$$\begin{aligned} A=\left[ \begin{array}{ccc}a_\mathrm{L}&{}\quad \vec {a}_\mathrm{L}^{{\mathsf {T}}}&{}\quad a_\mathrm{C}\\ \vec {a}_\mathrm{L}&{}\quad {\bar{A}}&{}\quad \vec {a}_\mathrm{R}\\ a_\mathrm{C}&{}\quad \vec {a}_\mathrm{R}^{{{\mathsf {T}}}}&{}\quad a_\mathrm{R}\end{array}\right] , \end{aligned}$$
(21)

where \(a_\mathrm{L}\), \(a_\mathrm{R}\) and \(a_\mathrm{C}\) are scalars, \(\vec {a}_\mathrm{L,R}\) are \((n-1)\times 1\)-vectors and \({\bar{A}}\) is an \((n-1)\times (n-1)\)-matrix. The inverse of \({\widetilde{A}}\) is

$$\begin{aligned} {\widetilde{A}}^{-1}&=G_2+\left[ \begin{array}{cccc}-\tau _\mathrm{L}{\mathbf {b}}_\mathrm{L}&-\tau _\mathrm{R}{\mathbf {b}}_\mathrm{R}&{\mathbf {1}}-{\mathbf {x}}/\ell&{\mathbf {x}}/\ell \end{array}\right] \Sigma ^{-1} \left[ \begin{array}{c}{\mathbf {b}}_\mathrm{L}^{{\mathsf {T}}}\\ {\mathbf {b}}_\mathrm{R}^{{\mathsf {T}}}\\ \beta _\mathrm{L}({\mathbf {1}}-{\mathbf {x}}/\ell )^{{\mathsf {T}}}\\ \beta _\mathrm{R}{\mathbf {x}}^{{\mathsf {T}}} /\ell \end{array}\right] \end{aligned}$$
(22)

where \({\mathbf {1}}=[1\ 1\ 1\ \ldots \ 1]^{{\mathsf {T}}}\) and \({\mathbf {x}}=h[0\ 1\ 2\ \ldots \ n]^{{\mathsf {T}}}\), and where

$$\begin{aligned} G_2=\left[ \begin{array}{ccc}0&{}\vec {0}^{{\mathsf {T}}}&{}0\\ \vec {0}&{}{\bar{A}}^{-1}&{}\vec {0}\\ 0&{}\vec {0}^{{\mathsf {T}}}&{}0\end{array}\right] ,&{\mathbf {b}}_\mathrm{L}\equiv {\mathbf {1}}-{\mathbf {x}}/\ell -G_2{\mathbf {d}}_\mathrm{L},&{\mathbf {b}}_\mathrm{R}\equiv {\mathbf {x}}/\ell +G_2{\mathbf {d}}_\mathrm{R}. \end{aligned}$$
(23)

Furthermore, \( \Sigma \) in (22) is a \(4\times 4\)-matrix

$$\begin{aligned} \Sigma =\left[ \begin{array}{cccc}\sigma _\mathrm{L}+\tau _\mathrm{L}\xi _\mathrm{L}&{}-\tau _\mathrm{R}\xi _\mathrm{C}&{} 0&{}0\\ -\tau _\mathrm{L}\xi _\mathrm{C}&{}\sigma _\mathrm{R}+\tau _\mathrm{R}\xi _\mathrm{R}&{}0&{}0\\ \delta _\mathrm{L}&{}0&{}\alpha _\mathrm{L}+\beta _\mathrm{L}/\ell &{}-\beta _\mathrm{L}/\ell \\ 0&{}\delta _\mathrm{R}&{}- \beta _\mathrm{R}/\ell &{} \alpha _\mathrm{R}+\beta _\mathrm{R}/\ell \end{array}\right] \end{aligned}$$
(24)

that depends on \(\alpha _\mathrm{L,R}\) and \(\beta _\mathrm{L,R}\), that is on the choices of boundary conditions in (11), on the choices of penalty parameters \(\sigma _\mathrm{L,R}\) and \(\tau _\mathrm{L,R}\) in (12) and on the duality parameters \(\delta _\mathrm{L,R}\) in (19), as well as on the scalars

$$\begin{aligned} \xi _\mathrm{L}\equiv - {\mathbf {d}}_\mathrm{L}^{{\mathsf {T}}} {\mathbf {b}}_\mathrm{L},&\xi _\mathrm{R}\equiv {\mathbf {d}}_\mathrm{R}^{{\mathsf {T}}}{\mathbf {b}}_\mathrm{R}&\xi _\mathrm{C}\equiv {\mathbf {d}}_\mathrm{L}^{{\mathsf {T}}}{\mathbf {b}}_\mathrm{R}=-{\mathbf {d}}_\mathrm{R}^{{\mathsf {T}}}{\mathbf {b}}_\mathrm{L}. \end{aligned}$$
(25)

Proof of Theorem 3.1

The proof is given in “Appendix B”. \(\square \)
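Theorem 3.1 can also be checked numerically for a specific operator. The sketch below assumes a second order narrow-stencil operator with \(A\) equal to the scaled tridiagonal matrix given in the code and with three-point one-sided stencils \({\mathbf {d}}_\mathrm{L,R}\). The matrix \({\widetilde{A}}\) is assembled by moving the SATs in (12) to the left-hand side, assuming \(HD_2=-A+{\mathbf {e}}_\mathrm{R}{\mathbf {d}}_\mathrm{R}^{\mathsf {T}}-{\mathbf {e}}_\mathrm{L}{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}\) as used in (16). All function names and parameter values are illustrative.

```python
import numpy as np

def second_order_parts(n, length):
    """Assumed A, d_L, d_R of a second order narrow-stencil operator."""
    h = length / n
    A = (2 * np.eye(n + 1) - np.eye(n + 1, k=1) - np.eye(n + 1, k=-1)) / h
    A[0, 0] = A[n, n] = 1 / h
    d_L = np.zeros(n + 1)
    d_R = np.zeros(n + 1)
    d_L[:3] = np.array([-1.5, 2.0, -0.5]) / h    # one-sided d/dx at x = 0
    d_R[-3:] = np.array([0.5, -2.0, 1.5]) / h    # one-sided d/dx at x = l
    return A, d_L, d_R

def assemble_a_tilde(A, d_L, d_R, sig, tau, alpha, beta):
    """A tilde obtained from (12); index 0 is the left, 1 the right boundary."""
    m = A.shape[0]
    e_L, e_R = np.eye(m)[:, 0], np.eye(m)[:, -1]
    sat_L = np.outer(sig[0] * e_L - tau[0] * d_L, alpha[0] * e_L - beta[0] * d_L)
    sat_R = np.outer(sig[1] * e_R + tau[1] * d_R, alpha[1] * e_R + beta[1] * d_R)
    return A - np.outer(e_R, d_R) + np.outer(e_L, d_L) - sat_L - sat_R

def inverse_from_theorem(A, d_L, d_R, sig, tau, alpha, beta, length):
    """Right-hand side of (22), built from (23), (24) and (25)."""
    m = A.shape[0]
    x = np.linspace(0.0, length, m)
    p_L, p_R = 1 - x / length, x / length
    G2 = np.zeros((m, m))
    G2[1:-1, 1:-1] = np.linalg.inv(A[1:-1, 1:-1])
    b_L = p_L - G2 @ d_L
    b_R = p_R + G2 @ d_R
    xi_L, xi_R, xi_C = -d_L @ b_L, d_R @ b_R, d_L @ b_R
    delta = [1 + sig[k] * beta[k] - tau[k] * alpha[k] for k in range(2)]
    Sigma = np.array([
        [sig[0] + tau[0] * xi_L, -tau[1] * xi_C, 0.0, 0.0],
        [-tau[0] * xi_C, sig[1] + tau[1] * xi_R, 0.0, 0.0],
        [delta[0], 0.0, alpha[0] + beta[0] / length, -beta[0] / length],
        [0.0, delta[1], -beta[1] / length, alpha[1] + beta[1] / length]])
    U = np.column_stack([-tau[0] * b_L, -tau[1] * b_R, p_L, p_R])
    Vt = np.vstack([b_L, b_R, beta[0] * p_L, beta[1] * p_R])
    return G2 + U @ np.linalg.solve(Sigma, Vt)

n, length = 6, 1.0
sig, tau = (-40.0, -30.0), (0.7, 0.2)        # illustrative penalty parameters
alpha, beta = (1.0, 2.0), (0.5, 0.3)         # Robin conditions on both sides

A, d_L, d_R = second_order_parts(n, length)
A_tilde = assemble_a_tilde(A, d_L, d_R, sig, tau, alpha, beta)
A_tilde_inv = inverse_from_theorem(A, d_L, d_R, sig, tau, alpha, beta, length)
residual = np.max(np.abs(A_tilde @ A_tilde_inv - np.eye(n + 1)))
```

The chosen penalty parameters are not dual consistent, so the entries \(\delta _\mathrm{L,R}\) of \(\Sigma \) are nonzero and every block of (24) takes part in the check.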

Note that the quantities in (23), and thus the validity of Theorem 3.1, rely on the existence of \({\bar{A}}^{-1}\). In “Appendix D”, the explicit values of \({\bar{A}}^{-1}\), as well as of \(G_2\), \({\mathbf {b}}_\mathrm{L,R}\), \(\xi _\mathrm{L,R}\) and \(\xi _\mathrm{C}\), are provided for the (2,0), (2,1) and (4,2) order accurate narrow-stencil operators and the (2,0) order accurate wide-stencil operator. This directly proves the existence of \({\bar{A}}^{-1}\) for these operators. Higher order accurate operators have free parameters, but empirically we can draw the conclusion that \({\bar{A}}^{-1}\) must exist at least for the parameter choices in [25], since the operators therein have been applied successfully for many years.

Given the existence of \({\bar{A}}^{-1}\), we note from (22) that \({\widetilde{A}}\) is singular if and only if \( \Sigma \) in (24) is singular. The matrix \( \Sigma \) is in turn singular if either of the two relations

$$\begin{aligned}&(\alpha _\mathrm{L}+\beta _\mathrm{L}/\ell )( \alpha _\mathrm{R}+\beta _\mathrm{R}/\ell )-\beta _\mathrm{L}\beta _\mathrm{R}/\ell ^2=0 \end{aligned}$$
(26)
$$\begin{aligned}&(\sigma _\mathrm{L}+\tau _\mathrm{L}\xi _\mathrm{L})(\sigma _\mathrm{R}+\tau _\mathrm{R}\xi _\mathrm{R})-\tau _\mathrm{L}\tau _\mathrm{R}\xi _\mathrm{C}^2=0 \end{aligned}$$
(27)

holds. The first condition is related to the continuous boundary conditions, and makes the matrix singular if Neumann boundary conditions are imposed on both boundaries, i.e. if \(\alpha _\mathrm{L}=\alpha _\mathrm{R}=0\). The second condition has to do with the choice of penalty parameters, and leads us to the following corollary of Theorem 3.1:

Corollary 3.2

The matrix \({\widetilde{A}}\), described in (15), is singular when the penalty parameters simultaneously fulfill \(\sigma _\mathrm{L}=-\left( \xi _\mathrm{L}+\zeta |\xi _\mathrm{C}|\right) \tau _\mathrm{L}\) and \(\sigma _\mathrm{R}=-\left( \xi _\mathrm{R}+|\xi _\mathrm{C}|/\zeta \right) \tau _\mathrm{R}\), where \(\zeta \ne 0\). If \(\xi _\mathrm{C}\), \(\tau _\mathrm{L}\) or \(\tau _\mathrm{R}\) is zero, the matrix \({\widetilde{A}}\) is singular if either \(\sigma _\mathrm{L}=-\tau _\mathrm{L}\xi _\mathrm{L}\) or if \(\sigma _\mathrm{R}=-\tau _\mathrm{R}\xi _\mathrm{R}\).

Proof of Corollary 3.2

We make the ansatz \(\sigma _\mathrm{L,R}=-\tau _\mathrm{L,R}\xi _\mathrm{L,R}-\varepsilon _\mathrm{L,R}\) with some unknown scalars \(\varepsilon _\mathrm{L,R}\). Inserting this into (27) above gives \(\varepsilon _\mathrm{L}\varepsilon _\mathrm{R}=\tau _\mathrm{L}\tau _\mathrm{R}\xi _\mathrm{C}^2\) which is fulfilled for all pairs \(\varepsilon _\mathrm{L}=\tau _\mathrm{L}|\xi _\mathrm{C}|\zeta \) and \(\varepsilon _\mathrm{R}=\tau _\mathrm{R}|\xi _\mathrm{C}|/\zeta \) with arbitrary choices of \(\zeta \ne 0\). If \(\xi _\mathrm{C}\), \(\tau _\mathrm{L}\) or \(\tau _\mathrm{R}\) is equal to zero, it is enough if either \(\varepsilon _\mathrm{L}=0\) or \(\varepsilon _\mathrm{R}=0\). \(\square \)
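The structure behind (26), (27) and Corollary 3.2 can be checked with a few lines of exact arithmetic. The sketch below builds \( \Sigma \) from (24) and compares its determinant with the product of the left-hand sides of (26) and (27), which is what the block lower-triangular form of \( \Sigma \) suggests. The function names and all sample values are hypothetical choices made for illustration only; they are not taken from any of the operators in the appendices.

```python
from fractions import Fraction as F


def det(m):
    """Determinant by Laplace expansion along the first row (adequate for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def sigma_matrix(aL, aR, bL, bR, sL, sR, tL, tR, dL, dR, xiL, xiR, xiC, ell):
    """The 4x4 matrix Sigma as written in (24)."""
    return [
        [sL + tL * xiL, -tR * xiC, 0, 0],
        [-tL * xiC, sR + tR * xiR, 0, 0],
        [dL, 0, aL + bL / ell, -bL / ell],
        [0, dR, -bR / ell, aR + bR / ell],
    ]


def relation_26(aL, aR, bL, bR, ell):
    """Left-hand side of (26): depends only on the boundary conditions."""
    return (aL + bL / ell) * (aR + bR / ell) - bL * bR / ell**2


def relation_27(sL, sR, tL, tR, xiL, xiR, xiC):
    """Left-hand side of (27): depends only on the penalty parameters."""
    return (sL + tL * xiL) * (sR + tR * xiR) - tL * tR * xiC**2


def corollary_penalties(tL, tR, xiL, xiR, xiC, zeta):
    """sigma_L and sigma_R from Corollary 3.2 for a given zeta != 0."""
    return -(xiL + zeta * abs(xiC)) * tL, -(xiR + abs(xiC) / zeta) * tR


if __name__ == "__main__":
    # Hypothetical sample values, chosen only to exercise the formulas.
    xiL = xiR = F(5, 2)
    xiC, tL, tR, ell = F(1, 2), F(1, 3), F(2, 5), F(2)
    sL, sR = corollary_penalties(tL, tR, xiL, xiR, xiC, zeta=F(3))
    S = sigma_matrix(F(1), F(1), F(1, 2), F(1, 4), sL, sR, tL, tR,
                     F(7), F(-2), xiL, xiR, xiC, ell)
    print("det(Sigma) with the penalties of Corollary 3.2:", det(S))
```

Using `Fraction` instead of floating point avoids having to pick a tolerance when testing whether a determinant is zero. Note that the duality parameters \(\delta _\mathrm{L,R}\) enter \( \Sigma \) but do not influence its determinant.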

The requirements on A and \({\mathbf {d}}_\mathrm{L,R}\) in Theorem 3.1 are only that A is symmetric, that \({\bar{A}}^{-1}\) exists (as discussed above) and that \(D_2\) and \({\mathbf {d}}_\mathrm{L,R}\) in (13) are consistent such that the relations (43) and (44) in “Appendix B” hold. In addition we will assume that \(D_2\) is constructed such that the left and right boundary closures are equivalent. This implies that A is a centrosymmetric matrix, that is \(A_{i,j} = A_{n-i,n-j}\) for all \(0 \le i,j \le n\), and that \(({\mathbf {d}}_\mathrm{L})_i = -({\mathbf {d}}_\mathrm{R})_{n-i}\) for \(0 \le i \le n\). This additional assumption leads to \(\xi _\mathrm{L}=\xi _\mathrm{R}\) (this is most easily seen by expressing the quantities in (25) as \(\xi _\mathrm{L,R}= 1/\ell +{\mathbf {d}}_\mathrm{L,R}^{\mathsf {T}}G_2{\mathbf {d}}_\mathrm{L,R}\) and \(\xi _\mathrm{C}=1/\ell +{\mathbf {d}}_\mathrm{L,R}^{\mathsf {T}}G_2{\mathbf {d}}_\mathrm{R,L}\) and thereafter using the fact that the inverse of a centrosymmetric matrix is also centrosymmetric). For later reference we define

$$\begin{aligned} \xi _\mathrm{T}\equiv \xi _\mathrm{L,R}+|\xi _\mathrm{C}|, \end{aligned}$$
(28)

and assume that the penalty is chosen to be equally strong on both boundaries:

Assumption 3.3

Choosing an equal penalty strength on both boundaries corresponds to having \(\zeta =1\) in Corollary 3.2. If in addition equivalent boundary closures are assumed, such that \(\xi _\mathrm{L}=\xi _\mathrm{R}\), we can use \(\xi _\mathrm{T}\equiv \xi _\mathrm{L,R}+|\xi _\mathrm{C}|\) from (28). This simplifies the condition of singularity in Corollary 3.2 to \(\sigma _\mathrm{L,R}=-\xi _\mathrm{T}\tau _\mathrm{L,R}\).

Remark 3.4

The inverse of \({\widetilde{A}}\) mimics a fundamental solution. For example, the Green’s function \({\mathcal {G}}\) of Poisson’s equation, \(-u_{xx}=f\) with \(u(0)=u( \ell )=0\), is

$$\begin{aligned} u(x)&=\int _0^{ \ell }{\mathcal {G}}(x,y)f(y)\,\mathrm {d} y,&{\mathcal {G}}(x,y)=\left\{ \begin{array}{ll}y(1-x/\ell ),&{}\quad y<x,\\ x(1-y/\ell ),&{}\quad x\le y.\end{array}\right. \end{aligned}$$

Recalling that the matrix \(H\) has the role of a quadrature rule, we see the clear similarity to the time-independent, homogeneous version of (14), \({\mathbf {v}}={\widetilde{A}}^{-1}H\mathbf {f}\). The resemblance is more obvious if the penalty dependent part in (22) is ignored, since then \({\mathbf {v}}=G_2H\mathbf {f}\). For the second order accurate approximation given in (64), \(G_2\) is exact in the grid points, as

$$\begin{aligned} \left( G_2\right) _{i,j}=\left\{ \begin{array}{ll} x_j(1-x_i/\ell ),&{}\quad 0\le j\le i\le n,\\ x_i(1-x_j/\ell ),&{}\quad 0\le i \le j\le n. \end{array}\right. \end{aligned}$$

This is identical with the result noted for the classical finite difference method using injection instead of SAT, compare [4, 28]. With Robin boundary conditions we have

$$\begin{aligned} u(x)&=\int _0^{ \ell }{\mathcal {G}}(x,y)f(y)\,\mathrm {d} y+c_\mathrm{L}(1-x/\ell )+c_\mathrm{R}x/\ell \end{aligned}$$

where \(c_\mathrm{L,R}\) depends on the type and data of the boundary conditions from (11), as

$$\begin{aligned} \left[ \begin{array}{c}c_\mathrm{L}\\ c_\mathrm{R}\end{array}\right] =\left[ \begin{array}{cc}\alpha _\mathrm{L}+\beta _\mathrm{L}/\ell &{}\quad -\beta _\mathrm{L}/\ell \\ -\beta _\mathrm{R}/\ell &{}\quad \alpha _\mathrm{R}+\beta _\mathrm{R}/\ell \end{array}\right] ^{-1}\left[ \begin{array}{c}g_\mathrm{L}+\beta _\mathrm{L}\int _0^\ell \left( 1-y/\ell \right) f(y)\,\mathrm {d} y\\ g_\mathrm{R}+\beta _\mathrm{R}\int _0^{\ell }(y/\ell )f(y)\,\mathrm {d} y\\ \end{array}\right] . \end{aligned}$$

The discrete counterpart is still \({\mathbf {v}}={\widetilde{A}}^{-1}H\widetilde{\mathbf{f}}\), which, using relations in Theorem 3.1 and “Section B.1” of Appendix and with \(\widetilde{\mathbf{f}}=\mathbf {f}-H^{-1}( \sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}-\tau _\mathrm{L}{\mathbf {d}}_\mathrm{L})g_\mathrm{L}-H^{-1}(\sigma _\mathrm{R}{\mathbf {e}}_\mathrm{R}+\tau _\mathrm{R}{\mathbf {d}}_\mathrm{R}) g_\mathrm{R}\), can be written

where

$$\begin{aligned} \left[ \begin{array}{c}\eta _\mathrm{L}\\ \eta _\mathrm{R}\end{array}\right]&=\left[ \begin{array}{cc}\sigma _\mathrm{L}+\tau _\mathrm{L}\xi _\mathrm{L}&{}\quad -\tau _\mathrm{R}\xi _\mathrm{C}\\ -\tau _\mathrm{L}\xi _\mathrm{C}&{}\quad \sigma _\mathrm{R}+\tau _\mathrm{R}\xi _\mathrm{R}\end{array}\right] ^{-1}\left[ \begin{array}{c}{\mathbf {b}}_\mathrm{L}^{\mathsf {T}}\\ {\mathbf {b}}_\mathrm{R}^{\mathsf {T}}\end{array}\right] H\mathbf {f}. \end{aligned}$$

Unless \(\mathbf {f}=0\), such that \(\eta _\mathrm{L,R}=0\), the numerical solution \({\mathbf {v}}\) differs depending on the choice of penalty parameters, where the vectors \({\mathbf {1}}\), \({\mathbf {x}}\) and \({\mathbf {b}}_\mathrm{L,R}\) span the possible perturbations. As long as choices resulting in \(\sigma _\mathrm{L,R}+\xi _\mathrm{T}\tau _\mathrm{L,R}\approx 0\) are avoided, this perturbation is slight.
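The statement in Remark 3.4 that the sampled Green's function is exact in the grid points for the classical second order finite difference method with injection can be verified directly. The sketch below is an illustration under our own assumptions: it uses the interior \((n-1)\times (n-1)\) matrix \(\frac{1}{h}\,\mathrm {tridiag}(-1,2,-1)\) of the classical method, not the SBP-SAT operator from (64), and the scaling \(1/h\) (rather than \(1/h^2\)) is chosen because the quadrature weight \(h\) is kept separately, in analogy with \({\mathbf {v}}=G_2H\mathbf {f}\). All function names are hypothetical.

```python
from fractions import Fraction


def sampled_green(n, ell):
    """Green's function of -u_xx = f, u(0) = u(ell) = 0, at the interior nodes."""
    ell = Fraction(ell)
    h = ell / n
    x = [i * h for i in range(n + 1)]

    def g(xi, y):
        # Same branches as in Remark 3.4: y(1 - x/ell) for y < x, else x(1 - y/ell).
        return y * (1 - xi / ell) if y < xi else xi * (1 - y / ell)

    G = [[g(x[i], x[j]) for j in range(1, n)] for i in range(1, n)]
    return G, h


def scaled_second_difference(n, h):
    """(1/h) * tridiag(-1, 2, -1) on the n-1 interior nodes (Dirichlet by injection)."""
    m = n - 1
    return [[(Fraction(2) if i == j
              else Fraction(-1) if abs(i - j) == 1
              else Fraction(0)) / h
             for j in range(m)] for i in range(m)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_identity(p):
    """True if p is exactly the identity matrix."""
    return all(p[i][j] == (1 if i == j else 0)
               for i in range(len(p)) for j in range(len(p)))


if __name__ == "__main__":
    G, h = sampled_green(n=8, ell=3)
    T = scaled_second_difference(8, h)
    print("T * G is the identity:", is_identity(matmul(T, G)))
```

The check works because \({\mathcal {G}}(\cdot ,y)\) is piecewise linear with a single kink at \(x=y\), so the second difference vanishes away from the diagonal and equals \(h\) on it.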

3.3 Relations Between Stability, Singularity and Dual Consistency

We take a look at the relation between the stability requirements on the scheme (12) and the conditions that make its discretization matrix singular. First, we note that:

Theorem 3.5

Consider \(\gamma \) in (17) and \(\xi _\mathrm{T}\) in (28). It holds that \(h\gamma =1/\xi _\mathrm{T}\).

Proof

Theorem 3.5 is proven in “Section C.1” of Appendix. \(\square \)

A consequence of Theorem 3.5 is that the stability demands in (20) can be written

$$\begin{aligned} \sigma _\mathrm{L,R}\alpha _\mathrm{L,R}\le 0,&\tau _\mathrm{L,R}\beta _\mathrm{L,R}\le 1/\xi _\mathrm{T},&\delta _\mathrm{L,R}^2\le -4 \alpha _\mathrm{L,R}(\sigma _\mathrm{L,R}/\xi _\mathrm{T}+\tau _\mathrm{L,R}), \end{aligned}$$
(29)

with \(\delta _\mathrm{L,R}\) from (19). We will see that the penalty can be chosen such that we have energy stability and a singular discretization matrix at the same time: from Assumption 3.3 we know that the matrix \({\widetilde{A}}\) is singular when \(\sigma _\mathrm{L,R}=-\tau _\mathrm{L,R}\xi _\mathrm{T}\). Inserting this into (29), the third stability demand becomes \(\delta _\mathrm{L,R}^2\le 0\), which is only fulfilled if the penalty parameters are chosen in a dual consistent way. This means that if (12) is an energy stable scheme, it must also be dual consistent to risk having a singular discretization matrix. Note though that even if the scheme is dual consistent, a singular discretization matrix is avoided by choosing \(\sigma _\mathrm{L,R}\ne -\tau _\mathrm{L,R}\xi _\mathrm{T}\). To be precise, simultaneously having \(\sigma _\mathrm{L,R}=-\xi _\mathrm{T}/(\beta _\mathrm{L,R}\xi _\mathrm{T}+\alpha _\mathrm{L,R})\) and \(\tau _\mathrm{L,R}=1 /(\beta _\mathrm{L,R}\xi _\mathrm{T}+\alpha _\mathrm{L,R})\) should be avoided, since this particular choice makes \(\delta _\mathrm{L,R}=0\), fulfills the stability demands but at the same time makes \({\widetilde{A}}\) singular.
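The interplay between (29) and the singularity condition can be illustrated for a single boundary with exact arithmetic. In the sketch below, the function names and the sample values of \(\alpha \), \(\beta \) and \(\xi _\mathrm{T}\) are hypothetical. Moreover, \(\delta \) is passed in as a free input, whereas in the scheme it is determined by (19), which is not reproduced here.

```python
from fractions import Fraction as F


def stability_demands(alpha, beta, sigma, tau, delta, xi_T):
    """The three inequalities of (29) for one boundary, as booleans."""
    return (
        sigma * alpha <= 0,
        tau * beta <= 1 / xi_T,
        delta**2 <= -4 * alpha * (sigma / xi_T + tau),
    )


def critical_penalties(alpha, beta, xi_T):
    """The particular choice singled out in the text:
    sigma = -xi_T / (beta*xi_T + alpha), tau = 1 / (beta*xi_T + alpha)."""
    denom = beta * xi_T + alpha
    return -xi_T / denom, 1 / denom


def on_singular_line(sigma, tau, xi_T):
    """True if sigma = -xi_T * tau, the singularity condition of Assumption 3.3."""
    return sigma + xi_T * tau == 0


if __name__ == "__main__":
    alpha, beta, xi_T = F(1), F(2), F(3)  # hypothetical sample values
    sigma, tau = critical_penalties(alpha, beta, xi_T)
    print("sigma, tau:", sigma, tau)
    print("demands with delta = 0:",
          stability_demands(alpha, beta, sigma, tau, F(0), xi_T))
    print("singular:", on_singular_line(sigma, tau, xi_T))
```

On the line \(\sigma =-\xi _\mathrm{T}\tau \) the right-hand side of the third demand is \(-4\alpha (-\tau +\tau )=0\), so any nonzero \(\delta \) violates it. Moving \(\sigma \) slightly off this line keeps the matrix invertible.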

In Assumption 3.3, one can argue that \(\zeta =-1\) gives just as equal a penalty strength as \(\zeta =1\), simplifying Corollary 3.2 to \(\sigma _\mathrm{L,R}=-\left( \xi _\mathrm{L,R}-|\xi _\mathrm{C}|\right) \tau _\mathrm{L,R}\). However, these choices do not give energy stability and are therefore not interesting for our further discussions. Besides, \(|\xi _\mathrm{C}|\) tends to be very small so in practice it does not make much of a difference.

3.4 Relations to the Stability Demands in [13]

In Sect. 3.1.1 the “borrowing technique” is used for deriving the stability restrictions on the penalty parameters. In [13], a different approach (inspired by [3, 18] where wide-stencil discretizations are rewritten as first order systems) is used for showing stability, and here we are going to comment on some connections between the two methods.

In [13], it is assumed that A can be decomposed as in [7], that is as

$$\begin{aligned} A=A^{\mathsf {T}}=S^{\mathsf {T}}MS,&{\mathbf {d}}_\mathrm{L}=S^{\mathsf {T}}{\mathbf {e}}_\mathrm{L},&{\mathbf {d}}_\mathrm{R}=S^{\mathsf {T}}{\mathbf {e}}_\mathrm{R}, \end{aligned}$$
(30)

and the strategy for showing stability is to modify the approximation of \(u_x\) from \(S{\mathbf {v}}\) to the auxiliary variable \({\mathbf {w}}=S{\mathbf {v}}+M^{-1}{\mathbf {e}}_\mathrm{L}\rho _\mathrm{L}+M^{-1}{\mathbf {e}}_\mathrm{R}\rho _\mathrm{R}\). In [13], \(\rho _\mathrm{L,R}\) are penalty-like terms proportional to the solution deviations from boundary data, but other options are possible. Computing \({\mathbf {w}}^{\mathsf {T}}M{\mathbf {w}}\) makes the terms

$$\begin{aligned} 2{\mathbf {v}}^{\mathsf {T}}{\mathbf {d}}_\mathrm{L}\rho _\mathrm{L}+2{\mathbf {v}}^{\mathsf {T}}{\mathbf {d}}_\mathrm{R}\rho _\mathrm{R}+q_\mathrm{L}\rho _\mathrm{L}^2 +2q_\mathrm{C}\rho _\mathrm{L}\rho _\mathrm{R}+q_\mathrm{R}\rho _\mathrm{R}^2 \le 2{\mathbf {v}}^{\mathsf {T}}({\mathbf {d}}_\mathrm{L}\rho _\mathrm{L}+{\mathbf {d}}_\mathrm{R}\rho _\mathrm{R}) +q_\mathrm{T}(\rho _\mathrm{L}^2 +\rho _\mathrm{R}^2 ) \end{aligned}$$

available to the boundary terms in (16), where \(q_\mathrm{L,R}\), \(q_\mathrm{C}\) and \(q_\mathrm{T}\) are defined as

$$\begin{aligned} q_\mathrm{L,R}\equiv {\mathbf {e}}_\mathrm{L,R}^{\mathsf {T}}M^{-1}{\mathbf {e}}_\mathrm{L,R},&q_\mathrm{C}\equiv {\mathbf {e}}_\mathrm{L}^{\mathsf {T}}M^{-1}{\mathbf {e}}_\mathrm{R}={\mathbf {e}}_\mathrm{R}^{\mathsf {T}}M^{-1}{\mathbf {e}}_\mathrm{L},&q_\mathrm{T}\equiv q_\mathrm{L,R}+|q_\mathrm{C}|. \end{aligned}$$
(31)
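The inequality preceding (31) can be justified by a short worked step. The sketch below relies on \(q_\mathrm{L}=q_\mathrm{R}\), which is our reading of the notation \(q_\mathrm{L,R}\) in the definition of \(q_\mathrm{T}\). Applying Young's inequality to the cross term gives

$$\begin{aligned} 2q_\mathrm{C}\rho _\mathrm{L}\rho _\mathrm{R}\le 2|q_\mathrm{C}||\rho _\mathrm{L}||\rho _\mathrm{R}|\le |q_\mathrm{C}|(\rho _\mathrm{L}^2+\rho _\mathrm{R}^2), \end{aligned}$$

so that

$$\begin{aligned} q_\mathrm{L}\rho _\mathrm{L}^2+2q_\mathrm{C}\rho _\mathrm{L}\rho _\mathrm{R}+q_\mathrm{R}\rho _\mathrm{R}^2\le (q_\mathrm{L,R}+|q_\mathrm{C}|)(\rho _\mathrm{L}^2+\rho _\mathrm{R}^2)=q_\mathrm{T}(\rho _\mathrm{L}^2+\rho _\mathrm{R}^2). \end{aligned}$$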

The “borrowing technique” on the other hand, makes the terms \( -h\gamma {\mathbf {v}}^{\mathsf {T}}({\mathbf {d}}_\mathrm{L}{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}+{\mathbf {d}}_\mathrm{R}{\mathbf {d}}_\mathrm{R}^{\mathsf {T}}){\mathbf {v}}\) available for the boundary terms in (16).

Although these two approaches to showing stability are different, they are closely related. In Lemma 3.6 we formalize this relation and show that \(q_\mathrm{T}=1/(h\gamma )\).

Lemma 3.6

Assume that A in (13) can be factorized as in (30) with \(M>0\), and define \(q_\mathrm{T}\) as stated in (31). Next, consider (17), where the parameter \(\gamma \) is defined as the maximum number such that \({\tilde{A}}_\gamma \ge 0\) still holds. Then it holds that \(h\gamma =1/q_\mathrm{T}\).

Proof

Lemma 3.6 is proven in “Section C.2” of Appendix. \(\square \)

For wide-stencil operators, \(S=D_1\) and \(M=H\) in (30), and the parameters \(q_\mathrm{L,R}\) and \(q_\mathrm{C}\) in (31) are easily obtained since M is known. For narrow-stencil operators on the other hand, M and the interior of S are not uniquely defined. In [13], the strategy was (under the contrary assumption that S is non-singular and M is singular) to compute

$$\begin{aligned} {\widetilde{q}}_\mathrm{L,R}\equiv {\mathbf {e}}_\mathrm{L,R}^{\mathsf {T}}{\widetilde{M}}^{-1}{\mathbf {e}}_\mathrm{L,R},&{\widetilde{q}}_\mathrm{C}\equiv {\mathbf {e}}_\mathrm{L}^{\mathsf {T}}{\widetilde{M}}^{-1}{\mathbf {e}}_\mathrm{R}={\mathbf {e}}_\mathrm{R}^{\mathsf {T}}{\widetilde{M}}^{-1}{\mathbf {e}}_\mathrm{L},&{\widetilde{q}}_\mathrm{T}\equiv {\widetilde{q}}_\mathrm{L,R}+|{\widetilde{q}}_\mathrm{C}| \end{aligned}$$
(32)

instead, where \({\widetilde{M}}\equiv S^{-{\mathsf {T}}}(A+p{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}})S^{-1}\) with \(p\ne 0\) being a perturbation parameter. For wide-stencil operators though, it can easily be checked numerically that \(q_\mathrm{L,R}\ne {\widetilde{q}}_\mathrm{L,R}\) and \(q_\mathrm{C}\ne {\widetilde{q}}_\mathrm{C}\). This is somewhat alarming, but it can just as easily be checked that it still holds that \(q_\mathrm{T}={\widetilde{q}}_\mathrm{T}\). We confirm this analytically in Theorem 3.8 below, and the use of \({\widetilde{q}}_\mathrm{T}\) in [13] is thus justified. First though, we note the following:

Lemma 3.7

The quantities \({\widetilde{q}}_\mathrm{L,R}\) and \({\widetilde{q}}_\mathrm{C}\) defined in (32) are identical to the quantities \(\xi _\mathrm{L,R}\) and \(\xi _\mathrm{C}\) in (25).

Proof

Lemma 3.7 is proven in “Section C.3” of Appendix. \(\square \)

Thus, in summary, we have that:

Theorem 3.8

Assume that A in (13) can be factorized as in (30) with \(M>0\), and define \(q_\mathrm{T}\) as stated in (31). Next, assume that M is singular instead, with \(M\ge 0\), and define \({\widetilde{q}}_\mathrm{T}\) as stated in (32). Then it holds that \(q_\mathrm{T}={\widetilde{q}}_\mathrm{T}\).

Proof

From Lemma 3.6 we have that \(q_\mathrm{T}=1/(h\gamma )\) and from Theorem 3.5 we have that \(1/(h\gamma )=\xi _\mathrm{T}\). Combining Lemma 3.7 with the definitions in (32) and (28) we deduce that \(\xi _\mathrm{T}={\widetilde{q}}_\mathrm{T}\). All in all, this gives \(q_\mathrm{T}=1/(h\gamma )=\xi _\mathrm{T}={\widetilde{q}}_\mathrm{T}\) concluding the proof. \(\square \)
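For the wide-stencil case, where \(M=H\) in (30), the quantities in (31) reduce to reading off entries of \(M^{-1}\). The sketch below does this for a diagonal \(M\). It assumes the norm \(H=h\,\mathrm {diag}(1/2,1,\ldots ,1,1/2)\), which is the usual second order SBP norm but is our assumption here, since \(H\) is not specified in this section. The function names are hypothetical.

```python
from fractions import Fraction as F


def second_order_norm(n, ell):
    """Diagonal of the assumed norm H = h*diag(1/2, 1, ..., 1, 1/2) on n+1 nodes."""
    h = F(ell) / n
    return [h / 2] + [h] * (n - 1) + [h / 2], h


def borrowing_quantities(m_diag):
    """q_L, q_R, q_C and q_T from (31) for a diagonal, positive definite M."""
    if len(m_diag) < 2 or any(m <= 0 for m in m_diag):
        raise ValueError("need at least two nodes and a positive diagonal")
    q_L = 1 / m_diag[0]    # e_L^T M^{-1} e_L
    q_R = 1 / m_diag[-1]   # e_R^T M^{-1} e_R
    q_C = F(0)             # e_L^T M^{-1} e_R vanishes for a diagonal M
    if q_L != q_R:
        raise ValueError("q_T in (31) presumes q_L = q_R")
    return q_L, q_R, q_C, q_L + abs(q_C)


if __name__ == "__main__":
    diag, h = second_order_norm(n=10, ell=1)
    q_L, q_R, q_C, q_T = borrowing_quantities(diag)
    print("q_L, q_C, q_T:", q_L, q_C, q_T)
    print("1/q_T, to be compared with h*gamma in Lemma 3.6:", 1 / q_T)
```

For this assumed norm, \(q_\mathrm{C}=0\) and \(hq_\mathrm{T}=2\) independently of \(n\). The corresponding \({\widetilde{q}}\) quantities from (32) require \(S^{-1}\) and are not computed in this sketch.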

For an example, see the derived values of \({\widetilde{q}}_{\mathrm{L,R},\mathrm{C}}\) and \(q_{\mathrm{L,R},\mathrm{C}}\) for the wide-stencil (2,0) order operator in “Section D.4” of Appendix. As a numerical confirmation, in Table 1 we compare the values of \(h{\widetilde{q}}_\mathrm{T}\) from [13] to the values of \(\gamma \) computed in [24, 32]. In Table 1 though, it appears that \(h{\widetilde{q}}_\mathrm{T}\ge 1/\gamma \). This is because the listed \(\gamma \) are computed for \(n\rightarrow \infty \), and are as such slightly too large for very coarse meshes.

Table 1 The borrowing parameter \(\gamma \) computed in [24, 32], for narrow-stencil second derivative operators from [23, 25]

4 Conclusions

We discretize the scalar advection equation and the heat equation in one-dimensional space, using the SBP-SAT finite difference method. This gives rise to two semi-discrete schemes of the form \({\mathbf {v}}_t+L{\mathbf {v}}=\widetilde{\mathbf{f}}\), where the discretization matrix \(L\) is approximating either the first derivative or the second derivative, including treatment of the boundary conditions. The matrix \(L\) is, due to properties of the SBP-SAT method, associated with a positive definite matrix \(H\) such that \(L=H^{-1}K\), where the inverse of \(K\) is interpreted as a discrete Green’s function. We derive the general forms of these inverses, and provide explicit examples of \(K^{-1}\) for some operators \(L\) of second and fourth order accuracy.

The boundary treatment SAT induces free parameters in \(L\). We first determine these parameters such that the semi-discrete schemes are energy stable. Any remaining degrees of freedom can be used to make the schemes dual consistent. Another important question is whether the discretization matrices \(L\) are invertible. Conveniently, the formula for \(K^{-1}\) reveals precisely which combinations of SAT parameters make \(L\) singular.

In the second derivative case, it turns out that for one very particular choice of SAT parameters, \(L\) can become singular even when the scheme is energy stable. Here, we can avoid this and instead choose the parameters such that the scheme is energy stable, dual consistent and guaranteed to have an invertible discretization matrix (and consequently a unique solution). However, for more complex problems it might not be feasible to prove that the discretization matrix is invertible, not even for energy stable schemes.

Last, we take a look at two supposedly different approaches to proving energy stability. Curiously, they are closely related, leading to the same demands on the SAT parameters.