1 Introduction

We study several continuous finite element formulations to approximate the solution of the two dimensional hyperbolic conservation laws

$$\begin{aligned} \partial _t u (x,t) + \nabla \cdot f(u(x,t)) = 0 \quad x\in \Omega \subset \mathbb {R}^2, \, t\in \mathbb {R}^+, \end{aligned}$$
(1)

where \(\Omega \subset \mathbb {R}^2\) is the domain, \(f:\mathbb {R}^D\rightarrow \mathbb {R}^{2\times D}\) is the flux function and \(u:\Omega \rightarrow \mathbb {R}^D\) is the unknown of the system of equations.

The largest part of the paper is dedicated to the two-dimensional spectral analysis of different stabilized approaches applied to the scalar (\(D=1\)) transport equations obtained for

$$\begin{aligned} f(u(x,t)) = \textbf{a} u(x,t)\, \qquad \textbf{a} \in \mathbb {R}^2 . \end{aligned}$$
(2)

One of the main objectives of this paper is to identify strategies to build (linearly) stable fully explicit high order continuous finite element schemes to discretize (1) on triangulations of the spatial domain \(\Omega \). To this end we will vary the basis functions, the stabilization technique and the time discretization. In general, the standard Finite Element Method (FEM) derived by this approach requires the inversion of a large sparse mass matrix. This procedure can be expensive: either the mass matrix is inverted once and the resulting matrix multiplications are repeated at every time step, or a linear solver is applied at each time step. Various techniques have been introduced to avoid the mass matrix inversion while keeping the high order accuracy of the scheme.

The first strategy we study is the one proposed in [1], which combines mass lumping with a deferred correction (DeC) iterative time integration method that introduces appropriate corrections in the right–hand side in order to recover the original order of accuracy. This approach can only be used in combination with finite elements whose basis functions have positive integrals. Another approach is based on a careful choice of approximation points defining sufficiently accurate quadrature formulas with all positive weights. If the variational form is evaluated with this underlying quadrature, as in spectral element methods, we obtain a diagonal mass matrix without losing the order of accuracy. We refer to this case as cubature elements [40]. For this choice, the classical use of Runge–Kutta methods will provide the high order accuracy also for the time discretization.

Secondly, we will study the influence of the stabilization strategy. When solving (1) with continuous finite elements some additional stabilization operator is necessary to enforce the \(\mathbb L_2\) stability. Several stabilization techniques can be devised to introduce a level of dissipation comparable to that of discontinuous Galerkin methods with upwind fluxes [46, 47]. Three approaches will be studied: the streamline upwind Petrov–Galerkin (SUPG) stabilization [12, 18], which is strongly consistent but also introduces new terms in the mass matrix; the continuous interior penalty (CIP) method [14, 16, 19], which consists in adding edge penalty terms proportional to the jump of the first derivative of the solution; and the orthogonal subscale stabilization (OSS) [23], a term that penalizes the \(\mathbb {L}_2\) projection of the gradient of the error within the elements. Like the CIP stabilization, this last technique does not affect the mass matrix, but it requires the solution of another linear system for the \(\mathbb {L}_2\) projection. In this respect, the choice of the finite element space and of the quadrature have an enormous impact on the cost of the method. Note that the strategy to impose boundary conditions also plays a major role in ensuring stability [4, 5], but this will not be considered here.

Our objective is to perform a fully discrete spectral analysis on triangulations of the spatial domain to characterize the stability and accuracy of different combinations of approximation, quadrature, stabilization, and time stepping. In the linear case, this allows us to propose optimal values of the CFL and stabilization parameters. Moreover, we analyze a further non-linear high order diffusion operator that can be used to stabilize discontinuities and to provide extra stability to the schemes that prove unstable with the previous techniques. Numerical simulations for both linear and non-linear scalar problems, and for the shallow water system, confirm the theoretical results and allow us to further investigate the impact of the discretization choices on the performance of the schemes and on their cost.

The paper is organized as follows. In Sect. 2 we describe the continuous Galerkin discretization, the stabilization techniques, the basis functions and the time integration techniques. In Sect. 3 we introduce the Fourier analysis space definitions that lead to von Neumann analysis, we discuss some technical details on the passage from physical functions to Fourier modes for different meshes and we find the parameters for which the schemes are stable for some mesh configurations. In Sect. 3.9, we also propose to introduce a viscosity term in order to enforce stability when the previous von Neumann analysis reveals instabilities. In Sect. 4 and Sect. 5 we test the found parameters on some linear and nonlinear problems, checking the order of accuracy and the computational times. Finally, in Sect. 6 we derive some conclusions on the presented schemes and possible applications of the found results.

2 Numerical Discretization

In this section we describe the discretization of the hyperbolic conservation law (1). We consider a tessellation of the spatial domain \(\Omega \) consisting of non-overlapping (triangular) cells, which we denote by \(\Omega _h\subset \mathbb {R}^2\). The generic element of the tessellation \(\Omega _h\) will be denoted by K, so that \(\Omega _h=\bigcup K\). We denote the set of internal element boundaries (edges) of \(\Omega _h\) by \(\mathscr {F}_h\), using \(\textsf{f}\) for a general element. h denotes the characteristic mesh size of \(\Omega _h\). Although most of the discussion is performed for the scalar case, most of it generalizes readily to systems. If a significant difference arises in this generalization, this will be explicitly discussed.

The discrete solution is sought in a continuous finite element space \(V_h^p = \lbrace v_h \in \mathscr {C}^0 (\Omega _h): \, v_h|_K \in \mathbb {P}_p(K),\, \forall K\in \Omega _h \rbrace \). We will use nodal and modal finite elements, and we will denote by \(\varphi _j\) the basis functions associated to the degree of freedom j, so that \(V_h^p=\text {span}\left\{ \varphi _j\right\} _{j\in \Omega _h}\) and we can write

$$u_h(x)=\sum _{j\in \Omega _h} u_j \varphi _j(x),$$

where, with an abuse of notation, with \(j\in \Omega _h\) we mean the set of degrees of freedom with support in \(\Omega _h\). With a similar meaning, we will also use the notation \(j\in K\) to mean the degrees of freedom with support on the cell K.

The unstabilized CG approximation of (1) reads: find \(u_h\in V_h^p\) such that for any \(v_h\in W_h\subset \mathbb {L}_2(\Omega _h):=\lbrace v: \Omega _h \rightarrow \mathbb {R}: \int _{\Omega _h} |v|^2 < \infty \rbrace \)

$$\begin{aligned} \int _{\Omega _h} v_h \partial _t u_h \; dx - \int _{\Omega _h} \nabla v_h \cdot f(u_h)\; dx + \int _{\partial \Omega _h} v_h f(u_h)\cdot \textbf{n} \; d \Gamma =0, \end{aligned}$$
(3)

where \(\textbf{n}\) is the outward facing normal to the boundary of the domain. The choice of \(W_h\) will be based on \(V_h\), but it may take different forms for different stabilizations.

As already mentioned, we will consider several stabilized variants of (3) which can be all formulated in the form: find \(u_h\in V_h^p\) that satisfies

$$\begin{aligned} \int _{\Omega } v_h ( \partial _t u_h + \nabla \cdot f(u_h)) dx + S(v_h,u_h)=0, \quad \forall v_h \in V^p_h \end{aligned}$$
(4)

where the flux term is written before the integration by parts, as we will consider only continuous piecewise polynomial approximations, whose derivatives are integrable. Here, S denotes a bilinear stabilization operator defined on \(V^p_h\times V_h^p\). Several different choices for S exist, and are discussed in detail in the following sections.

2.1 Stabilization Terms

2.1.1 Streamline-Upwind/Petrov–Galerkin: SUPG

The SUPG method was introduced in [31] (see also [12, 32] and references therein) and is strongly consistent in the sense that it vanishes when replacing the discrete solution with the exact one. It can be written as a Petrov–Galerkin method replacing \(v_h\) in (3) with a test function belonging to the space

$$\begin{aligned} W_h := \{ w_h:\quad w_h=v_h+\tau _K \nabla _u f(u_h) \cdot \nabla v_h; \quad v_h \in V_h^p \} . \end{aligned}$$
(5)

Here, \(\nabla _u f(u_h) \in \mathbb {R}^{D\times D \times 2}\) is the Jacobian of the flux, D is the dimension of the system, and \(\tau _K\) denotes a positive definite stabilization parameter of size \(D\times D\) that we will assume to be constant on every element. Although other definitions are possible, here we will evaluate this parameter as

$$\begin{aligned} \tau _K = \delta h_K ( J_K ) ^{-1} \end{aligned}$$
(6)

where \(h_K\) is the cell diameter and \(J_K\) represents the norm of the flux Jacobian on a reference value of the element K. In the scalar case, \(J_K = ||\nabla _u f(u) ||_K\).

The final stabilized variational formulation of (4) reads

$$\begin{aligned} \begin{aligned} \int _{\Omega } v_h \partial _t u_h \; dx&+ \int _{\Omega } v_h \nabla \cdot f(u_h) \; dx \\&+ \underbrace{\sum _{K \in \Omega } \int _{K} \big ( \nabla _u f(u_h) \cdot \nabla v_h \big ) \tau _K \left( \partial _t u_h + \nabla \cdot f(u_h) \right) \; dx}_{S(v_h,u_h)} = 0. \end{aligned} \end{aligned}$$
(7)

The main problem of this stabilization method is that it depends on the time derivative of u and, hence, it does not maintain the structure of the mass matrix in most cases.

To characterize the accuracy of the method, we can use the consistency analysis discussed inter alia in [7, §3.1.1 and §3.2]. In particular, for a finite element polynomial approximation of degree p we can easily show that, given a smooth exact solution \(u^e(t,x)\) and formally replacing \(u_h\) by the projection of \(u^e\) on the finite element space, we can write

$$\begin{aligned} \begin{aligned} \epsilon (\psi _h)&:= \Big | \int _{\Omega } \psi _h \partial _t (u_h^e - u^e) \; dx - \int _{\Omega } \nabla \psi _h \cdot (\nabla f(u_h^e)-\nabla f(u^e))\; dx \\&\quad \quad + \sum _{K \in \Omega }\sum \limits _{l,m \in K} \dfrac{\psi _l - \psi _m}{k+1} \int _{K} \big (\nabla _u f(u_h) \cdot \nabla \varphi _i) \tau _K \cdot \\&\qquad \left( \partial _t (u_h^e - u^e) + \nabla \cdot ( f(u_h^e) -f(u^e)) \right) \; dx \Big | \le C h^{p+1}, \end{aligned} \end{aligned}$$
(8)

with C a constant independent of h, for all functions \(\psi \) of class at least \(\mathscr {C}^1(\Omega )\), of which \(\psi _h\) denotes the finite element projection. A key point in this estimate is the strong consistency of the method allowing to subtract its formal application to the exact solution (thus subtracting zero), and obtaining the above expression featuring differences between the exact solution/flux and its evaluation on the finite element space. Preserving this error estimate precludes the possibility of lumping the mass matrix, and in particular the entries associated to the stabilization term. This makes the scheme relatively inefficient when using standard explicit time stepping.

As a final note, for a linear flux (2), exact integration, with \(\tau _K = \tau \) and in the time continuous case, a classical result is obtained for homogeneous boundary conditions by testing with \(v_h =u_h + \tau \, \partial _t u_h\) [12]:

$$\begin{aligned} \begin{aligned}&\int \limits _{\Omega _h}\partial _t\left( \dfrac{u^2_h}{2}+\tau ^2\dfrac{(\textbf{a} \cdot \nabla u_h)^2}{2}\right) + \int \limits _{\Omega _h} \textbf{a} \cdot \nabla \left( \dfrac{u^2_h}{2}+\tau ^2\dfrac{( \partial _t u_h)^2}{2}\right) \\&\quad = -\int \limits _{\Omega _h}\tau (\partial _tu_h+\textbf{a} \cdot \nabla u_h)^2. \end{aligned} \end{aligned}$$
(9)

For periodic, or homogeneous boundary conditions, this shows that the norm \(|||u|||^2:=\int _{\Omega _h} \dfrac{u^2_h}{2}+\tau ^2\dfrac{(\textbf{a} \cdot \nabla u_h)^2}{2} dx\) is non-increasing. The interested reader can refer to [12] for the analysis of some (implicit) fully discrete schemes.

2.1.2 Note on the SUPG Technique Applied to Non Scalar Problems

The extension of the SUPG method to a non scalar problem is not straightforward. Here we used the following formulation. First, we define the following system of dimension D:

$$\begin{aligned} \left\{ \begin{array}{ll} &{} \partial _t U + \nabla \cdot \mathscr {F}(U) = \textbf{S}(U) \\ &{} \mathscr {F}=(F_1,F_2) \end{array} \right. \end{aligned}$$
(10)

with \(U \in \mathbb {R}^D\), \(\mathscr {F}(U) \in \mathbb {R}^{2 \times D}\) and \(\textbf{S}(U)\in \mathbb {R}^D \). For example, in the results section we will consider the shallow water equations with \(D =3\) which read

$$\begin{aligned} U=\begin{pmatrix} h \\ hu \\ hv \end{pmatrix} \; F_1(U)= \begin{pmatrix} hu \\ hu^2 +g\frac{h^2}{2} \\ huv \end{pmatrix} \; F_2(U)= \begin{pmatrix} hv \\ huv \\ hv^2 +g\frac{h^2}{2} \end{pmatrix} \; \text{ and } \; \textbf{S}(U)=\begin{pmatrix} 0 \\ -gh b_x \\ -gh b_y \end{pmatrix} \end{aligned}$$

where \( \textbf{S}(U)\) is the source term given by a topography term. Equation (10) can also be written in its quasi-linear form

$$\begin{aligned} \partial _t U + \nabla _U \mathscr {F}(U) \cdot \nabla U = \textbf{S}(U), \end{aligned}$$
(11)

where \(\nabla _U \mathscr {F}(U_h) \in \mathbb {R}^{D\times D \times 2}\) is the Jacobian of the flux \(\mathscr {F}(U_h)\).
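As an illustration of (10) and (11) for the shallow water equations, the fluxes, the topography source and the two slices of the flux Jacobian can be sketched in a few lines. The function names, the value of \(g\) and the tuple layout \(U=(h,hu,hv)\) are assumptions made for this sketch only; the Jacobians are obtained by differentiating \(F_1\) and \(F_2\) with respect to \((h,hu,hv)\).

```python
G = 9.81  # gravitational acceleration (assumed value, SI units)


def swe_fluxes(U):
    """Return (F1, F2) for the conserved variables U = (h, hu, hv)."""
    h, hu, hv = U
    u, v = hu / h, hv / h
    F1 = (hu, hu * u + 0.5 * G * h * h, hu * v)
    F2 = (hv, hu * v, hv * v + 0.5 * G * h * h)
    return F1, F2


def swe_source(U, b_x, b_y):
    """Topography source term S(U) for bathymetry slopes (b_x, b_y)."""
    h = U[0]
    return (0.0, -G * h * b_x, -G * h * b_y)


def swe_jacobians(U):
    """Return (dF1/dU, dF2/dU) as 3x3 nested tuples (row = flux component)."""
    h, hu, hv = U
    u, v = hu / h, hv / h
    c2 = G * h  # squared gravity wave speed
    A1 = ((0.0, 1.0, 0.0),
          (c2 - u * u, 2.0 * u, 0.0),
          (-u * v, v, u))
    A2 = ((0.0, 0.0, 1.0),
          (-u * v, v, u),
          (c2 - v * v, 0.0, 2.0 * v))
    return A1, A2
```

In the quasi-linear form (11), the product \(\nabla _U \mathscr {F}(U) \cdot \nabla U\) then corresponds to `A1` applied to \(\partial _x U\) plus `A2` applied to \(\partial _y U\).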

Following the definition of the SUPG method and [52, sec. 5], we define a positive-definite stabilization matrix \(\mathbf {\tau _K} \in \mathbb {R}^{D\times D}\) constant for every element K. Here this matrix is evaluated as [52]

$$\begin{aligned} {\tau _K} = \delta h_K \left( \sum _{j\in S_K} \left| \nabla _U \mathscr {F}(\bar{U}_K) \cdot n_j \right| \right) ^{-1}, \end{aligned}$$
(12)

with \(S_K\) the set of vertices of K, and \(n_j\) the outward normal of the edge opposite to the vertex \(j\in S_K\). \(h_K\) is the cell diameter and \(\nabla _U \mathscr {F}(\bar{U}_K)\) represents the flux Jacobian evaluated at the average value of \(U_h\) on the element K.

The SUPG stabilized formulation reads, for each equation of the system \(i=1,\dots ,D\)

$$\begin{aligned} \begin{aligned}&\int _{\Omega } v_h \left( \partial _t U_h + \nabla \cdot \mathscr {F}(U_h)-\textbf{S}(U_h) \right) _{i} +\\&\underbrace{ \left( \sum _{K \in \Omega } \int _{K} \big ( \nabla v_h \cdot \nabla _U \mathscr {F}(U_h) \big ) {\tau _K} \left( \partial _t U_h + \nabla \cdot \mathscr {F}(U_h)-\textbf{S}(U_h) \right) \; dx \right) _{i}}_{S(v_h,U_h)_i} = 0, \end{aligned} \end{aligned}$$
(13)

where \((V)_{i}\) denotes the i-th component of a vector \(V\in \mathbb {R}^D\).

2.1.3 Continuous Interior Penalty - CIP

Another stabilization technique, which maintains sparsity and symmetry of the Galerkin matrix, is the continuous interior penalty (CIP) method. It was developed by Burman and Hansbo originally in [15] and then in a series of works [14, 16, 19]. It can also be seen as a variation of the method proposed by Douglas and Dupont [26].

The method stabilizes the Galerkin formulation by adding edge penalty terms proportional to the jump of the gradient of the solution across the cell interfaces. The CIP introduces high order viscosity to the formulation, allowing the solution to tend to the vanishing viscosity limit. This term does not affect the structure of the mass matrix. The method reads

$$\begin{aligned} \int _{\Omega _h} v_h \partial _t u_h \; dx + \int _{\Omega _h} v_h \nabla \cdot f(u_h)\; dx+ \underbrace{ \sum _{\textsf{f} \in \mathscr {F}_h} \int _\textsf{f} \tau _\textsf{f} [\![n_\textsf{f} \cdot \nabla v_h]\!] \cdot [\![n_\textsf{f} \cdot \nabla u_h]\!] \; d\Gamma }_{S(v_h,u_h)} = 0, \end{aligned}$$
(14)

where \([\![\cdot ]\!]\) denotes the jump of a quantity across a face \(\mathsf f\), \(n_\textsf{f}\) is a normal to the face \(\mathsf f\) and where \(\mathscr {F}_h\) is the collection of internal boundaries, and \(\textsf{f}\) are its elements. Although other definitions are possible, we evaluate the scaling parameter in the stabilization as

$$\begin{aligned} \tau _\textsf{f} = \delta \,h_\textsf{f}^2 \Vert \nabla _uf\Vert _\textsf{f}, \end{aligned}$$
(15)

where \(\Vert \nabla _uf\Vert _\textsf{f}\) is a reference value of the norm of the flux Jacobian on \(\textsf{f}\) and \(h_\textsf{f}\) is a characteristic size of the mesh neighboring \(\mathsf f\).

As stated above, a clear advantage of CIP is that it does not modify the mass matrix, resulting in efficient schemes if a mass lumping strategy can be devised. On the other hand, the stencil of the scheme increases as the jump of a degree of freedom interacts with cells which are not next to the degree of freedom itself (up to 2 cells distance). Note that for higher order approximations [17, 38] suggest the use of jumps in higher derivatives to improve the stability of the method. However, here we consider the jump in the first derivatives in order to be able to apply the stability analysis and to study the influence of \(\delta \) on the stability of the method. We note that the results presented herein might be improved by adding stabilization of higher derivatives.

The accuracy of CIP can be assessed with a consistency analysis as discussed in [7, §3.1.1 and §3.2]. Given a smooth exact solution \(u^e(t,x)\), we formally replace \(u_h\) by the projection of \(u^e\) onto the finite element space of polynomial degree p. We can then show that, for all functions \(\psi \) of class at least \(\mathscr {C}^1(\Omega )\), of which \(\psi _h\) denotes the finite element projection, we have the truncation error estimate

$$\begin{aligned} \begin{aligned} \epsilon (\psi _h)&:= \Big | \int _{\Omega } \psi _h \partial _t (u_h^e - u^e) \; dx - \int _{\Omega } \nabla \psi _h \cdot ( f(u_h^e)- f(u^e))\; dx \\&\quad + \sum \limits _{\textsf{f}\in \mathscr {F}_h} \int \limits _\textsf{f}\tau _\textsf{f} [\![n_\textsf{f} \cdot \nabla \psi _h]\!] \cdot [\![n_\textsf{f} \cdot \nabla (u_h^e-u^e)]\!] \; d\Gamma \Big | \le C h^{p+1}, \end{aligned} \end{aligned}$$
(16)

with C a constant independent of h. The estimate can be derived from standard approximation results applied to \(u^e_h-u^e\) and to its derivatives, noting that \(\tau _\textsf{f}\) is \(\mathscr {O}(h^2)\), which leads to the expected order of accuracy.

The symmetry of the stabilization allows us to easily derive an energy stability estimate for the semi-discrete (space discretized only) scheme. In particular, for periodic boundary conditions and a linear flux we can easily show that

$$\begin{aligned} \begin{aligned} \int \limits _{\Omega _h}\partial _t\dfrac{u^2_h}{2}= - \sum \limits _{\textsf{f}\in \mathscr {F}_h}\int \limits _\textsf{f} \tau _\textsf{f} [\![n_\textsf{f} \cdot \nabla u_h]\!]^2, \end{aligned} \end{aligned}$$
(17)

which gives a bound in time on the \(\mathbb {L}_2\) norm of the solution.
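To make the structure of the CIP term in (14) and of the estimate (17) concrete, the following sketch evaluates \(S(v_h,u_h)\) in a deliberately simplified setting: a one-dimensional uniform periodic mesh with \(\mathbb P^1\) elements, where the interior faces reduce to the mesh nodes and the jump of the derivative at node i is \((u_{i+1}-2u_i+u_{i-1})/h\). This 1D reduction, the function name and the argument layout are assumptions of the example, not the two-dimensional implementation used in the paper.

```python
def cip_form_1d(v, u, h, a, delta):
    """CIP bilinear form S(v, u) for P1 elements on a uniform periodic 1D mesh.

    v, u  : nodal values (lists of equal length, periodic)
    h     : mesh size
    a     : advection speed, so that |a| plays the role of the Jacobian norm
    delta : stabilization coefficient, tau_f = delta * h**2 * |a| as in (15)
    """
    n = len(u)
    tau = delta * h * h * abs(a)
    total = 0.0
    for i in range(n):
        # Jump of the first derivative across node i (right slope minus left slope).
        jump_u = (u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) / h
        jump_v = (v[(i + 1) % n] - 2.0 * v[i] + v[i - 1]) / h
        total += tau * jump_v * jump_u
    return total
```

Since the form is symmetric and \(S(u_h,u_h)\ge 0\), testing with \(v_h=u_h\) reproduces the sign of the right-hand side of (17) in this reduced setting.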

Note that for higher than second order it may be relevant to consider additional penalty terms based on higher derivatives (see e.g. [3, 13, 17]). We did not do this in this work.

2.1.4 Orthogonal Subscale Stabilization - OSS

Another symmetric stabilization approach is the Orthogonal Subscale Stabilization (OSS) method. Originally introduced as Pressure Gradient Projection (GPS) in [24] for Stokes equations, it was extended to the OSS method in [11, 23] for different problems with numerical instabilities, such as convection–diffusion–reaction problems. This stabilization penalizes the fluctuations of the gradient of the solution with a projection of the gradient onto the finite element space. The method applied to (3) reads: find \(u_h\in V_h^p\) such that \(\forall v_h \in V_h^p\)

$$\begin{aligned} \left\{ \begin{array}{ll} &{} \int _{\Omega _h}\!\! v_h \partial _t u_h \; dx + \int _{\Omega _h}\!\! v_h \nabla \cdot f(u_h) \; dx + \!\!\underbrace{\sum _{K \in \Omega _h} \int \limits _{K}\! \tau _K \nabla v_h \cdot (\nabla u_h - w_h) \; dx}_{S(v_h ,u_h)}= 0, \\ &{} \int _{\Omega _h} v_h w_h\; dx - \int _{\Omega _h} v_h \nabla u_h\; dx = 0. \end{array} \right. \end{aligned}$$
(18)

For this method, the stabilization parameter is evaluated as

$$\begin{aligned} \tau _K = \delta h_K \Vert \nabla _u f\Vert _K . \end{aligned}$$
(19)

The drawback of this method, with respect to CIP, is the requirement of a matrix inversion to project the gradient of the solution in the second equation of (18). This cost can be alleviated by the choice of elements and quadrature rules if they result in a diagonal mass matrix, as is the case for Cubature elements described below.
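For the scalar linear flux (2), the three scaling parameters (6), (15) and (19) differ only in their powers of the mesh size and of the Jacobian norm. A minimal sketch is given below; the helper names are hypothetical and the Euclidean norm of \(\textbf{a}\) is assumed as the reference value of the Jacobian norm.

```python
from math import hypot


def jacobian_norm(a):
    """Reference Jacobian norm for f(u) = a u (Euclidean norm assumed)."""
    return hypot(a[0], a[1])


def tau_supg(delta, h_K, a):
    """SUPG parameter (6): delta * h_K / ||a||, which scales like a time."""
    return delta * h_K / jacobian_norm(a)


def tau_cip(delta, h_f, a):
    """CIP parameter (15): delta * h_f**2 * ||a||."""
    return delta * h_f ** 2 * jacobian_norm(a)


def tau_oss(delta, h_K, a):
    """OSS parameter (19): delta * h_K * ||a||."""
    return delta * h_K * jacobian_norm(a)
```

Note that the same symbol \(\delta \) is used for all three methods, but its stable range is method dependent.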

As before we can easily characterize the accuracy of this method. The truncation error estimate for a polynomial approximation of degree p reads in this case

$$\begin{aligned} \begin{aligned} \epsilon (\psi _h) := \Big | \int _{\Omega _h}&\psi _h \partial _t (u_h^e - u^e) \; dx - \int _{\Omega _h} \nabla \psi _h \cdot (f(u_h^e)-f(u^e))\; dx \\ +&\sum \limits _{K\in \Omega _h}\tau _K \int \limits _{K} \nabla \psi _h \cdot \nabla ( u^e_h - u^e )\\ +&\sum \limits _{K \in \Omega _h} \tau _K\int \limits _{K} \nabla \psi _h \cdot ( \nabla u^e - w_h^e ) \Big | \le C h^{p+1}, \end{aligned} \end{aligned}$$
(20)

where the last term is readily estimated using the projection error and the boundedness of \(\psi _h\) as

$$\begin{aligned} \int _{\Omega _h} \psi _h ( w^e_h-\nabla u^e)\; dx = \int _{\Omega _h} \psi _h (\nabla u_h^e - \nabla u^e ) = \mathscr {O}(h^p). \end{aligned}$$

Finally, for a linear flux, periodic boundaries and taking \(\tau _K=\tau \) constant over the mesh, we can test with \(v_h=u_h\) in the first equation of (18), and with \(v_h=\tau w_h\) in the second one, and sum up the results to get

$$\begin{aligned} \begin{aligned} \int \limits _{\Omega _h}\partial _t\dfrac{u^2_h}{2}= - \sum \limits _{K} \int \limits _{K} \tau _K ( \nabla u_h - w_h)^2, \end{aligned} \end{aligned}$$
(21)

which can be integrated in time to obtain a bound on the \(\mathbb {L}_2\) norm of the solution.

The truncation error analysis presented above for the three stabilization terms concerns only the consistency error; it does not prove stability or convergence of these schemes. These estimates tell us that the stabilization terms we introduced are of the desired order of accuracy and are thus compatible with the prescribed order of accuracy. This type of analysis has already been carried out for multidimensional problems inter alia in [2]. More rigorous proofs of error bounds with \(h^{p+\frac{1}{2}}\) estimates can be found in [13] for the CIP. We did not consider in this work projection stabilizations involving higher derivatives.

2.2 Finite Element Spaces and Quadrature Rules

In this section we describe the three finite element polynomial approximation strategies used in the paper. In particular, on a triangular element K of \(\Omega _h\), we define the restriction of the basis functions of \(V_h^p\) to K, which are polynomials of degree at most p. We denote these basis functions by \(\{\varphi _1, \ldots , \varphi _N\}\); defining them amounts to describing the degrees of freedom, i.e., the dual basis.

2.2.1 Basic Lagrangian Equispaced Elements

On triangles, we consider Lagrange polynomials with degrees at most p:

$$\mathbb P^p=\left\{ \sum _{\alpha +\beta \le p } c_{\alpha ,\beta }x^\alpha y^\beta \right\} .$$

We define the barycentric coordinates \(\lambda _i(x,y)\) which are affine functions on \(\mathbb {R}^2\) satisfying the following relations

$$\begin{aligned} \lambda _i(v_j)=\delta _{ij}, \quad \forall i,j=1,\dots ,3, \end{aligned}$$
(22)

where \(v_j=(x_j,y_j)\) are the vertices of the triangle and, with an abuse of notation, they can be written in barycentric coordinates as \(v_j=(\delta _{1j},\delta _{2j},\delta _{3j})\). Using these coordinates, we can define the Lagrangian polynomials on equispaced points on triangles. The equispaced points are defined as the intersections of the lines \(\lambda _j=\frac{k}{p}\) for \(k=0,\dots ,p\). A way to define the basis function corresponding to the point \((x_\alpha ,y_\alpha )=(\alpha _1/p,\alpha _2/p,\alpha _3/p)\) in barycentric coordinates, with \(\alpha _i\in \{ 0,\dots , p\}\) and \(\sum _i \alpha _i =p\), is given in Algorithm 1.

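Algorithm 1 (figure a): definition of the Lagrangian basis functions on equispaced points in barycentric coordinates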

The polynomials so defined on a triangle form a partition of unity, but they also take negative values. This leads to negative or zero values of some of their integrals, which is problematic for some time discretizations, as we will see. We will use these polynomials in combination with exact Gauss–Lobatto quadrature formulae for such polynomials and we will refer to them as Basic elements.
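The occurrence of zero integrals can be checked by hand for \(p=2\). The sketch below uses the standard closed-form expressions of the quadratic Lagrange basis in barycentric coordinates, \(\lambda _i(2\lambda _i-1)\) at the vertices and \(4\lambda _i\lambda _j\) at the edge midpoints, together with the classical formula \(\int _K \lambda _1^a\lambda _2^b\lambda _3^c \, dx= 2|K|\, a!\,b!\,c!/(a+b+c+2)!\). The dictionary representation of polynomials and the function names are choices made for this example only.

```python
from fractions import Fraction
from math import factorial


def integrate_monomial(a, b, c, area=Fraction(1)):
    """Exact integral over a triangle of lam1**a * lam2**b * lam3**c."""
    num = factorial(a) * factorial(b) * factorial(c)
    return 2 * area * Fraction(num, factorial(a + b + c + 2))


def integrate(poly, area=Fraction(1)):
    """Integrate a polynomial stored as {(a, b, c): coefficient}."""
    return sum(coef * integrate_monomial(*exps, area=area)
               for exps, coef in poly.items())


# P2 Lagrange basis functions in barycentric monomials.
VERTEX = {(2, 0, 0): Fraction(2), (1, 0, 0): Fraction(-1)}  # lam1 * (2 lam1 - 1)
EDGE = {(1, 1, 0): Fraction(4)}                             # 4 lam1 lam2
```

On a triangle of unit area, `integrate(VERTEX)` evaluates to 0 and `integrate(EDGE)` to 1/3: the three vertex functions carry no mass, while the three edge functions share the whole area.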

2.2.2 Bernstein Polynomials

Bernstein polynomials also form a basis of \(\mathbb P^p\), but they are not Lagrangian polynomials; hence, there is no unique correspondence between point values and coefficients of the polynomials. Nevertheless, there exists a geometrical identification with the Greville points \((x_\alpha ,y_\alpha )= (\alpha _1/p,\alpha _2/p,\alpha _3/p)\). Given a triplet \(\alpha \in \mathbb {N}^3\) with \(\alpha _i\in \llbracket 0,\dots , p\rrbracket \) and \(\sum _i \alpha _i=p\), the Bernstein polynomials are defined as

$$\begin{aligned} \varphi _\alpha (x,y) = p!\prod _{i=1}^3 \frac{\lambda _i^{\alpha _i}(x,y)}{\alpha _i!}. \end{aligned}$$
(23)

Bernstein polynomials satisfy additional properties besides the one already cited for Lagrangian points. As before, they form a partition of unity, the basis functions are nonnegative in any point of the triangle, and so their integrals are strictly positive. More precisely

$$\int _K \varphi _\alpha = \frac{|K|}{S}, \qquad S= \# \left\{ \alpha \in \mathbb {N}^3: |\alpha |_1 = p\right\} . $$

These properties lead also to the fact that the value at each point is a convex combination of the coefficients of the polynomials, so that it is easy to bound minimum and maximum of the function by the minimum and maximum of the coefficients. This has been used in different techniques to preserve positivity of the solution [10, 37]. We will use these polynomials with corresponding high order accurate quadrature formulae. We will denote these elements with the symbol \(\mathbb B^p\) and we refer to them as Bernstein elements.
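The properties listed above can be verified directly from (23). The following sketch evaluates the Bernstein basis at a point given in barycentric coordinates and computes the exact integral of each basis function with the monomial integration formula on the triangle; exact rational arithmetic is used so that the checks are not affected by round-off. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def multi_indices(p):
    """All triplets alpha in N^3 with alpha_1 + alpha_2 + alpha_3 = p."""
    return [(i, j, p - i - j) for i in range(p + 1) for j in range(p + 1 - i)]


def bernstein(alpha, lam):
    """Bernstein polynomial (23) at barycentric coordinates lam."""
    value = Fraction(factorial(sum(alpha)))
    for a, l in zip(alpha, lam):
        value *= Fraction(l) ** a / factorial(a)
    return value


def bernstein_integral(alpha, area=Fraction(1)):
    """Exact integral of the Bernstein polynomial over a triangle of given area."""
    p = sum(alpha)
    # p!/prod(alpha_i!) times 2|K| prod(alpha_i!)/(p+2)! simplifies to:
    return 2 * area * Fraction(factorial(p), factorial(p + 2))
```

For \(p=3\) there are \(S=10\) basis functions, each with integral \(|K|/10\), and their values at any point of the triangle are nonnegative and sum to one.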

2.2.3 Cubature Elements

Contrary to the work done in 1D [42], an extension of the Legendre–Gauss–Lobatto points, which minimize the interpolation error, does not exist for the triangle. Such points have to be computed numerically, as for instance the Fekete points [34, 55, 57]. The problem with this approach is that, like classical finite elements, it requires the inversion of a sparse global mass matrix.

Cubature elements were introduced by G. Cohen and P. Joly in 2001 [25] for the wave equation (second order hyperbolic equation), and are an extension of Lagrange polynomials with the goal of optimizing the underlying quadrature formula error. We will denote them with the symbol \(\tilde{\mathbb P}^p\) and they will be contained in another larger space of Lagrange elements, i.e., \(\mathbb P ^p \subseteq \tilde{\mathbb P}^p \subseteq \mathbb P^{p'}\), with \(p'\) the smallest possible integer. Similar techniques have been used to minimize the interpolation error [34, 55, 57]. The objective of these polynomials is to use the points of the Lagrangian interpolation of the polynomials as quadrature points. This means that the obtained quadrature is \(\int _K f(x,y) = \sum _{\alpha } \omega _\alpha f(x_\alpha ,y_\alpha )\), where \(\int _K \varphi _\alpha = \omega _\alpha \) and \(\varphi _\alpha (x_\beta ,y_\beta ) = \delta _{\alpha \beta }\). This approach can be considered an extension of the Gauss–Lobatto quadrature in 1D to non-Cartesian meshes. The biggest advantage of this approach is that it yields a diagonal mass matrix. The drawback is that one needs to increase the number of basis functions inside one element to obtain an accurate enough quadrature rule. In our work, we propose to extend this approach to first order hyperbolic equations. A successful extension to elliptic problems is proposed in [51]. A comparison between the equispaced and the Cubature distributions of points for elements of degree \(p=3\) is shown in Fig. 1.

Fig. 1 Comparison of the equispaced distribution of points (left) and the Cubature distribution (right) for elements of degree \(p=3\)
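The mechanism that makes the mass matrix diagonal is the same as for Gauss–Lobatto points in 1D, and it can be illustrated there without any triangle-specific data. The sketch below, which is a one-dimensional analogue and not the Cubature element itself, takes quadratic Lagrange polynomials on the nodes \(\{-1,0,1\}\) of \([-1,1]\) and evaluates the mass matrix with the three-point Gauss–Lobatto (Simpson) rule located at the same nodes. Names and data layout are assumptions of the example.

```python
from fractions import Fraction

# Interpolation nodes and collocated quadrature weights on [-1, 1].
NODES = [Fraction(-1), Fraction(0), Fraction(1)]
WEIGHTS = [Fraction(1, 3), Fraction(4, 3), Fraction(1, 3)]


def lagrange(i, x):
    """i-th Lagrange polynomial on NODES evaluated at x."""
    value = Fraction(1)
    for j, xj in enumerate(NODES):
        if j != i:
            value *= (x - xj) / (NODES[i] - xj)
    return value


def quadrature_mass_matrix():
    """Mass matrix M_ij = sum_q w_q phi_i(x_q) phi_j(x_q) with collocated points."""
    n = len(NODES)
    return [[sum(w * lagrange(i, xq) * lagrange(j, xq)
                 for xq, w in zip(NODES, WEIGHTS))
             for j in range(n)]
            for i in range(n)]
```

Because \(\varphi _i(x_q)=\delta _{iq}\), the result is \(\text {diag}(\omega _i)\) with positive entries. The rule does not integrate the products \(\varphi _i\varphi _j\) exactly, which is why the accuracy of the quadrature has to be controlled as discussed in the text.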

For completeness we detail further the construction of the basis functions. The challenges of this approach are the following:

  • Obtain a quadrature which is highly accurate, at least \(p+p'-2\) order accurate [22];

  • Obtain positive quadrature weights \(\omega _\alpha >0\) for stability reasons [58];

  • Minimize the number of basis functions of \(\tilde{\mathbb {P}}^p\);

  • The set of quadrature points has to be \(\tilde{\mathbb {P}}^p\)-unisolvent, so that the DoFs coincide with the quadrature points without ambiguity [33];

  • The number of quadrature points on the edges has to be sufficient to ensure the conformity of the finite element.

The optimization procedure that leads to these elements consists of several steps where the different goals are optimized one by one. The optimization strategy heavily exploits the symmetry properties that the quadrature points must have.

For \(p=1\) the Cubature elements differ from the Basic elements only in the quadrature formula. For \(p=2\) the Cubature elements introduce another degree of freedom at the center of the triangle, leading to 7 quadrature points and basis functions per element. For \(p=3\) there are 3 additional degrees of freedom in the triangle, leading to 13 basis functions per triangle. All the details of such elements can be found in [25, 33]. We provide in Sect. 1 the detailed expressions of the polynomials used in this work. We will use the symbol \(\tilde{\mathbb {P}}^p\) and the name Cubature elements to refer to them.

Other elements, such as Fekete–Gauss points [29, 50], exist in the literature. They are optimized to interpolate and integrate with high accuracy. However, it has been shown that, for high order of accuracy, they require more computing time than cubature points to achieve similar results.

2.3 Time Integration

The spatial discretization leads to a coupled system of ordinary differential equations which can be written as

$$\begin{aligned} \mathbb {M}\dfrac{dU}{dt} = \texttt{r}(t) \end{aligned}$$
(24)

where U is the vector of all the degrees of freedom over the whole domain, and \(\mathbb {M}\) and \(\texttt{r}\) are the global mass matrix and right-hand side terms obtained through the discretization of the previous section with some finite elements and stabilization terms. We remark that \(\mathbb {M}\) is diagonal only in the case of the Cubature elements without the SUPG stabilization, while, for all other choices, it is a sparse non–diagonal matrix.

In the following, we describe two different time integration methods: explicit Runge–Kutta (RK) methods and their strong stability preserving (SSP) variants; and the Deferred Correction (DeC) algorithm, which avoids the mass matrix inversion through the correction iterations.

2.3.1 Explicit Runge–Kutta and Strong Stability Preserving Runge–Kutta Schemes

Runge–Kutta time integration methods are one-step methods consisting of S stages defined by

$$\begin{aligned} \begin{aligned}&U^{(0)}:=U^n,\\&U^{(s)}:=U^n + \Delta t \sum _{j=0}^{s-1}\alpha _j^s \mathbb {M}^{-1} \texttt{r}(U^{(j)}){\text { for }} s=1,\dots , S,\\&U^{n+1}:= U^n + \Delta t \sum _{s=0}^S \beta _s \mathbb {M}^{-1} \texttt{r}(U^{(s)}). \end{aligned} \end{aligned}$$
(25)

Here, we use for the solution the superscript n to indicate the timestep and the superscript in brackets (s) to denote the stage of the method. The coefficients \(\alpha _j^s\) and \(\beta _s\) can be defined in many different ways. In particular, we will refer to Heun’s method as RK2, to Kutta’s method as RK3 and to the original fourth order Runge–Kutta method as RK4. The respective Butcher tables can be found in Sect. 2 in Table 12, see [20].
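As an illustration of (25), the following minimal sketch performs one explicit RK step for a system of the form (24). It is written in the standard Butcher notation (stage slopes \(\mathbb {M}^{-1}\texttt{r}\), strictly lower triangular matrix `A`, weights `b`) rather than with the exact indexing of (25); the function name `rk_step` and the dense solve with `numpy.linalg.solve` are assumptions made for readability, not the implementation used for the results of this paper.

```python
import numpy as np

def rk_step(mass, rhs, U, dt, A, b):
    """One explicit RK step for mass * dU/dt = rhs(U).

    A (strictly lower triangular) and b are the Butcher coefficients.
    The mass matrix is 'inverted' with a dense solve at every stage,
    which is exactly the cost that diagonal mass matrices avoid.
    """
    slopes = []
    for s in range(len(b)):
        # stage value built from the previously computed slopes
        U_s = U + dt * sum(A[s][j] * slopes[j] for j in range(s))
        slopes.append(np.linalg.solve(mass, rhs(U_s)))
    return U + dt * sum(b_s * k_s for b_s, k_s in zip(b, slopes))

# Heun's method (RK2): one Euler predictor and a trapezoidal corrector
A_heun = [[0.0, 0.0], [1.0, 0.0]]
b_heun = [0.5, 0.5]
```

When `mass` is diagonal, the solve reduces to a componentwise division, which is the situation targeted by the Cubature elements.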

A subset of the RK methods are the SSPRK methods introduced in [56]. They consist of convex combinations of forward Euler steps, and can be rewritten as follows

$$\begin{aligned} \begin{aligned}&U^{(0)}:=U^n,\\&U^{(s)}:=\sum _{j=0}^{s-1} \left( \gamma _j^s U^{(j)} + \Delta t \mu _j^s \mathbb {M}^{-1} \texttt{r}(U^{(j)}) \right) {\text { for }} s=1,\dots , S,\\&U^{n+1}:= U^{(S)} , \end{aligned} \end{aligned}$$
(26)

with \(\gamma _j^s, \mu _j^s\ge 0\) for all \(j,s=1,\dots , S\). We will consider here the second order 3 stages SSPRK(3,2) presented by Shu and Osher in [56], the third order SSPRK(4,3) presented in [54, Page 189], and the fourth order SSPRK(5,4) defined in [54, Table 3]. For complete reproducibility of the results, we put all their Butcher’s tableaux in Sect. 2 in Table 13.
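The Shu–Osher form (26) can be sketched generically as follows. The coefficients used in the example are those of the classical three stage, third order SSPRK(3,3) method, chosen here only because they are well known: this is an assumption for illustration, and the methods actually studied in this work are those of Table 13. The name `ssprk_step` and the nested-list layout of `gamma` and `mu` are likewise hypothetical.

```python
import numpy as np

def ssprk_step(mass, rhs, U, dt, gamma, mu):
    """One step of (26): gamma[s-1][j], mu[s-1][j] for s = 1..S, j = 0..s-1."""
    stages, slopes = [U], []
    for g_s, m_s in zip(gamma, mu):
        # slope M^{-1} r(U^{(s-1)}) of the most recent stage, computed once
        slopes.append(np.linalg.solve(mass, rhs(stages[-1])))
        stages.append(sum(g * V + dt * m * L
                          for g, m, V, L in zip(g_s, m_s, stages, slopes)))
    return stages[-1]

# Classical SSPRK(3,3) coefficients (illustrative assumption)
gamma_33 = [[1.0], [0.75, 0.25], [1.0 / 3.0, 0.0, 2.0 / 3.0]]
mu_33 = [[1.0], [0.0, 0.25], [0.0, 0.0, 2.0 / 3.0]]
```

For a linear right-hand side, one step with these coefficients reproduces the third order Taylor polynomial of the exponential, which is the kind of expansion exploited later in the fully discrete analysis.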

2.3.2 The Deferred Correction Scheme

Deferred Correction methods were introduced in [27] as explicit time integration methods for ODEs, but soon implicit [45], linearly implicit positivity preserving [48] versions and extensions to PDE solvers [1] were studied. In particular, in [1, 3, 6, 8] the DeC is used in a different formulation for finite element methods, which introduces two operators through which it is possible to use a diagonal mass matrix without losing the order of accuracy. This is only achievable when the lumped matrix (defined through the row sums of the full mass matrix) has only positive values on its diagonal. Hence, the use of Bernstein polynomials is recommended in [1], but Cubature elements can also serve the purpose.

Fig. 2
figure 2

Subtimesteps inside the time step \([t^n,t^{n+1}]\)

Consider a discretization of each timestep into M subtimesteps as in Fig. 2. For each subtimestep we define a high order approximation of the integral form of the ODE (24) from \(t^{n,0}\) to \(t^{n,m}\), i.e.,

$$\begin{aligned} \begin{aligned}&\mathbb {M}\left( U^{n,m} -U^{n,0} \right) - \int _{t^{n,0}}^{t^{n,m}} \texttt{r}(U(s)) ds \approx \mathscr {L}^2(\underline{U})^m\\&\mathscr {L}^2(\underline{U})^m:= \mathbb {M}\left( U^{n,m} -U^{n,0} \right) - \Delta t \sum _{z \in \llbracket 0, M \rrbracket } \rho _{z}^m \texttt{r}(U^{n,z}) = 0, \end{aligned} \end{aligned}$$
(27)

with \(\underline{U}=\left( U^{n,0},\dots , U^{n,M} \right) \). Moreover, the quadrature rule in time uses the subtimesteps \(t^{n,m}\) as quadrature points. The corresponding weights \(\rho ^{m}_z\) for every subinterval are defined by integrating the Lagrangian basis functions associated with these subtimesteps (see [1, 3, 8] for details). The algebraic system \(\mathscr {L}^2(\underline{U}^*)=0\) is in general implicit and nonlinear. To avoid resorting to nonlinear solvers, the DeC procedure approximates the solution of \(\mathscr {L}^2(\underline{U}^*)=0\) by successive iterations relying on a low order easy–to–invert operator \(\mathscr {L}^1\). This operator is typically a first order forward Euler approximation with a lumped mass matrix, i.e.,

$$\begin{aligned} \begin{aligned}&\mathbb {M}\left( U^{n,m} -U^{n,0} \right) - \int _{t^{n,0}}^{t^{n,m}} \texttt{r}(U(s)) ds \approx \mathscr {L}^1(\underline{U})^m\\&\mathscr {L}^1(\underline{U})^m:= \mathbb {D}\left( U^{n,m} -U^{n,0} \right) - \Delta t \beta ^m \texttt{r}(U^{n,0}) = 0. \end{aligned} \end{aligned}$$
(28)

Here, \(\mathbb {D}\) denotes a diagonal matrix obtained from the lumping of \(\mathbb {M}\), i.e., \(\mathbb {D}_{ii}:=\sum _{j} \mathbb {M}_{ij}\), and \(\beta ^m:= \frac{t^{n,m}-t^{n,0}}{t^{n+1}-t^n}\). The values of the coefficients \(\beta ^m\) and \(\rho ^m_z\) for equispaced subtimesteps can be found in Sect. 2. Denoting with the superscript (k) index the iteration step, we describe the DeC algorithm as

$$\begin{aligned}&U^{n,m,(0)}:=U^n&m=0,\dots ,M, \end{aligned}$$
(29a)
$$\begin{aligned}&U^{n,0,(k)}:=U^n&k=0,\dots , K, \end{aligned}$$
(29b)
$$\begin{aligned}&\mathscr {L}^1(\underline{U}^{(k)})=\mathscr {L}^1(\underline{U}^{(k-1)})-\mathscr {L}^2(\underline{U}^{(k-1)})&k=1,\dots , K, \end{aligned}$$
(29c)
$$\begin{aligned}&U^{n+1}:=U^{n,M,(K)}.&\end{aligned}$$
(29d)

It has been proven [1] that if \(\mathscr {L}^1\) is coercive, \(\mathscr {L}^1-\mathscr {L}^2\) is Lipschitz with a constant \(\alpha _1 \Delta t >0\) and the solution of \(\mathscr {L}^2(\underline{U}^*)=0\) exists and is unique, then, the method converges with an error of \(\mathscr {O}(\Delta t^K)\). Hence, choosing \(K=M+1\) we obtain a K-th order accurate scheme.
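The iteration (29) can be sketched in a few lines. The snippet below is a minimal illustration under stated assumptions: equispaced subtimesteps, weights \(\rho ^m_z\) computed by integrating the Lagrange basis polynomials on \([0,1]\), and hypothetical helper names (`dec_weights`, `dec_step`, `n_sub` for the number of subtimesteps M, to avoid a clash with the mass matrix). It solves (29c) for \(U^{n,m,(k)}\) explicitly, which only requires dividing by the diagonal of \(\mathbb {D}\).

```python
import numpy as np

def dec_weights(n_sub):
    """beta^m and rho[m, z] = int_0^{beta^m} l_z(s) ds for equispaced nodes."""
    beta = np.linspace(0.0, 1.0, n_sub + 1)
    rho = np.zeros((n_sub + 1, n_sub + 1))
    for z in range(n_sub + 1):
        others = np.delete(beta, z)
        # Lagrange basis polynomial l_z, equal to 1 at beta[z], 0 at the others
        lag = np.poly1d(others, r=True) / np.prod(beta[z] - others)
        antider = lag.integ()
        rho[:, z] = antider(beta) - antider(0.0)
    return beta, rho

def dec_step(mass, rhs, U_n, dt, n_sub, K):
    """One DeC time step for mass * dU/dt = rhs(U) with K correction iterations."""
    _, rho = dec_weights(n_sub)
    D = mass.sum(axis=1)                  # lumped mass matrix, stored as a vector
    U = np.tile(U_n, (n_sub + 1, 1))      # (29a): U[m] = U^{n,m,(0)} = U^n
    for _ in range(K):
        R = np.array([rhs(U[z]) for z in range(n_sub + 1)])
        U_new = U.copy()                  # (29b): U_new[0] stays equal to U^n
        for m in range(1, n_sub + 1):
            # (29c) solved for the new iterate; only D is inverted
            U_new[m] = U[m] - (mass @ (U[m] - U[0])) / D + dt * (rho[m] @ R) / D
        U = U_new
    return U[n_sub]                       # (29d)
```

With `n_sub = 2` and `K = 3` and a diagonal mass matrix, this sketch behaves as a third order method on a linear test problem. The order reduction or recovery in the presence of a genuinely lumped (non-diagonal) mass matrix depends on the hypotheses stated above and is not checked by this example.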

Relying only on the inversion of the low order operator, the method gets rid of the computational costs of the solution of the linear systems: the mass matrix of the \(\mathscr {L}^2\) operator only appears in the right hand side and does not need to be inverted. The only requirement of the DeC approach is the invertibility of the lumped mass matrix \(\mathbb {D}\), which limits its application to spatial elements which possess this property. Beyond degree one, basic Lagrange polynomials are not guaranteed to satisfy this property. Hence, only other elements, such as Bernstein and Cubature ones, can be used in combination with DeC.

Finally, for the following analysis we note that the DeC method can be cast in a form similar to a Runge–Kutta method by rewriting (29c) as

$$\begin{aligned} U^{n,m,(k+1)}=U^{n,m,(k)} - \mathbb {D}^{-1} \mathbb {M}\left( U^{n,m,(k)}-U^{n,0,(k)}\right) +\sum _{j=0}^M \Delta t \rho _{j}^m \mathbb {D}^{-1}\texttt{r}(U^{n,j,(k)}). \end{aligned}$$
(30)

Comparing with the system of equations (26), we can immediately define the SSPRK coefficients associated to DeC as \(\gamma ^{m,(k+1)}_{m,(k)}=\mathbb {I}-\mathbb {D}^{-1} \mathbb {M}\) with \(\mathbb {I}\) the identity matrix, \(\gamma ^{m,(k+1)}_{0,(0)}=\mathbb {D}^{-1} \mathbb {M}\), \(\mu ^{m,(k+1)}_{r,(k)}=\rho ^m_r\) for \(m,r=0,\dots ,M\) and \(k=0,\dots ,K-1\) and instead of the mass matrix, we use the diagonal one.

Remark 1

(DeC with SUPG) The iterative procedure of the DeC method overcomes the difficulties presented by some implicit stabilization methods such as SUPG. Indeed, the SUPG stabilization term can be added only to the \(\mathscr {L}^2\) operator, keeping the high order accuracy of this operator. Since the \(\mathscr {L}^2\) operator is applied to the previously computed iteration, all the terms of the SUPG, including the time derivative of u in (7), can be explicitly computed on \(U^{(k-1)}\), thus keeping the diagonal mass matrix for the whole scheme.

3 Fourier Analysis

3.1 Preliminaries and Time Continuous Analysis

In order to study the stability and the dispersion properties of the previously presented numerical schemes, we will perform a dispersion analysis on the linear advection problem with periodic boundary conditions:

$$\begin{aligned} \partial _t u(t,\textbf{x}) + \textbf{a} \cdot \nabla u(t,\textbf{x}) = 0, \quad \textbf{a}\in \mathbb {R}^2, \quad (t,\textbf{x}) \in \mathbb {R}^+ \times \Omega , \end{aligned}$$
(31)

with \(\Omega = [0,1]\times [0,1]\). For simplicity, we consider \(\textbf{a} = (\cos ( \Phi ), \sin ( \Phi ))\) with \(\Phi \in [0,2\pi ]\). We then introduce the ansatz

$$\begin{aligned}&u_h(\textbf{x} , t) = Ae^{i(\textbf{k} \cdot \textbf{x} - \xi t)} = Ae^{i(\textbf{k}\cdot \textbf{x}-\omega t)}e^{\epsilon t} \end{aligned}$$
(32)
$$\begin{aligned} \text{ with } \quad&\xi = \omega + i \epsilon , \quad i=\sqrt{-1}, \quad \textbf{k}=(k_x,k_y)^T. \end{aligned}$$
(33)

Here, \(\epsilon \) denotes the damping rate, while the wavenumbers are denoted by \(\textbf{k}=(k_x,k_y)\), with \(k_x=2\pi /L_x\) and \(k_y=2\pi /L_y\) with \(L_x\) and \(L_y\) the wavelengths in x and y directions respectively. The phase velocity \(\textbf{c}\) can be defined from

$$\begin{aligned} \textbf{c}\cdot \textbf{k} = \omega \end{aligned}$$
(34)

and represents the celerity with which waves propagate in space. It is in general a function of the wavenumber. Substituting (32) in the advection equation (31) for an exact solution we obtain that

$$\begin{aligned} \omega = \textbf{k}\cdot \textbf{a} \,,\quad \textbf{c} = \textbf{a} \quad \text{ and } \quad \epsilon = 0. \end{aligned}$$
(35)

In other words

$$\begin{aligned} u_h(\textbf{x} , t) = Ae^{i\textbf{k} \cdot (\textbf{x} - \textbf{a} t)} \,. \end{aligned}$$
(36)

The objective of the next sections is to provide the semi- and fully-discrete equivalents of the above relations for the finite element methods introduced earlier. We will consider polynomial degrees up to 3, for all combinations of stabilization methods and time integration techniques. This will also allow to investigate the parametric stability with respect to the time step (through the CFL number) and stabilization parameter \(\delta \). In practice, for each choice we will evaluate the accuracy of the discrete approximation of \(\omega \) and \(\epsilon \), and we will provide conditions for the non-positivity of the damping \(\epsilon \), i.e., the von Neumann stability of the method.

3.2 The Eigenvalue System

The Fourier analysis for numerical schemes on the periodic domain is based on a discrete Parseval theorem. Thanks to this theorem, we can study the amplification and the dispersion of the basis functions of the Fourier space. The key ingredient of this study is the repetition of the stencil of the scheme from one cell to the next. In particular, using the ansatz (32) we can write local equations coupling degrees of freedom belonging to neighbouring cells through a multiplication by factors \(e^{i\theta _x}\) and \(e^{i\theta _y}\) representing the shift in space along the oscillating solution. The dimensionless coefficients

$$\begin{aligned} \theta _x:= k_x\Delta x\, \text { and }\, \theta _y:= k_y\Delta y \end{aligned}$$
(37)

are the discrete reduced wave numbers which naturally appear all along the analysis. Here, \(\Delta x\) and \(\Delta y\) are defined by the size of the elementary periodic unit that is highlighted with a red square as an example in Fig. 3.

Formally replacing the ansatz in the scheme we end up with a dense algebraic problem of dimension \(N_{dof}\), where \(N_{dof}\) is the number of all the degrees of freedom in the mesh. The obtained system with dimension \(N_{dof}\) in the time continuous case reads

$$\begin{aligned}{} & {} \text {Equations (31) and (32)} \quad \Rightarrow \quad - i\xi \mathbb {M}\textbf{U} + \textbf{a} \cdot ( \mathscr {K}_x \textbf{U},\mathscr {K}_y \textbf{U}) + \delta \mathbb {S}\textbf{U} = 0 \end{aligned}$$
(38)
$$\begin{aligned}{} & {} (\mathbb {M})_{ij} = \int _{\Omega } \phi _i \phi _j dx, \qquad (\mathscr {K}_{x})_{ij} = \int _{\Omega } \phi _i \partial _x \phi _j dx, \qquad (\mathscr {K}_{y})_{ij} = \int _{\Omega } \phi _i \partial _y \phi _j dx, \end{aligned}$$
(39)

with \(\phi _j\) being any finite element basis functions, \( \textbf{U} \) the array of all the degrees of freedom and \(\mathbb {S}\) being the stabilization matrix defined through one of the stabilization techniques of Sect. 2.1. Although system (38)–(39) is in general a global eigenvalue problem, we can reduce its complexity by exploiting more explicitly the ansatz (32). The choice of the mesh is crucial in order to exploit the ansatz and to find a unit block that repeats periodically in space. Hence, we must consider structured periodic meshes and we will focus, in particular, on two types of meshes. The first one is the X-mesh that is depicted in Fig. 3 and the second one is the T-mesh depicted in Fig. 4. In those pictures the distribution of the degrees of freedom of some \(\mathbb {P}_2\) elements is also represented as an example.

More precisely, as is done in [55], we can introduce elemental vectors of unknowns \(\widetilde{\textbf{U}}_{Z_{ij}}\), where \(Z_{ij}\) is the stencil denoted by the red square in Fig. 3, which repeats periodically on the domain. Thus, for continuous finite elements, \(\widetilde{\textbf{U}}_{Z_{ij}}\) is an array of d degrees of freedom inside a periodic unitary block \(Z_{ij}\), excluding two boundaries (one on the top and one on the right, for example). The number d depends on the chosen (periodic) mesh type and on the elements. As an example, in Fig. 3 we display for the X type mesh the periodic elementary unit (in the red square) with Basic and Cubature degrees of freedom with \(p=2\). In the X mesh for Basic elements \(p=2\) we have \(d=8\), while for Cubature \(p=2\) we have \(d=12\). Using the periodicity of the solution and the ansatz (32) and denoting by \(Z_{i\pm 1,j\pm 1}\) the neighboring elementary units, we can write the neighboring degrees of freedom as

$$\begin{aligned} \widetilde{\textbf{U}}_{Z_{i\pm 1,j}} = e^{\pm i\theta _x}\widetilde{\textbf{U}}_{Z_{i,j}}, \qquad \widetilde{\textbf{U}}_{Z_{i,j\pm 1}} = e^{\pm i\theta _y}\widetilde{\textbf{U}}_{Z_{i,j}}, \end{aligned}$$
(40)

and by induction all other degrees of freedom of the mesh.

Fig. 3
figure 3

The X type triangular mesh. At left, the Basic finite element discretisation with \(\mathbb {P}_2\) elements. At right, the grid configuration for \(\tilde{\mathbb {P}}_2\) Cubature elements. The red square represents the periodic elementary unit that contains the degrees of freedom of interest for the Fourier analysis. (color figure online)

Fig. 4
figure 4

The T type triangular mesh with degrees of freedom in blue and periodic unit in the red square for the Fourier analysis. (color figure online)

This allows us to show that the system (38)–(39) is equivalent to a compact system of dimension d (we drop the subscript \(_{Z_{ij}}\) as the system is the same for all periodic units)

$$\begin{aligned} -i\xi \widetilde{\mathbb {M}} \widetilde{\textbf{U}} + a_x \widetilde{\mathscr {K}}_x \widetilde{\textbf{U} } +a_y \widetilde{\mathscr {K}}_y \widetilde{\textbf{U} } + \delta \widetilde{\mathbb {S}} \widetilde{\textbf{U} } =0, \end{aligned}$$
(41)

where the matrices \(\widetilde{\mathbb {M}}\), \( \widetilde{\mathscr {K}}_x\), \( \widetilde{\mathscr {K}}_y\) and \(\widetilde{\mathbb {S}}\) are readily obtained from the elemental discretization matrices by using Equations (40).
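To make the reduction from the global problem to the compact one concrete, the following sketch checks it on a deliberately simplified setting that is not the one studied in this paper: 1D periodic \(\mathbb {P}_1\) elements without stabilization, for which the periodic unit contains a single degree of freedom (\(d=1\)). All names and the values of `N`, `a` and `n_wave` are illustrative assumptions. The global matrices applied to a sampled Fourier mode give back the same mode multiplied by the scalar symbols obtained with the phase factors \(e^{\pm i\theta }\).

```python
import numpy as np

def reduced_symbols(theta, dx):
    """Fourier symbols of the 1D periodic P1 mass and advection matrices."""
    m_red = dx * (4.0 + 2.0 * np.cos(theta)) / 6.0   # from dx/6 * (e^{-i t} + 4 + e^{i t})
    k_red = 1j * np.sin(theta)                       # from (e^{i t} - e^{-i t}) / 2
    return m_red, k_red

N, a, n_wave = 16, 1.0, 1              # illustrative values
dx = 1.0 / N
nxt = np.roll(np.eye(N), 1, axis=1)    # (nxt @ U)[j] = U[j+1], periodic
prv = nxt.T                            # (prv @ U)[j] = U[j-1], periodic
M = dx / 6.0 * (prv + 4.0 * np.eye(N) + nxt)   # global P1 mass matrix
K = 0.5 * (nxt - prv)                          # global P1 advection matrix

k = 2.0 * np.pi * n_wave
U = np.exp(1j * k * dx * np.arange(N))   # Fourier mode sampled at the nodes
m_red, k_red = reduced_symbols(k * dx, dx)

# scalar analogue of (41) with delta = 0: -i xi m_red + a k_red = 0
xi = a * k_red / (1j * m_red)
```

In this toy case \(\xi \) is real, so the unstabilized scheme has no damping, and its value is close to the exact phase \(a k\) for well resolved waves. On triangular meshes the same construction yields \(d\times d\) complex matrices instead of scalars.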

By the discrete Parseval theorem, we know that the norm of the reduced variable \(\widetilde{\textbf{U}}\) is equivalent to the norm of the discrete vector \(\textbf{U}\). Hence, studying the amplification factor of the two is equivalent.

We apply the same analysis to stabilized methods. The interested reader can access all 2D dispersion plots online [43]. From the plots we can see that the increase in polynomial degree provides the expected large reduction in dispersion error, while retaining a small amount of numerical dissipation, which permits the damping of parasitic modes.

An example of dispersion curves is given in Fig. 5. The method uses Cubature \(\tilde{\mathbb {P}}_2\) elements, the CIP stabilization technique, and a wave angle \(\theta = 5 \pi / 4\). We show here all 12 modes of the system (see Fig. 3): the principal mode is represented in green, while the others are parasitic modes. This figure also shows the complexity of the analysis due to the number of modes to consider.

Fig. 5
figure 5

Dispersion curves related to the 12 modes of \(\widetilde{\textbf{U}}_{Z_{ij}}\) of the system given by Cubature \(\tilde{\mathbb {P}}_2\) elements, the CIP stabilization technique, and a wave angle \(\theta = 5 \pi / 4\) on an X mesh. Phases \(\omega \) (left) and amplifications \(\epsilon \) (right)

We summarize the number of modes for the X mesh in Table 1. A representation of each mesh is given in Sect. 1 for elements of degree \(p=2\) and 3.

Table 1 X mesh: Summary table of number of modes per system

3.3 The Fully Discrete Analysis

We now analyze the fully discrete schemes obtained using the RK, SSPRK and DeC time marching methods. Let us consider as an example the SSPRK schemes. Defining \(A:=\mathbb {M}^{-1} (a_x\mathscr {K}_x+a_y\mathscr {K}_y+\delta {\mathbb {S}})\), we can write the schemes as follows

$$\begin{aligned} \left\{ \begin{array}{ll} \textbf{U}^{(0)}:= &{} \textbf{U}^n, \\ \textbf{U}^{(s)} := &{} \sum _{j=0}^{s-1} \left( \gamma _{sj} \textbf{U}^{(j)} + \Delta t \mu _{sj} A \textbf{U}^{(j)} \right) {\text { for } } s \in \llbracket 1,S \rrbracket ,\\ \textbf{U}^{n+1}:=&{}\textbf{U}^{(S)}. \end{array} \right. \end{aligned}$$
(42)

Expanding all the stages, we can obtain the following representation of the final stage:

$$\begin{aligned} \textbf{U}^{n+1} = \textbf{U}^{(0)} + \sum _{j=1}^{S} \nu _{j} \Delta t^jA^j \textbf{U}^{(0)} = \left( \mathscr {I} + \sum _{j=1}^{S} \nu _{j}\Delta t^j A^j \right) \textbf{U}^n, \end{aligned}$$
(43)

where the coefficients \(\nu _j\) in (43) are obtained as combinations of the coefficients \(\gamma _{sj}\) and \(\mu _{sj}\) in (42) and \(\mathscr {I} \) is the identity matrix. For example, the coefficients of the fourth order accurate scheme RK4 are \(\nu _1=1\), \(\nu _2 = 1/2\), \(\nu _3=1/6\) and \(\nu _4 = 1/24\).

We can now compress the problem proceeding as in the time continuous case. In particular, using Equations (40) one easily shows that the problem can be written in terms of the local \(d\times d\) matrices \(\widetilde{A}:= \widetilde{\mathbb {M}}^{-1}\left( a_x\widetilde{\mathscr {K}_x}+a_y\widetilde{\mathscr {K}_y}+\delta \widetilde{\mathbb {S}} \right) \) and in particular that

$$\begin{aligned} \widetilde{\textbf{U}}^{n+1} = G \widetilde{\textbf{U}}^{n}\quad \text {with}\quad G:= \left( \widetilde{\mathscr {I}} + \sum _{j=1}^{S} \nu _{j} \Delta t^j \widetilde{A}^j \right) = e^{\epsilon \Delta t } e^{-i\omega \Delta t} , \end{aligned}$$
(44)

where \(G\in \mathbb {C}^{d\times d}\) is the amplification matrix depending on \(\theta ,\,\delta ,\, \Delta t,\, \Delta x\) and \( \Delta y\). Considering each eigenvalue \(\lambda _i\) of G, we can write the following formulae for the corresponding phase \(\omega _i\) and damping coefficient \(\epsilon _i\)

$$\begin{aligned} {\left\{ \begin{array}{ll} e^{\epsilon _i \Delta t } \cos (\omega _i \Delta t) = \text {Re}(\lambda _i) ,\\ - e^{\epsilon _i \Delta t } \sin (\omega _i \Delta t) = \text {Im}(\lambda _i), \end{array}\right. } \Leftrightarrow \, {\left\{ \begin{array}{ll} \omega _i\Delta t = \arctan \left( \frac{-\text {Im}(\lambda _i)}{\text {Re}(\lambda _i)} \right) ,\\ (e^{\epsilon _i \Delta t })^2 = \text {Re}(\lambda _i)^2 + \text {Im}(\lambda _i)^2, \end{array}\right. } \\ \Leftrightarrow {\left\{ \begin{array}{ll} \dfrac{\omega _i}{k} = \arctan \left( \frac{-\text {Im}(\lambda _i)}{\text {Re}(\lambda _i)} \right) \frac{1}{k \Delta t},\\ \epsilon _i = \log \left( | \lambda _i | \right) \frac{1}{\Delta t}. \end{array}\right. } \end{aligned}$$

For the DeC method we can proceed with the same analysis transforming also the other involved matrices into their Fourier equivalent ones. Using (30) these terms would contribute to the construction of G not only in the \(\widetilde{A}\) matrix, but also in the coefficients \(\nu _j\), which become matrices as well. At the end we just study the final matrix G and its eigenstructure, whatever process was needed to build it up.

The matrix G describes one timestep evolution of the Fourier modes for all the d different types of degrees of freedom. The damping coefficients \(\epsilon _i\) indicate if the modes are increasing or decreasing in amplitude and the phase coefficients \(\omega _i\) describe the phases of such modes.
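The construction of G in (44) and the extraction of \(\omega _i\) and \(\epsilon _i\) from its eigenvalues can be sketched as follows. The function names are hypothetical, and the reduced matrix `A_tilde` is assumed to be already assembled with the sign convention of (44). The phase is computed with `numpy.angle` (an `atan2`-type function) instead of a plain arctangent of the ratio, an implementation choice of this sketch that keeps the correct quadrant when \(\text {Re}(\lambda _i)<0\).

```python
import numpy as np

def amplification_matrix(A_tilde, dt, nu):
    """G = I + sum_{j>=1} nu_j dt^j A_tilde^j, as in (44)."""
    d = A_tilde.shape[0]
    G = np.eye(d, dtype=complex)
    power = np.eye(d, dtype=complex)
    for nu_j in nu:
        power = power @ (dt * A_tilde)   # (dt * A_tilde)^j
        G = G + nu_j * power
    return G

def phase_and_damping(G, dt):
    """Phases omega_i and damping coefficients eps_i of the eigenvalues of G."""
    lam = np.linalg.eigvals(G)
    omega = -np.angle(lam) / dt          # lambda = exp(eps dt) exp(-i omega dt)
    eps = np.log(np.abs(lam)) / dt
    return omega, eps

nu_rk4 = [1.0, 1.0 / 2.0, 1.0 / 6.0, 1.0 / 24.0]   # RK4 coefficients quoted above
```

A mode is amplified whenever the corresponding entry of `eps` is positive, up to the tolerance chosen for the eigenvalue computation.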

We remark that a necessary condition for stability of the scheme is that \( |\lambda _i | \le 1\) or, equivalently, \(\epsilon _i \le 0 \) for all the eigenvalues. The goal of our study is to find the largest CFL number for which the stability condition is fulfilled and such that the dispersion error is not too large.

For our analysis, we focus on the X type triangular mesh in Fig. 3 with elements of degree 1, 2 and 3. This X type triangular mesh is also used in [39] for Fourier analysis of the acoustic wave propagation system.

3.4 Methodology

The methodology explained in the following will be applied to all the combinations of schemes presented above (time discretization: RK, SSPRK and DeC; space discretization: Basic, Cubature and Bernstein; stabilization technique: CIP, OSS and SUPG), in order to find the best coefficients (CFL, \(\delta \)), as in [42].

It must be remarked that the dispersion analysis must satisfy the Nyquist stability criterion, i.e., \(\Delta x_{max} \le \frac{L}{2}\) with \(\Delta x_{max}\) the maximal distance between two nodes on edges. In other words, \(k_{max} = \frac{2\pi }{L_{min}} = \frac{2\pi }{2 \Delta x_{max}}=\frac{\pi }{\Delta x_{max}}\). This tells us where k should vary, i.e., \(k \in {[} 0,\pi / \Delta x_{max} ]\).

The goal of this section is to minimize the dispersion error and guarantee stability, varying the stabilization parameter and the CFL number. Hence, we look for an algorithm that provides these optimal values. With the notation of [42], we will set for the different stabilizations

$$\begin{aligned} \begin{aligned} \quad \text {OSS :} \;\;&\tau _K =\delta \Delta x | a |,\\ \quad \text {CIP :} \;\;&\tau _f = \delta \Delta x^2 | a |,\\ \quad \text {SUPG :}\;\;&\tau _K =\delta \Delta x/|a|. \end{aligned} \end{aligned}$$

One of our objectives is to explore the space of parameters (CFL,\(\delta \)), and to propose criteria allowing to set these parameters to provide the most stable, least dispersive and least expensive methods. A clear and natural criterion is to exclude all parameter values for which there exists at least one wavenumber \(\theta \) or one angle \(\Phi \in [0,2\pi ]\) such that we obtain an amplification of the mode, i.e., \(\epsilon (\theta )>10^{-12}\) (taking into account the machine precision errors that might occur). Doing so, we obtain what we will denote as stable area in \((\text {CFL},\delta )\) space. For all the other points we propose 3 strategies to minimize a combination of dispersion error and computational cost.

In the following we describe the strategy we adopt to find the best parameters couple (CFL,\(\delta \)) that minimizes a global solution error, denoted by \(\eta _u\), while maximizing the CFL in the stable area. In particular, we start from the relative square error of u

$$\begin{aligned} \left| \frac{u(t)-u_{ex}(t)}{u_{ex}(t)}\right| ^2=&\left| e^{\epsilon t - i t(\omega -\omega _{ex})}-1\right| ^2\end{aligned}$$
(45)
$$\begin{aligned} =&\left[ e^{\epsilon t}\cos (t(\omega -\omega _{ex}))-1\right] ^2+\left[ e^{\epsilon t}\sin (t(\omega -\omega _{ex}))\right] ^2\end{aligned}$$
(46)
$$\begin{aligned} =&e^{2\epsilon t} - 2 e^{\epsilon t} \cos (t(\omega -\omega _{ex})) +1. \end{aligned}$$
(47)

Here, we denote with \(\epsilon \) and \(\omega \) the damping and phase of the principal mode and with \(\omega _{ex}=\textbf{k} \cdot \textbf{a}\) the exact phase. For a small enough dispersion error \(|\omega -\omega _{ex} |\ll 1\), we can expand the cosine in the previous formula in a truncated Taylor series as

$$\begin{aligned} \left| \frac{u(t)-u_{ex}(t)}{u_{ex}(t)}\right| ^2\approx&\underbrace{\left[ e^{\epsilon t} -1\right] ^2}_{\text {Damping error}} + \underbrace{e^{\epsilon t}t^2 \left[ \omega - \omega _{ex}\right] ^2}_{\text {Dispersion error}}. \end{aligned}$$
(48)
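For completeness, the step from (47) to (48) can be spelled out. Setting \(x:=t(\omega -\omega _{ex})\) and using \(\cos x \approx 1-x^2/2\), we have

$$\begin{aligned} e^{2\epsilon t} - 2 e^{\epsilon t} \cos x +1 \approx e^{2\epsilon t} - 2 e^{\epsilon t} + 1 + e^{\epsilon t} x^2 = \left[ e^{\epsilon t} -1\right] ^2 + e^{\epsilon t}t^2 \left[ \omega - \omega _{ex}\right] ^2, \end{aligned}$$

where the neglected remainder is of order \(e^{\epsilon t}x^4\).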

We then compute an error at the final time \(T=1\), over the whole phase domain, using at least 3 points per wave \(0\le k \Delta x_p \le \frac{2\pi }{3}\), with \(\Delta x_p=\frac{\Delta x}{p}\), and p the degree of the polynomials. We obtain the following \(\mathbb {L}_2\) error definition,

$$\begin{aligned} \eta _u(\omega ,\epsilon )^2:= \frac{3}{2\pi } \left[ \int _{0}^{\frac{2\pi }{3}} (e^{\epsilon }-1 )^2 dk + \int _{0}^{\frac{2\pi }{3}} e^\epsilon (\omega -\omega _{ex})^2 dk \right] . \end{aligned}$$
(49)
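A minimal sketch of the evaluation of (49) from sampled values of the principal mode is given below. It assumes \(T=1\) and \(\Delta x_p=1\), so that k ranges over \([0,2\pi /3]\), and it uses a composite trapezoidal rule, which is a choice of this sketch and not necessarily the quadrature used for the reported results. The function name `eta_u` and its signature are hypothetical.

```python
import numpy as np

def eta_u(k, eps, omega, omega_ex):
    """Discrete version of (49) on the samples k[0] = 0, ..., k[-1] = 2 pi / 3."""
    k, eps, omega, omega_ex = map(np.asarray, (k, eps, omega, omega_ex))
    damping = (np.exp(eps) - 1.0) ** 2
    dispersion = np.exp(eps) * (omega - omega_ex) ** 2
    integrand = damping + dispersion
    # composite trapezoidal rule on a possibly non-uniform grid
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(k))
    return np.sqrt(3.0 / (2.0 * np.pi) * integral)
```

For instance, a mode with exact phase and a constant damping \(\epsilon =\log (1/2)\) gives \(\eta _u=1/2\), while an undamped mode with a constant phase error of 0.1 gives \(\eta _u=0.1\).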

Recalling that \(\epsilon =\epsilon (k\Delta x,\text {CFL},\delta , \Phi )\) and \(\omega =\omega (k,\Delta x,\text {CFL},\delta , \Phi )\), we need to further set the parameter \(\Delta x_p\). We choose a large value, \(\Delta x_p=1\), with the expectation that for finer grids the error will be smaller. Moreover, we need to check that the stability condition holds for all the possible angles \(\Phi \in [0,2\pi ]\).

Finally, we seek the couple \((\text {CFL}^*,\delta ^*)\) such that

$$\begin{aligned} (\text {CFL}^*,\delta ^*)=\arg \max _{\text {CFL}} \left\{ \eta (\omega , \epsilon , \Phi ')< \mu \min _{\text {stable } (\text {CFL},\delta )} \max _{\Phi } \eta (\omega ,\epsilon , \Phi ), \quad \forall \, \Phi ' \in [0,2\pi ] \,\right\} , \end{aligned}$$
(50)

where the dependence on \(\Phi \) of \(\eta \) is highlighted with an abuse of notation. For this strategy, the parameter \(\mu \) must be chosen in order to balance the requirements on stability and accuracy. After having tried different values, we have set \(\mu \) to 10 providing a sufficient flexibility to obtain results of practical usefulness. Indeed, the found values will be tested in the numerical section.
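The selection rule (50) can be sketched on precomputed arrays as follows. The array layout (CFL index, \(\delta \) index, \(\Phi \) index), the function name `select_parameters` and the tie-breaking rule among pairs sharing the largest CFL (smallest worst-case error) are assumptions of this sketch; the stability tolerance defaults to the \(10^{-12}\) threshold mentioned above and would have to be adapted to the threshold actually used.

```python
import numpy as np

def select_parameters(cfl, delta, eta, eps_max, mu=10.0, tol=1e-12):
    """Pick (CFL*, delta*) following (50).

    cfl: (nc,), delta: (nd,); eta and eps_max: (nc, nd, nphi), where
    eps_max holds the largest damping coefficient over modes and wavenumbers.
    """
    stable = np.all(eps_max <= tol, axis=2)       # stable for every angle Phi
    if not stable.any():
        return None
    eta_worst = eta.max(axis=2)                   # max over Phi
    threshold = mu * eta_worst[stable].min()
    admissible = stable & (eta_worst < threshold)
    ic, idl = np.nonzero(admissible)
    if ic.size == 0:
        return None
    best = cfl[ic] == cfl[ic].max()               # pairs with the largest CFL
    # tie-break (assumption): smallest worst-case error among them
    j = np.argmin(np.where(best, eta_worst[ic, idl], np.inf))
    return cfl[ic[j]], delta[idl[j]]
```

Note that a pair with a very small error but unstable for a single angle is discarded before the threshold is computed, which reflects the role of the angle \(\Phi \) in the optimization.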

To show the influence of the angle \(\Phi \) on the optimization problem we show an example for the X mesh. For a given couple of parameters (CFL,\(\delta \)) = (0.4, 0.01) we compare the results for \(\Phi =0\) and \(\Phi =3\pi /16\). In Fig. 6 we compare the phases \(\omega _i\) and the damping coefficients \(\epsilon _i\) for the two angles. It is clear that for the angle \(\Phi =0\), on the left, there are some modes which are not stable (\(\epsilon _i>0\)), while for \(\Phi =3\pi /16\) all modes are stable.

Fig. 6
figure 6

Comparison of dispersion curves \(\omega _i\) and damping coefficients \(\epsilon _i\), for Cubature \(\tilde{\mathbb {P}}_2\) elements, with SSPRK time discretization and OSS stabilization. \(\Phi =0\) at the left and \(\Phi =3\pi / 16\) at the right

The angle can strongly influence the whole analysis, as one can observe in the plot of \(\max _i \epsilon _i\) in Fig. 7: considering only the angle \(\Phi = 3\pi /16\) we would obtain an optimal parameter in (CFL,\(\delta \)) = (0.4, 0.01), while, using all angles, this value is not stable anymore.

Fig. 7
figure 7

Plot of \(\log (\max _i \epsilon _i)\) for Cubature \(\tilde{\mathbb {P}}_2\) elements, SSPRK time discretization and OSS stabilization. The blue and light blue region is the stable one. At the left only for \(\Phi =3\pi /16\), at the right we plot the maximum over all \(\Phi \)

Remark 2

To define the stable region, we should only consider configurations for which the damping is below machine accuracy. In practice, this cannot be done due to the fact that the eigenvalue problem arising from (44) is only solved approximately using the linear algebra package of numpy. This introduces some uncertainty in the definition of the stability region as machine accuracy needs to be replaced by some other finite threshold.

3.5 Results of the Fourier Analysis Using the X Type Mesh

In this section, we illustrate the results obtained with the methodology explained above. For clarity, not all the results are reported in this work; however, we place all the plots for all possible combinations of schemes in an online repository [42]. We will provide some examples here and a summary of the main results that we obtained.

Fig. 8
figure 8

Damping coefficients \(\log (\max _i \epsilon _i)\) for \(\mathbb {B}_3\) Bernstein elements and the DeC method with, from left to right, SUPG, OSS and CIP stabilization. The red dot is the optimum according to (50)

The first type of plot we introduce helps us in understanding how we can define the stability region in the \((\text {CFL},\delta )\) plane. Thus, for every \((\text {CFL},\delta )\) we plot the maximum of \(\log (\epsilon _i)\) over all modes and angles \(\Phi \in [0,2\pi ]\) (thanks to the symmetry of the mesh we can reduce this interval). An example is given in the right plot of Fig. 7, where it is clear that the whole blue area is stable and the yellow/orange area is unstable. In other cases, this boundary is not so clear and setting a threshold to determine the stable area can be challenging. In Fig. 8 we compare different stabilizations for DeC with \(\mathbb {B}_3\) elements. In the CIP stabilization case, in contrast to SUPG, there is no sharp discontinuity between unstable and stable values, because there is a transition region where \(\max _i \epsilon _i\) varies between \(10^{-7}\) and \(10^{-4}\).

The second type of plot combines the chosen stability region with the error \(\eta _u\). We plot on the \((\text {CFL},\delta )\) plane some black crosses on the unstable region, where there exist an i and a \(\Phi \) such that \(\epsilon _i > 10^{-7}\). The color represents \(\log (\eta _u)\) and the best value according to the previously described method is marked with a red dot. In Figs. 9, 10, 11 and 12, we show some examples of these plots for some schemes, for different \(p=1,2,3\). In Figs. 9 and 10 we test the Basic elements with the SSPRK time discretization, while in Figs. 11 and 12 we use the Cubature elements with DeC time discretization. We also compare different stabilization techniques: in Figs. 9 and 11 we use the OSS, while in Figs. 10 and 12 the CIP. One can observe many differences among the schemes. For instance, for \(p=3\) we see a much wider stable area for SSPRK than with DeC and, in the Cubature DeC case, we see that the CIP requires a reduction in the CFL number with respect to the OSS stabilization.

Fig. 9
figure 9

\(\log (\eta _u)\) values (blue scale) and stable area (unstable with black crosses), on \((\text {CFL},\delta )\) plane. The red dot denotes the optimal value. From left to right \(\mathbb {P}_1\), \(\mathbb {P}_2\), \(\mathbb {P}_3\) Basic elements with SSPRK scheme and OSS stabilization. (color figure online)

Fig. 10
figure 10

\(\log (\eta _u)\) values (blue scale) and stable area (unstable with black crosses), on \((\text {CFL},\delta )\) plane. The red dot denotes the optimal value. From left to right \(\mathbb {P}_1\), \(\mathbb {P}_2\), \(\mathbb {P}_3\) Basic elements with SSPRK scheme and CIP stabilization. (color figure online)

Fig. 11
figure 11

\(\log (\eta _u)\) values (blue scale) and stable area (unstable with black crosses), on \((\text {CFL},\delta )\) plane. The red dot denotes the optimal value. From left to right \(\tilde{\mathbb {P}}_1\), \(\tilde{\mathbb {P}}_2\), \(\tilde{\mathbb {P}}_3\) Cubature elements with DeC scheme and OSS stabilization. (color figure online)

Fig. 12
figure 12

\(\log (\eta _u)\) values (blue scale) and stable area (unstable with black crosses), on \((\text {CFL},\delta )\) plane. The red dot denotes the optimal value. From left to right \(\tilde{\mathbb {P}}_1\), \(\tilde{\mathbb {P}}_2\), \(\tilde{\mathbb {P}}_3\) Cubature elements with DeC scheme and CIP stabilization. (color figure online)

We summarize the results obtained by the optimization strategy in Table 2 for all the combinations of spatial, time and stabilization discretization. The CFL and \(\delta \) presented there are the optimal values obtained by the process described above, which we aim to use in simulations to obtain stable and efficient schemes. Unfortunately, as already mentioned above, for some schemes the stability area is not so well defined for several reasons. One of these reasons is the "shape" of the stability area, as for one-dimensional problems, see [42]. Other issues that affect this analysis are the numerical precision, see Sect. 3.6, and the mesh configuration, see Sect. 3.7. In the following we study these cases in more detail and show how one can find better values (Fig. 13).

Table 2 X mesh: Optimized CFL and penalty coefficient \(\delta \) in parenthesis, minimizing \(\eta _u\)

3.6 Comparison with a Space-Time Split Stability Analysis

Fig. 13

Logarithm of the amplification coefficient \(\log (\max _i (\varepsilon _i))\) for SUPG stabilization with \(\tilde{\mathbb {P}}_3\) Cubature elements and the SSPRK method. Unstable region in yellow, the red dot is the optimal parameter according to (50)

In this section, we show another stability analysis to slightly improve the results obtained above. Indeed, the solution of the eigenvalue problem (44) is only obtained within some approximation from the numpy numerical library. In some cases, the threshold used to define the stability region is defined in a somewhat heuristic manner. So, to confirm the results, we independently use another criterion. To this end we treat the temporal and spatial discretizations independently, as in the method of lines. We then study the spectral properties of the spatial discretization alone, computing the eigenvalues of the corresponding matrix A (cf. (42)). With this information, we then check whether these eigenvalues belong to the stability area of the time discretization.

In particular, following [21], we write the time discretization for Dahlquist’s equation

$$\begin{aligned} \partial _t u - \lambda u = 0 , \end{aligned}$$
(51)

and, in this example, we consider the SSPRK discretization (42). From (43) we can write the amplification coefficient \(\Gamma (\lambda )\), i.e.,

$$\begin{aligned} \textbf{U}^{n+1} = \textbf{U}^{(0)} + \sum _{j=1}^{S} \nu _{j} \Delta t^j\lambda ^j \textbf{U}^{(0)} = \underbrace{ \left( \mathscr {I} + \sum _{j=1}^{S} \nu _{j}\Delta t^j \lambda ^j \right) }_{\Gamma (\lambda )} \textbf{U}^n. \end{aligned}$$
(52)

The stability condition for this SSPRK scheme is given by \(|\Gamma (\lambda )| \le 1\). Now, when we substitute the Fourier transform of the spatial semidiscretization \(\widetilde{A}\) for the coefficient \(\lambda \) and diagonalize the system (or put it in Jordan form), we obtain a condition on the eigenvalues of \(\widetilde{A}\). Then, studying the Cubature case with SUPG stabilization of order 4 with parameters (CFL,\(\delta \))=(0.234, 0.011), found in Fig. 13, see also Table 2, we plot the eigenvalues of \(\widetilde{A}\) and the stability region of the SSPRK scheme for different \(\theta \in [ 0, \pi ]\). We notice that for some values of \(\theta \) some of the eigenvalues fall slightly outside the stable area, see Fig. 14a. There are, indeed, a few eigenvalues dangerously close to the imaginary axis and some of them actually have positive real part (blue dots).
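The check described above can be sketched in a few lines. The coefficients \(\nu _j\) of (43) are not reproduced in this section, so the sketch assumes \(\nu _j = 1/j!\) with \(S=4\) (the stability polynomial of a four-stage, fourth order explicit Runge–Kutta method); the function names and the small diagonal test matrix are hypothetical.

```python
import math
import numpy as np

def amplification(z, nu):
    """Gamma = 1 + sum_j nu[j-1] * z**j, where z = dt * lambda."""
    z = np.asarray(z, dtype=complex)
    gamma = np.ones_like(z)
    for j, nu_j in enumerate(nu, start=1):
        gamma = gamma + nu_j * z**j
    return gamma

def unstable_eigenvalues(A, dt, nu, tol=1e-12):
    """Return the scaled eigenvalues dt*lambda of A with |Gamma| > 1 + tol."""
    z = dt * np.linalg.eigvals(np.asarray(A, dtype=complex))
    return z[np.abs(amplification(z, nu)) > 1.0 + tol]

# Assumed coefficients nu_j = 1/j!, j = 1..4 (not the paper's (43)).
nu = [1.0 / math.factorial(j) for j in range(1, 5)]

# Hypothetical spatial operator with one eigenvalue of positive real part.
A = np.diag([-1.0, 2.8j, 0.1])
bad = unstable_eigenvalues(A, dt=1.0, nu=nu)
```

Only the eigenvalue with positive real part is flagged here. The tolerance `tol` plays the role of the heuristic threshold mentioned in the text and is likewise an assumption.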

Fig. 14

Eigenvalues of \(\widetilde{A}\) using cubature discretization and the SUPG stabilization (varying k) and stability area of the SSPRK method. In red the stable eigenvalues, in blue the unstable ones. (color figure online)

As suggested before, if we decrease the CFL and increase \(\delta \), we move towards a safer region, so considering (CFL,\(\delta \))=(0.18, 0.04) with the same \(\theta \), we obtain all stable eigenvalues, as shown in Fig. 14b.

The summary of the optimal parameters of Table 2 updated taking into account also a larger safety region in the (CFL, \(\delta \)) plane (as explained in this section) can be found in Table 15 in Appendix 2.
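One possible way to formalize a "safety region" on a sampled (CFL, \(\delta \)) grid is to keep only those points whose neighbours are also stable. The sketch below is an illustration under that assumption and is not the procedure used to build Table 15; the function name `safe_region` and the `margin` parameter are hypothetical.

```python
import numpy as np

def safe_region(stable, margin=1):
    """Keep points whose (2*margin+1) x (2*margin+1) neighbourhood is stable.

    The grid is extended by replicating its border values, so points on the
    boundary of the sampled region are not penalized.
    """
    stable = np.asarray(stable, dtype=bool)
    padded = np.pad(stable, margin, mode="edge")
    n0, n1 = stable.shape
    safe = np.ones_like(stable)
    # Logical AND over all shifted copies of the padded mask.
    for di in range(2 * margin + 1):
        for dj in range(2 * margin + 1):
            safe &= padded[di:di + n0, dj:dj + n1]
    return safe

# Illustration: a single unstable sample removes its eight neighbours as well.
stable = np.ones((5, 5), dtype=bool)
stable[2, 2] = False
safe = safe_region(stable)
```

Choosing the optimal pair inside `safe` rather than inside `stable` moves it away from the boundary of the stability area, at the price of a smaller CFL.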

3.7 Different Mesh Patterns

Another important aspect of this stability analysis is the influence of the mesh structure on the results. As an example, we use the T-mesh, another regular and structured mesh type depicted in Fig. 4. In Fig. 4 we also plot the degrees of freedom for elements of degree 2 and the periodic elementary unit that we take into consideration for the Fourier analysis. The number of modes in the periodic unit for this mesh type is summarized in Table 3. The elements of degree 3 can be found in Fig. 28 in Appendix 1.

Table 3 Number of modes in the periodic unit for different elements in the T mesh

Even if for several methods we observe comparable results for the two mesh types, for some of them the analyses are quite different. An example is given by the Basic elements with SSPRK schemes and CIP stabilization. For this method, we plot the dispersion error (49) and the stability area in Fig. 15a for the X mesh and in Fig. 15b for the T mesh. We see huge differences in \(\mathbb {P}_2\) and \(\mathbb {P}_3\), where for the former a wide region becomes unstable for \(\delta _L\le \delta \le \delta _R\) and for the latter we have to considerably decrease the value of \(\delta \) to obtain stable schemes.

In the case of Cubature elements with the OSS stabilization and SSPRK time integration, we have already seen in the previous section that the optimal parameters found were in a dangerous area. Repeating the stability analysis for the T mesh we see that the situation is even more complicated. In Fig. 16a we plot the analysis for the X mesh and in Fig. 16b the one for the T mesh. \(\tilde{\mathbb {P}}_3\) elements, though being stable for some parameters for the X mesh, are never stable on the T mesh. This means that, when searching for general parameters for the schemes, we have to keep in mind that different meshes lead to different results.

Fig. 15

\(\log (\eta _u)\) values (blue scale) and stable area (unstable with black crosses), on \((\text {CFL},\delta )\) plane. The red dot denotes the optimal value. From left to right \(\mathbb {P}_1\), \(\mathbb {P}_2\), \(\mathbb {P}_3\) Basic elements with SSPRK scheme and CIP stabilization. (color figure online)

Fig. 16

\(\log (\eta _u)\) values (blue scale) and stable area (unstable with black crosses), on \((\text {CFL},\delta )\) plane. The red dot denotes the optimal value. From left to right \(\tilde{\mathbb {P}}_1\), \(\tilde{\mathbb {P}}_2\), \(\tilde{\mathbb {P}}_3\) Cubature elements with SSPRK scheme and OSS stabilization

For completeness, we present the optimal parameters also for the T mesh in Table 16 in Appendix 2.

In general, it is important to consider more mesh types when doing this analysis. In practice, we will use the two presented above (X and T meshes). In the following, we will consider the stability region as the intersection of stability regions of both meshes.
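The intersection of the two stability regions can be sketched as follows, assuming both meshes were analysed on the same (CFL, \(\delta \)) grid with CFL values sorted in increasing order. The helper `max_stable_cfl` is hypothetical, and the rule "largest CFL such that all smaller sampled CFL values are also stable" is a conservative assumption made for this illustration.

```python
import numpy as np

def max_stable_cfl(cfl, stable_x, stable_t):
    """For each delta, largest CFL that is stable on both meshes.

    cfl       : 1D increasing array of sampled CFL values
    stable_x  : 2D boolean array, stable_x[i, j] for (cfl[i], delta[j]) on the X mesh
    stable_t  : same for the T mesh
    Returns nan for a delta with no admissible CFL.
    """
    cfl = np.asarray(cfl, dtype=float)
    # Intersection of the stability regions of the two meshes.
    both = np.logical_and(stable_x, stable_t)
    # Keep only the stable interval connected to the smallest sampled CFL.
    contiguous = np.logical_and.accumulate(both, axis=0)
    count = contiguous.sum(axis=0)
    return np.where(count > 0, cfl[count - 1], np.nan)

# Synthetic masks, for illustration only.
cfl = [0.1, 0.2, 0.3]
stable_x = np.array([[True, True], [True, True], [True, False]])
stable_t = np.array([[True, True], [False, True], [True, False]])
limits = max_stable_cfl(cfl, stable_x, stable_t)
```

In the first column the T mesh is the restrictive one, which mirrors the situation described above where parameters that are stable on the X mesh fail on the T mesh.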

3.8 Final Results of the Stability Analysis

Fig. 17

Maximum logarithm of the amplification coefficient \(\log (\max _i (\varepsilon _i))\) for \(\tilde{\mathbb {P}}_3\) Cubature elements on the X and T meshes

Fig. 18

Logarithm of the amplification coefficient \(\log (\max _i (\varepsilon _i))\) for \(\tilde{\mathbb {P}}_3\) Cubature elements on the X mesh

Taking into consideration all the aspects seen in the previous sections, it is important to have a comprehensive result indicating which parameters can be used in the majority of situations. A summary of the parameters obtained for the X and T meshes is available in Appendix 2. In Table 4, instead, we present parameters obtained using the most restrictive case among the different meshes and that ensure a sufficiently large area of stability around them, as explained in Sect. 3.6. These parameters can be safely used in many cases and we will validate them in the numerical sections, where, first, we validate the results of the X mesh on a linear problem on an X mesh, and then we use the more general parameters in Table 4 for nonlinear problems on unstructured meshes.

Table 4 Optimized CFL and penalty coefficient \(\delta \) in parenthesis, combining the two mesh configurations

A special remark must be made for Cubature \(\tilde{\mathbb {P}}_3\) elements combined with the OSS and the CIP stabilizations. In Fig. 17 we see how the amplification coefficient \(\max _i \varepsilon _i\) always has values far away from zero. For the CIP stabilization this is always true and even for the \(\tilde{\mathbb {P}}_2\) elements the stability region is very thin. As suggested in [17, 38], stabilization terms based on jumps of higher order derivatives might fix this problem, but they introduce more parameters. This has not been considered here. Another remark is that the T configuration is very peculiar and, as we will see, on classical Delaunay triangulations the issue does not seem to affect the results. Moreover, the use of additional discontinuity capturing operators may alleviate this issue as some additional, albeit small, dissipation is explicitly introduced in smooth regions.

In Sect. 3.9, we propose to add an additional stabilization term for these unstable schemes, i.e., Cubature \(\tilde{\mathbb {P}}_3\) elements and OSS or CIP stabilization techniques. This term is based on a viscous term [2, 30, 36, 41] and makes it possible to stabilize the numerical schemes for any mesh configuration.

For the OSS stabilization we observe a similar behavior in Fig. 17. The stability issues that we see in that plot are only due to the T mesh. Indeed, for the OSS stabilization on the X mesh there exists a corridor of stable values, which turn out to be unstable for the T mesh, see Fig. 18. In practice, also on unstructured grids we have not noticed instabilities when running with the parameters found with the X mesh. Hence, we still suggest some values of CFL and \(\delta \) for these schemes, which are valid for the X mesh, noting that they might be dangerous for very simple structured meshes. The validation on unstructured meshes, also for more complicated problems, will be done in the next sections.

Overall, Table 4 gives some insight into the efficiency of the schemes. We recall that, in general, we prefer matrix-free schemes, so this aspect must be kept in mind while evaluating the efficiency of the schemes. All the SUPG schemes, except those combined with DeC, and all the Basic element schemes have a mass matrix that must be inverted. Among the others we see that for first degree polynomials the DeC with Bernstein polynomials and SUPG stabilization gives one of the largest CFL values, while for second degree polynomials the OSS Cubature SSPRK scheme seems to be the one with the best performance and, for fourth order schemes, again the Bernstein DeC SUPG is one of the best.

In conclusion of this section, there are important points to highlight:

  • The extension of the Fourier analysis to the two-dimensional space leads to significantly different results with respect to the one-dimensional one, both in terms of global stability of the schemes and in terms of optimal parameters. Moreover, in contrast to [42], Bernstein elements with SUPG stabilization technique lead to stable and efficient schemes. Cubature elements, which were the most efficient in one-dimensional problems, have stability issues on the two-dimensional mesh topologies studied.

  • The complexity of the analysis in two-dimensional space is increased. This not only implies a larger number of degrees of freedom, but also more parameters to take into account, including the angle of the advection term and the possible different configurations of the mesh. The visualization of the stability region of the time scheme together with the eigenvalues of the semi-discretization operators, as shown in Fig. 14, helps in understanding the effect of the CFL and of the penalty coefficient on the stability of the scheme, although only for method-of-lines discretizations. This helps in choosing and optimizing the pair of parameters.

Remark 3

Another possibility to characterize the linear stability of a numerical method is proposed by J. Miller [44]. This method is based on the study of the characteristic polynomial of the amplification matrix G. However, this method does not provide information about the phase \(\omega \), since it does not compute the eigenvalues of G. For this reason, we choose the eigenanalysis.

3.9 Accounting for Discontinuity Capturing Corrections

The stabilization terms accounted for so far are linear stabilization operators. For more challenging simulations, additional non-linear stabilization techniques might be added to control the numerical solution in the vicinity of strong non-linear fronts and/or discontinuities. We consider here the effect of adding an extra viscosity term, as in the entropy stabilization formulations proposed e.g. in [2, 30, 35, 36, 41]. In particular, we look at the approach proposed in [30], and used for shallow water waves in [41, 49] and in [9, 28]. In this approach the viscosity is designed to provide a first order correction \(\mu _K=\mathscr {O}(h)\) close to discontinuities, while for smooth enough solutions \(\mu _K = c h^{p+1}\).

Our idea is to embed this high order correction explicitly in the analysis of the previous section to provide a heuristic characterization of the fully discrete stability of the resulting stabilized formulation: find \(u_h\in V_h^p\) that satisfies for any \(v_h\in W_h\)

$$\begin{aligned} \int _{\Omega } v_h ( \partial _t u_h + \nabla \cdot f(u_h)) dx + \underbrace{ S(v_h,u_h)}_{\text{ Diffusive } \text{ term }} + \underbrace{\sum _K \int _K \mu _K(u_h) \nabla v_h \cdot \nabla u_h}_{\text{ Viscosity } \text{ term }} =0. \end{aligned}$$
(53)

3.9.1 Note on the Stability of the Method

As it is done for previous stabilization terms in Sect. 2.1, we can characterize the accuracy of this method estimating the truncation error for a polynomial approximation of degree p. Considering the smooth exact solution \(u^e(t,x)\) of (53), for all functions \(\psi \) of class at least \(\mathscr {C}^1(\Omega )\) of which \(\psi _h\) denotes the finite element projection, we obtain

$$\begin{aligned} \begin{aligned} \epsilon (\psi _h)&:= \Big | \int _{\Omega _h} \psi _h \partial _t (u_h^e - u^e) \; dx - \int _{\Omega _h} \nabla \psi _h \cdot (f(u_h^e)-f(u^e))\; dx \\&\qquad + \sum \limits _{K\in \Omega _h}\mu _K \int \limits _{K} \nabla \psi _h \cdot \nabla ( u^e_h - u^e ) dx \Big | \le C h^{p+1}, \end{aligned} \end{aligned}$$
(54)

with C a constant independent of h. The estimate can be derived from standard approximation results applied to \(u_h^e-u^e\) and to its derivatives, knowing that \(\mu _K = \mathscr {O}(h^{p+1})\).

Then, for a linear flux, periodic boundaries and taking \(\mu _K=\mu \) constant over the mesh, testing with \(v_h=u_h\) in (53) we get

$$\begin{aligned} \begin{aligned} \int \limits _{\Omega _h} d_t \frac{ u^2_h}{2} = - \sum \limits _{K} \int \limits _{K} \mu ( \nabla u_h )^2 \le 0, \end{aligned} \end{aligned}$$
(55)

which can be integrated in time to obtain a bound on the \(\mathbb {L}_2\) norm of the solution.
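For completeness, the steps behind (55) can be spelled out as follows. This is a sketch under the assumptions stated above (linear flux \(f(u)=\textbf{a}u\) with constant \(\textbf{a}\), periodic boundaries, constant \(\mu \)), together with two additional assumptions that are not stated in the text: that \(v_h=u_h\) is an admissible test function, and that the linear stabilization satisfies \(S(u_h,u_h)\ge 0\). Since \(u_h\) is continuous, the advective term vanishes by periodicity,

$$\begin{aligned} \int _{\Omega _h} u_h \, \textbf{a}\cdot \nabla u_h \, dx = \int _{\Omega _h} \nabla \cdot \left( \textbf{a}\, \frac{u_h^2}{2} \right) dx = 0 , \end{aligned}$$

so that (53) with \(v_h=u_h\) reduces to

$$\begin{aligned} d_t \int _{\Omega _h} \frac{u_h^2}{2} \, dx = - S(u_h,u_h) - \mu \sum _K \int _K | \nabla u_h |^2 \, dx \le 0 . \end{aligned}$$

Integrating between 0 and t then gives

$$\begin{aligned} \Vert u_h(t) \Vert ^2_{\mathbb {L}_2(\Omega _h)} \le \Vert u_h(0) \Vert ^2_{\mathbb {L}_2(\Omega _h)} . \end{aligned}$$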

3.9.2 The von Neumann Analysis

As we saw in Sect. 3.8, the T mesh configuration has stability issues. In particular, the numerical schemes using Cubature \(\tilde{\mathbb {P}}_3\) elements, SSPRK and DeC time integration methods, and the OSS and the CIP stabilization techniques are unstable. We propose to evaluate these schemes adding the viscosity term in (53). For the von Neumann analysis, we use \(\mu _K(u) = c h_K^{p+1}\) in (53), with \(c \in \mathbb {R}^+\), \(h_K\) the cell diameter and p the degree of polynomial approximation. We show the plot of \(\max _i \epsilon _i\) to understand how the stability region behaves with respect to c using Cubature \(\tilde{\mathbb {P}}_3\) elements. In Fig. 19 the maximum amplification factor \(\epsilon \) is represented for varying c, using the OSS stabilization technique and the SSPRK time integration method. We note that the same behaviour is observed with CIP and DeC. Plots are available online [43].
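As a small illustration of the scaling \(\mu _K = c\, h_K^{p+1}\) used in the analysis, the following sketch evaluates the coefficient on a single triangle. It assumes that the cell diameter is the longest edge of the triangle; the function names are hypothetical.

```python
import numpy as np

def cell_diameter(vertices):
    """Diameter of a triangle given its 3 vertices, i.e. its longest edge."""
    v = np.asarray(vertices, dtype=float)
    edges = v - np.roll(v, -1, axis=0)  # v0-v1, v1-v2, v2-v0
    return float(np.max(np.linalg.norm(edges, axis=1)))

def smooth_viscosity(h_K, p, c):
    """High order viscosity mu_K = c * h_K**(p + 1) for smooth solutions."""
    return c * h_K ** (p + 1)

# Reference triangle and a viscosity value for degree p = 3, c = 0.1.
h_ref = cell_diameter([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
mu_ref = smooth_viscosity(h_ref, p=3, c=0.1)

# Halving the cell size divides mu_K by 2**(p + 1).
ratio = smooth_viscosity(0.1, 3, 0.1) / smooth_viscosity(0.05, 3, 0.1)
```

For \(p=3\) the coefficient decreases by a factor 16 at each mesh halving, which is why the added term is expected not to degrade the formal order of accuracy on smooth solutions.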

Fig. 19

T mesh: Von Neumann analysis using an additional viscosity term (see (53)). Cubature \(\tilde{\mathbb {P}}_3\) elements with SSPRK and OSS. Comparison of different \(\mu \)

We can observe two main results. First, increasing the parameter c up to around 0.1 allows one to expand the stability region. Second, when the viscosity coefficient reaches too high values, it is necessary to decrease the CFL (see Fig. 19c with \(\mu =0.05\) and Fig. 19d with \(\mu =0.5\) as an example).

4 Numerical Verification

We now perform numerical tests to check the validity of our theoretical findings. We initially focus on the structured grids, and in particular on the X mesh configuration, although similar verifications have been performed on the T mesh. We will use elements of degree p, with p up to 3, with time integration schemes of the corresponding order of accuracy to ensure an overall error of \(\mathscr {O}(\Delta x ^{p+1})\), under the CFL conditions discussed earlier (see also Table 15 in Appendix 2). As already stressed, numerical integration is performed with Gauss–Legendre formulae of the appropriate order to exactly integrate the variational form for Basic and Bernstein elements, while for Cubature elements we use those associated with the interpolation points.

The mesh used in the Fourier analysis is the basis of the one we will use in the numerical simulations. We will extend it periodically for the whole domain, see an example in Fig. 20a.

4.1 Linear Advection Equation Test

We start with the linear advection equation (1) on the domain \(\Omega = [0,2]\times [0,1]\) using Dirichlet inlet boundary conditions:

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t u (t,\textbf{x}) + \textbf{a}\cdot \nabla u (t,\textbf{x}) = 0, \qquad &{} \quad (t,\textbf{x}) \in [t_0,t_f] \times \Omega , \quad \textbf{a}= (a_x,a_y)^T \in \mathbb {R}^2, \\ u (0,\textbf{x}) = u_0(\textbf{x}), &{} \\ u (t,\mathbf {x_D}) = u_{ex} (t,\mathbf {x_D}), &{} \quad \mathbf {x_D} \in \Gamma _D = \{ (x,y)\in \mathbb {R}^2 ,x \in \{0,2\} \text{ or } y \in \{0,1\} \}, \end{array}\right. } \end{aligned}$$
(56)

where \(u_0((x,y)^T) = 0.1 \cos (2\pi \, r(x,y) )\), with \(r(x,y)=\cos (\theta )x+\sin (\theta )y\) the rotation by an angle \(\theta \) around (0, 0), \(\textbf{a}=(a_x,a_y)^T=(\cos (\theta ), \sin (\theta ) )^T\) and \(\theta =3\pi /16\). The final time of the simulation is \(t_f=2s\).

The exact solution is \(u_{ex}(\textbf{x},t)=u_0(x-a_x\, t,y-a_y\, t )\) for all \(\textbf{x}=(x,y)\in \Omega \) and \(t\in \mathbb {R}^+\). The initial conditions are displayed in Fig. 20b. We discretize the domain with the X mesh pattern, see Fig. 20a. To have approximately the same number of degrees of freedom for different degrees p, we use different mesh sizes for each order of accuracy: \(\Delta x_1 = \{ 0.1, 0.05, 0.025 \}\) for \(\mathbb {P}_1\), \(\Delta x_2 = 2\Delta x_1 \) for \(\mathbb {P}_2\), and \(\Delta x_3 = 3 \Delta x_1 \) for \(\mathbb {P}_3\) elements.
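The test setup can be written down directly from the definitions above; the following sketch only encodes \(u_0\), \(u_{ex}\) and the mesh sizes, with hypothetical Python names.

```python
import numpy as np

THETA = 3.0 * np.pi / 16.0
A = np.array([np.cos(THETA), np.sin(THETA)])  # advection velocity (a_x, a_y)

def u0(x, y):
    """Initial condition u_0 = 0.1 cos(2 pi r), r = cos(theta) x + sin(theta) y."""
    r = np.cos(THETA) * x + np.sin(THETA) * y
    return 0.1 * np.cos(2.0 * np.pi * r)

def u_exact(x, y, t):
    """Exact solution: the initial profile transported with velocity A."""
    return u0(x - A[0] * t, y - A[1] * t)

# Mesh sizes per polynomial degree: Delta x_p = p * Delta x_1.
dx1 = (0.1, 0.05, 0.025)
mesh_sizes = {p: [p * dx for dx in dx1] for p in (1, 2, 3)}
```

Since \(\textbf{a}\) is a unit vector aligned with the direction defining r, the profile is shifted by t along r, and at the final time \(t_f=2\) the exact solution coincides with the initial condition.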

Fig. 20

Linear advection simulation on the X mesh

Fig. 21

Error decay for linear advection problem with different elements and OSS stabilization and SSPRK time discretization: \(\mathbb {P}_1\) in blue, \(\mathbb {P}_2\) in green and \(\mathbb {P}_3\) in red. (color figure online)

In Fig. 21a, b, we study the error convergence for different schemes. On the x-axis the values of \(\Delta t\) are displayed, which we recall are proportional to \(\Delta x\), and the error is plotted on the y-axis. These figures show a comparison between Cubature and Basic elements with OSS stabilization and SSPRK time integration. As we can see, the two schemes have correct slopes (i.e. correct order of accuracy), and very similar errors except for \(\mathbb {P}_1\), where the larger CFL increases the error. The Basic elements require stricter CFL conditions, see Table 15, and have larger computational costs because of the inversion of the mass matrix.

To show the main benefit of using the Cubature elements (diagonal mass matrix), we plot in Fig. 22 the computational time of Basic and Cubature elements for the SSPRK time scheme and all stabilization techniques.

Fig. 22

Error for linear advection problem (56) with respect to computational time for SSPRK time discretization, comparing Basic and Cubature elements and all stabilization techniques

As a first interesting result of the numerical tests, looking at Fig. 22, we can clearly see that, for a fixed accuracy, Cubature elements obtain better computational times than Basic elements. Moreover, as expected, the SUPG stabilization technique requires more computational time as it requires the inversion of a mass matrix, even in the case where the CFL used is larger than the ones for OSS or CIP stabilization, see Table 15.

The order of accuracy reached by each simulation is shown in Table 5. The plots and all the errors are available at the repository [43].
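The text does not state how the convergence orders are extracted from the errors; a common choice, shown here as an assumption, is the slope between consecutive refinements or a least squares fit in log-log scale. The function names and the synthetic error values are hypothetical.

```python
import numpy as np

def observed_orders(h, err):
    """Order between consecutive meshes: log(e_k / e_{k+1}) / log(h_k / h_{k+1})."""
    h = np.asarray(h, dtype=float)
    err = np.asarray(err, dtype=float)
    return np.log(err[:-1] / err[1:]) / np.log(h[:-1] / h[1:])

def fitted_order(h, err):
    """Slope of the least squares line through (log h, log err)."""
    slope, _ = np.polyfit(np.log(h), np.log(err), 1)
    return float(slope)

# Synthetic errors behaving like 2 h^3, for illustration only.
h = np.array([0.1, 0.05, 0.025])
err = 2.0 * h**3
orders = observed_orders(h, err)
order_fit = fitted_order(h, err)
```

Both estimates return 3 for the synthetic data, which is the value expected for \(\mathbb {P}_2\) elements.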

Table 5 Convergence order for all schemes on linear advection test, using coefficients obtained in Table 15

Looking at Table 5, we observe that almost all the stabilized schemes provide the expected order of accuracy. Exceptions to this rule are several \(\mathbb {P}_2\) discretizations, which reach an order of accuracy of \(\approx 2.5\), and all Bernstein \(\mathbb {B}_3\) polynomials with the DeC, which reach an order of accuracy of 2. This result is very disappointing and it does not improve even when adding more corrections, as suggested in [1, 3]. Moreover, it has been independently verified that also in Fourier space the accuracy of DeC with Bernstein polynomials of degree 3 is only of order 2. This problem does not show up for steady problems, as there only the spatial discretization determines the order of accuracy. We will show it in Sect. 5.3, where we also study some steady vortices. The authors still do not understand why the optimal order of accuracy is not reached. This opens doors to further research on this family of schemes.

Note that we do not show results for Bernstein elements with SSPRK technique because they are identical to Basic elements, but are more expensive because of the projection in the Bernstein element space and the interpolation in the quadrature points.

More comparisons on different grids (unstructured) will be done in Sect. 5.

4.2 Shallow Water Equations

We consider the nonlinear shallow water equations (no friction and constant topography):

$$\begin{aligned} \left\{ \begin{array}{lll} \partial _t h + \partial _x (hu) + \partial _y (hv) &{} = 0, \qquad \quad &{} x\in \Omega = [0,2]\times [0,1], \\ \partial _t (hu) + \partial _x (hu^2 +g\frac{h^2}{2} ) + \partial _y (huv) &{} =0, &{} t \in [0,t_f]\\ \partial _t (hv) + \partial _x (huv) + \partial _y (hv^2 +g\frac{h^2}{2} ) &{} =0, &{} t_f =1s. \end{array} \right. \end{aligned}$$
(57)

An analytical solution of this system is given by travelling vortices [53]. We use here a vortex with compact support and in \(\mathscr {C}^6(\Omega )\) described by

$$\begin{aligned} \begin{pmatrix} h(x,t)\\ u(x,t)\\ v(x,t) \end{pmatrix}= {\left\{ \begin{array}{ll} \begin{pmatrix} h_c + \frac{1}{g} \frac{\Gamma ^2}{\omega ^2} \cdot \left( \lambda (\omega \mathscr {R}( \textbf{x},t) ) - \lambda (\pi ) \right) , \\ u_c + \Gamma (1+\cos (\omega \mathscr {R}( \textbf{x},t)))^2 \cdot (- \mathscr {I}(\textbf{x},t)_y), \\ v_c + \Gamma (1+\cos (\omega \mathscr {R}( \textbf{x},t)))^2 \cdot ( \mathscr {I}(\textbf{x},t)_x), \end{pmatrix}, &{}\text{ if } \omega \mathscr {R}( \textbf{x},t) \le \pi ,\\ \begin{pmatrix} h_c &{} u_c &{} v_c \end{pmatrix}^T,&\text{ else, } \end{array}\right. } \end{aligned}$$
(58)

with

$$\begin{aligned} \lambda (r) =&\frac{20\cos (r)}{3} + \frac{27\cos (r)^2}{16} + \frac{4\cos (r)^3}{9} + \frac{\cos (r)^4}{16} + \frac{20r\sin (r)}{3} \\&+ \frac{35r^2}{16} + \frac{27r\cos (r)\sin (r)}{8} + \frac{4r\cos (r)^2 \sin (r)}{3} + \frac{r\cos (r)^3 \sin (r)}{4}. \end{aligned}$$

where \(\mathbf {X_c} = (0.5,0.5)\) is the initial vortex center, \((h_c,\, u_c,\, v_c)=(1.,\,0.6,\,0)\) is the far field state, \( r_0 = 0.45\) is the vortex radius, \(\Delta h = 0.1\) is the vortex amplitude, and the remaining parameters are defined as

$$\begin{aligned} \left\{ \begin{array}{ll} \omega = \pi / r_0 \qquad &{} \text{ angular } \text{ wave } \text{ frequency }, \\ \Gamma = \frac{12 \pi \sqrt{g \Delta h } }{r_0 \sqrt{315 \pi ^2-2048}} \qquad &{} \text{ vortex } \text{ intensity } \text{ parameter }, \\ \mathscr {I}(\textbf{x},t) = \textbf{x} - \mathbf {X_c} - (u_c t,v_c t)^T \qquad &{} \text{ coordinates } \text{ with } \text{ respect } \text{ to } \text{ the } \text{ vortex } \text{ center }, \\ \mathscr {R}( \textbf{x},t) = \Vert \mathscr {I}(\textbf{x},t) \Vert \qquad &{} \text{ distance } \text{ from } \text{ the } \text{ vortex } \text{ center }. \end{array} \right. \end{aligned}$$
(59)
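The definitions (58)–(59) translate into the following sketch. The value of the gravity constant is not given in this section, so \(g=9.81\) is an assumption; all Python names are hypothetical.

```python
import numpy as np

G = 9.81                       # gravity constant (assumed value)
H_C, U_C, V_C = 1.0, 0.6, 0.0  # far field state (h_c, u_c, v_c)
X_C = np.array([0.5, 0.5])     # initial vortex center
R0 = 0.45                      # vortex radius
DELTA_H = 0.1                  # vortex amplitude
OMEGA = np.pi / R0             # angular wave frequency
GAMMA = (12.0 * np.pi * np.sqrt(G * DELTA_H)
         / (R0 * np.sqrt(315.0 * np.pi**2 - 2048.0)))  # vortex intensity

def lam(r):
    """Auxiliary function lambda(r) entering the water depth in (58)."""
    c, s = np.cos(r), np.sin(r)
    return (20.0 * c / 3.0 + 27.0 * c**2 / 16.0 + 4.0 * c**3 / 9.0
            + c**4 / 16.0 + 20.0 * r * s / 3.0 + 35.0 * r**2 / 16.0
            + 27.0 * r * c * s / 8.0 + 4.0 * r * c**2 * s / 3.0
            + r * c**3 * s / 4.0)

def vortex(x, y, t):
    """Travelling vortex (58): returns depth h and velocities u, v at (x, y, t)."""
    ix = x - X_C[0] - U_C * t  # coordinates relative to the vortex center
    iy = y - X_C[1] - V_C * t
    wr = OMEGA * np.hypot(ix, iy)
    inside = wr <= np.pi
    bump = GAMMA * (1.0 + np.cos(wr))**2
    h = np.where(inside,
                 H_C + GAMMA**2 / (G * OMEGA**2) * (lam(wr) - lam(np.pi)),
                 H_C)
    u = np.where(inside, U_C - bump * iy, U_C)
    v = np.where(inside, V_C + bump * ix, V_C)
    return h, u, v

h_center, u_center, v_center = vortex(0.5, 0.5, 0.0)
```

With these definitions the depth at the vortex center evaluates to \(h_c-\Delta h\), independently of g, consistent with \(\Delta h\) being the vortex amplitude, and the far field state is recovered for \(\omega \mathscr {R} > \pi \).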

We discretize the domain with uniform square intervals of length \(\Delta x\) (see Fig. 20a), and as before we perform a grid convergence study by respecting the constraint \(\Delta x_2 = 2\Delta x_1 \) for \(\mathbb {P}_2\) elements and \(\Delta x_3 = 3 \Delta x_1 \) for \(\mathbb {P}_3\) elements. Because of the high cost of the SUPG technique, we only compare the OSS and the CIP stabilization techniques. As an example of results, we again show the benefit of using Cubature elements in Fig. 23. We can see that, since the dimension of the discretized system is even larger than before (three times larger), the differences between Cubature and Basic elements are even more pronounced in the error-computational time plot.

Fig. 23

Error for shallow water system (57) with respect to computational time for SSPRK method with Cubature (left) and Basic (right) elements and CIP and OSS stabilizations

In Table 6 we show the convergence orders for this shallow water problem with the CFL and \(\delta \) coefficients found in Table 15.

Table 6 Convergence order on shallow water, using coefficients obtained in Table 15

The results obtained are similar to those of the linear advection case. We can also notice the \(\mathbb {P}_2\) discretization reaching the proper convergence order, i.e., 3, and Bernstein \(\mathbb {B}_3\) elements reaching an order of accuracy of \(\approx 3\) which is more satisfying than the results obtained for the linear advection test, but still disappointing knowing that we were expecting 4.

5 Simulations on Unstructured Meshes

We now perform numerical tests to check the validity of our theoretical findings using an unstructured mesh, and the most restrictive parameters in Table 4. These parameters make sure that we are stable for both T and X mesh configurations. The results have similar convergence rate to the tests on the structured meshes of the previous section.

The unstructured mesh used in this section is shown in Fig. 24, and it was created with the mesh generator gmsh (see Footnote 1).

Fig. 24

Unstructured mesh on \(\Omega =[0,2]\times [0,1]\)

5.1 Linear Advection Test

Table 7 Convergence order for linear advection on unstructured mesh, using coefficients obtained in Table 4

We use the same test case as in Sect. 4.1. Convergence orders for all schemes are summarized in Table 7. We observe that all \(\mathbb {P}_1\) discretizations provide the proper convergence order. For the \(\mathbb {P}_2\) discretizations we spot a slight reduction of the order of accuracy, which lies for most of the schemes between 2 and \(\approx 2.5\) instead of being 3. For polynomials of degree 3, we observe an order reduction to 2 for the same schemes that lost the right order of accuracy also for the X mesh in the previous section. In particular, we have that Bernstein \(\mathbb {B}_3\) polynomials with the DeC result in an order of accuracy of \(\approx 2\) instead of 4, as well as the \(\tilde{\mathbb {P}}_3\) discretization with the combination DeC and SUPG stabilization. As for the X mesh, the Basic \(\mathbb {P}_3\) discretization reaches order of accuracy \(\approx 4\) for all stabilization techniques, as well as Cubature \(\tilde{\mathbb {P}}_3\) with SUPG and OSS stabilizations.

Also in this case, the results obtained with \(\tilde{\mathbb {P}}_3\) Cubature elements and OSS stabilization are stable as we can see from the convergence analysis. This might mean that just a few unfortunate mesh configurations, such as the T one, result in an unstable scheme and that, most of the time, the parameters found in Table 4 are reliable for this scheme. On the other hand, the combination \(\tilde{\mathbb {P}}_3\) and CIP gives an unstable scheme.

Fig. 25

Error for linear advection problem (56) with respect to computational time for all elements and stabilization techniques

We compare error and computational time for all methods presented above in Fig. 25. Looking at the \(\mathbb {P}_2\) and the \(\mathbb {P}_3\) discretizations, as expected, the mass-matrix free combination, i.e., Cubature elements with SSPRK and OSS, gives smaller computational costs than other combinations with Basic elements. Conversely, the SUPG technique increases the computational costs with respect to all other stabilizations for all schemes. That is why we will not use it for the next test. The plots and all the errors are available at the repository [43].

Remark 4

(Entropy viscosity)

As remarked in Sect. 3.9, we can improve the stability of some schemes (Cubature OSS) with extra entropy viscosity. Here, we test the convergence rate on the T mesh configuration, i.e., the one with more restrictive CFL conditions and most unstable. This test is performed using Cubature \(\tilde{\mathbb {P}}_3\) elements, SSPRK and DeC time integration methods, and the OSS and the CIP stabilization techniques. We solve again problem (56).

Using formulation (53) and tuning stability coefficient \(\delta \), CFL and viscosity coefficient c found in Fig. 19, we obtain fourth order accurate schemes. These tuned coefficients, and the corresponding convergence orders are summarized in Table 8.

Table 8 Convergence order of methods using Cubature \(\tilde{\mathbb {P}}_3\) elements and viscosity term (53) with tuned parameters

Many other formulations of viscosity terms exist in the literature and can ensure convergent methods of order \(p+1\) (using \(\mathbb {P}_p\) elements) [30, 36, 41]. The majority use a nonlinear evaluation of the parameter \(\mu _K\), based on the local entropy production.

5.2 Shallow Water Equations

Table 9 Convergence order on shallow water for unstructured mesh, using coefficients obtained in Table 4

In this section we test the proposed schemes on the test case of Sect. 4.2 with the unstructured mesh in Fig. 24. Convergence orders are summarized in Table 9.

Fig. 26

Error for shallow water problem (57) with respect to computational time for all elements and stabilization techniques

Also for the shallow water equations, we have results that resemble the ones of the structured mesh. There are small differences in the order of accuracy in both directions in different schemes. Comparing also the computational time of all the schemes in Fig. 26, we can choose what we consider the best numerical method for these test cases: Cubature discretization with the OSS stabilization technique. This performance seems to be entirely due to the absence of a mass-matrix inversion, as the CFLs for the OSS technique (with the SSPRK scheme) are approximately the same for Basic and Cubature elements (see Table 4).

The plots and all the errors are available at the repository [43].

5.3 Remark on the Steady Vortex Case

For completeness we now consider a steady vortex, similarly to what is reported in [3] for the isentropic Euler equations. So, we consider again the traveling vortex proposed in Sect. 4.2 with \(t_f =0.1s\). We compare the convergence orders between \(u_c=0\) (steady case) and \(u_c=0.6\) (unsteady case) in Tables 10 and 11. As we can see, in the steady case we obtain, without any additional viscous stabilization, the expected convergence order for all schemes, in particular for the DeC with Bernstein polynomial functions. These results agree with the ones in [3]. In the unsteady case, all the other schemes reach an order of accuracy similar to that obtained in Table 9. Running the test with additional corrections in the DeC scheme, as often suggested in [1, 3], does not improve the convergence order in the unsteady case (even with \(K=50\)).

Table 10 Convergence order for steady vortex, \(t_f=0.1s\)
Table 11 Convergence order for unsteady vortex, \(t_f=0.1s\)

These results show that a numerical error appears in the spatio-temporal integration part of the solution (27), which might be related to the fact that the high order derivatives are never penalized in our stabilizations and might produce some small oscillations.

6 Conclusion

This work also shows that the stability results obtained in the one dimensional analysis [42] cannot be generalized to two dimensional problems on triangular meshes. In this direction, it could be interesting to perform the stability analysis on Cartesian quadrilateral meshes, to check whether in that situation the one dimensional results still hold true.

In the numerical test section, the order of accuracy found is not the expected one for all the methods, i.e., \(p+1\) using \(\mathbb {P}_p\) elements. For several cases, we reach only \(p+1/2\) or p. Among the schemes that are stable and have the right order of accuracy, the method that uses Cubature elements with the OSS stabilization technique and the SSPRK method of order 4 has proven to be the most accurate and least expensive. Moreover, comparing to the SUPG stabilization technique, very often used in the literature for hyperbolic systems, we showed that other stabilization techniques such as CIP and OSS can provide the same accuracy and are cheaper in terms of computational costs.

In this direction, it would be interesting to evaluate the stability of the CIP when adding an additional penalty term on the jump of higher order derivatives, as suggested in [3, 13, 17]. Moreover, it could be interesting to see the stability of Cubature elements using higher degree polynomials. Another interesting point to explore is the loss of accuracy obtained using the DeC with Bernstein third order polynomial basis functions for unsteady cases.

Finally, we provided a heuristic approach characterized by additional discontinuity capturing viscous operators such as those proposed in [30, 36]. Even for smooth solutions, the very small additional dissipation introduced by these terms is enough to stabilize some of the symmetric mass-matrix-free approaches, otherwise linearly unstable. This allows one to obtain interesting schemes for practical purposes.