1 Introduction

The nonconforming virtual element method approximates the weak solution \(u\in H^1_0(\Omega )\) to the second-order linear elliptic boundary value problem

$$\begin{aligned} {\mathcal {L}} u:=-\text {div}(\mathbf{A} \nabla u + \mathbf{b} u)+\gamma u = f \quad \text{ in }\quad \Omega \end{aligned}$$
(1.1)

for a given \(f\in L^2(\Omega )\) in a bounded polygonal Lipschitz domain \(\Omega \subset {\mathbb {R}}^2\) subject to homogeneous Dirichlet boundary conditions.

1.1 General introduction

The virtual element method (VEM) introduced in [4] is one of the well-received polygonal methods for approximating the solutions to partial differential equations (PDEs) in the continuation of the mimetic finite difference method [7]. This method is becoming increasingly popular [1, 3, 5, 6, 16, 17] for its ability to deal with fairly general polygonal/polyhedral meshes. Because of this flexibility in the shape of the polygonal domains, the local finite-dimensional space (the space of shape functions) comprises non-polynomial functions. The novelty of this approach lies in the fact that it does not require the explicit construction of the non-polynomial functions: the knowledge of the degrees of freedom along with suitable projections onto polynomials suffices to implement the method.

Recently, Beirão da Veiga et al. discuss a conforming VEM for the indefinite problem (1.1) in [6]. Cangiani et al. [17] develop a nonconforming VEM under the additional condition

$$\begin{aligned} 0\le \gamma -\frac{1}{2}\text {div}(\mathbf{b} ), \end{aligned}$$
(1.2)

which makes the bilinear form coercive and significantly simplifies the analysis. The two papers [6, 17] prove a priori error estimates for a solution \(u\in H^2(\Omega )\cap H^1_0(\Omega )\) in a convex domain \(\Omega \). The a priori error analysis for the nonconforming VEM in [17] can be extended to the case when the exact solution \(u\in H^{1+\sigma }(\Omega )\cap H^1_0(\Omega )\) with \(\sigma >1/2\) as it is based on traces. This paper establishes the a priori error estimates for all \(\sigma >0\) and circumvents any trace inequality. Huang et al. [31] discuss a priori error analysis of the nonconforming VEM applied to Poisson and biharmonic problems for \(\sigma >0\). An a posteriori error estimate in [16] explores the conforming VEM for (1.1) under the assumption (1.2). There are a few contributions [9, 16, 34] on residual-based a posteriori error control for the conforming VEM. This paper presents a priori and a posteriori error estimates for the nonconforming VEM without (1.2), but under the assumption that the Fredholm operator \({\mathcal {L}}\) is injective.

1.2 Assumptions on (1.1)

This paper solely imposes the following assumptions (A1)–(A3) on the coefficients \(\mathbf{A} , \mathbf{b} , \gamma \) and the operator \({\mathcal {L}}\) in (1.1) with \(f\in L^2(\Omega )\).

  1. (A1)

    The coefficients \(\mathbf{A} _{jk}, \mathbf{b} _{j},\gamma \) for \(j,k=1,2\) are piecewise Lipschitz continuous functions. For any decomposition \({\mathcal {T}}\) (admissible in the sense of Sect. 2.1) and any polygonal domain \(P\in {\mathcal {T}}\), the coefficients \(\mathbf{A} , \mathbf{b} , \gamma \) are bounded pointwise a.e. by \(\Vert \mathbf{A} \Vert _{\infty }, \Vert \mathbf{b} \Vert _{\infty }, \Vert \gamma \Vert _{\infty }\) and their piecewise first derivatives by \(|\mathbf{A} |_{1,\infty },|\mathbf{b} |_{1,\infty }, |\gamma |_{1,\infty }\).

  2. (A2)

    There exist positive constants \(a_0\) and \(a_1\) such that, for a.e. \(x\in \Omega \), \(\mathbf{A} (x)\) is SPD and

    $$\begin{aligned} a_0|\xi |^2 \le \sum _{j,k=1}^{2} \mathbf{A} _{jk}(x)\xi _{j}\xi _{k}\le a_1|\xi |^2\quad \text {for all}\; \xi \in {\mathbb {R}}^2. \end{aligned}$$
    (1.3)
  3. (A3)

    The linear operator \({\mathcal {L}}:H^{1}_0(\Omega )\rightarrow H^{-1}(\Omega )\) is injective, i.e., zero is not an eigenvalue of \({\mathcal {L}}\).
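As an illustration of (A2): at a fixed point \(x\), the sharpest constants in (1.3) are the smallest and the largest eigenvalue of the symmetric \(2\times 2\) matrix \(\mathbf{A} (x)\), and \(a_0\), \(a_1\) may be taken as their infimum resp. supremum over \(\Omega \). A minimal sketch, assuming a symmetric matrix with entries `a11`, `a12`, `a22`; the function name `ellipticity_bounds` and the sample matrix are hypothetical and not part of this paper.

```python
import math


def ellipticity_bounds(a11, a12, a22):
    """Return (smallest, largest) eigenvalue of [[a11, a12], [a12, a22]].

    Hypothetical helper: these are the pointwise constants in (1.3)
    for one fixed matrix A(x); it is not taken from the paper.
    """
    mean = 0.5 * (a11 + a22)
    radius = math.hypot(0.5 * (a11 - a22), a12)
    return mean - radius, mean + radius


# Sample matrix [[2, 1], [1, 2]] (illustrative data only).
lower, upper = ellipticity_bounds(2.0, 1.0, 2.0)
```

The matrix is uniformly positive definite at that point if and only if the first returned value is positive.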

Since the bounded linear operator \({\mathcal {L}}\) is a Fredholm operator [30, p. 321], (A3) implies that \({\mathcal {L}}\) is bijective with bounded inverse \({\mathcal {L}}^{-1}:H^{-1}(\Omega )\rightarrow H^1_0(\Omega )\). The Fredholm theory also entails the existence of a unique solution to the adjoint problem, that is, for every \(g\in L^2(\Omega )\), there exists a unique solution \(\Phi \in H^1_0(\Omega )\) to

$$\begin{aligned} {\mathcal {L}}^*\Phi :=-\text {div}(\mathbf{A} \nabla \Phi )+\mathbf{b} \cdot \nabla \Phi +\gamma \Phi =g. \end{aligned}$$
(1.4)

The bounded polygonal Lipschitz domain \(\Omega \), the homogeneous Dirichlet boundary conditions, and (A1)–(A2) lead to some \(0<\sigma \le 1\) and positive constants \(C_{\text {reg}}\) and \(C^*_{\text {reg}}\) (depending only on \(\sigma , \Omega \) and coefficients of \({\mathcal {L}}\)) such that, for any \(f,g\in L^2(\Omega )\), the unique solution u to (1.1) and the unique solution \(\Phi \) to (1.4) belong to \(H^{1+\sigma }(\Omega )\cap H^1_0(\Omega )\) and satisfy

$$\begin{aligned} \Vert u\Vert _{1+\sigma ,\Omega }\le C_{\text {reg}}\Vert f\Vert _{L^2(\Omega )}\; \text { and}\;\; \Vert \Phi \Vert _{1+\sigma ,\Omega }\le C^*_{\text {reg}}\Vert g\Vert _{L^2(\Omega )}. \end{aligned}$$
(1.5)

(The restriction \(\sigma \le 1\) is for convenience owing to the limitation to first-order convergence of the scheme.)

1.3 Weak formulation

Given the coefficients \(\mathbf{A} , \mathbf{b} ,\gamma \) with (A1)–(A2), define, for all \(u,v\in V:=H^1_0(\Omega )\),

$$\begin{aligned} a(u,v):=(\mathbf{A} \nabla u,\nabla v)_{L^2(\Omega )},\quad b(u,v):=(u,\mathbf{b} \cdot \nabla v)_{L^2(\Omega )},\quad c(u,v):=(\gamma u, v)_{L^2(\Omega )} \end{aligned}$$
(1.6)

and

$$\begin{aligned} B(u,v):=a(u,v)+b(u,v)+c(u,v) \end{aligned}$$
(1.7)

(with piecewise versions \(a_{\text {pw}}, b_{\text {pw}}, c_{\text {pw}}\) and \( B_{\text {pw}}\) for \(\nabla \) replaced by the piecewise gradient \(\nabla _{\text {pw}}\) and local contributions \(a^P, b^P , c^P\) defined in Sect. 3.1 throughout this paper). The weak formulation of the problem (1.1) seeks \(u\in V\) such that

$$\begin{aligned} B(u,v) = (f,v) \quad \text {for all}\; v \in V. \end{aligned}$$
(1.8)

Assumptions (A1)–(A3) imply that the bilinear form \(B(\cdot ,\cdot )\) is continuous and satisfies an inf-sup condition [11]

$$\begin{aligned} 0<\beta _0:=\inf _{0\ne v\in V}\sup _{0\ne w\in V}\frac{B(v,w)}{\Vert v\Vert _{1,\Omega }\Vert w\Vert _{1,\Omega }}. \end{aligned}$$
(1.9)

1.4 Main results and outline

Section 2 introduces the VEM and guides the reader to the first-order nonconforming VEM on polygonal meshes. It explains the continuity of the interpolation operator and related error estimates in detail. Section 3 starts with the discrete bilinear forms and their properties, followed by some preliminary estimates for the consistency error and the nonconformity error. The nonconformity error uses a new conforming companion operator resulting in the well-posedness of the discrete problem for sufficiently fine meshes. Section 4 proves the discrete inf-sup estimate and optimal a priori error estimates. Section 5 discusses both reliability and efficiency of an explicit residual-based a posteriori error estimator. Numerical experiments in Sect. 6 for three computational benchmarks illustrate the performance of an error estimator and show the improved convergence rate in adaptive mesh-refinement.

1.5 Notation

Throughout this paper, standard notation applies to Lebesgue and Sobolev spaces \(H^m\) with norm \(\Vert \cdot \Vert _{m,{\mathcal {D}}}\) (resp. seminorm \(|\cdot |_{m,{\mathcal {D}}}\)) for \(m>0\), while \((\cdot ,\cdot )_{L^2({\mathcal {D}})}\) and \(\Vert \cdot \Vert _{L^2({\mathcal {D}})}\) denote the \(L^2\) scalar product and \(L^2\) norm on a domain \({\mathcal {D}}\). The space \(C^0(\mathcal {D})\) consists of all continuous functions vanishing on the boundary of a domain \({\mathcal {D}}\). The dual space of \(H^1_0(\Omega )\) is denoted by \(H^{-1}(\Omega )\) with dual norm \(\Vert \cdot \Vert _{-1}\). An inequality \(A\lesssim B\) abbreviates \(A\le CB\) for a generic constant C, that may depend on the coefficients of \({\mathcal {L}}\), the universal constants \(\sigma \), \(\rho \) (from (M2) below), but that is independent of the mesh-size. Let \(\mathcal {P}_k({\mathcal {D}})\) denote the set of polynomials of degree at most \(k\in \mathbb {N}_0\) defined on a domain \({\mathcal {D}}\) and let \(\Pi _k\) denote the piecewise \(L^2\) projection on \(\mathcal {P}_k({\mathcal {T}})\) for any admissible partition \(\mathcal {T}\in \mathbb {T}\) (hidden in the notation \(\Pi _k\)). The notation \(H^s(P):= H^s(\text {int}P)\) for a compact polygonal domain P means the Sobolev space \(H^s\) [30] defined in the interior \(\text {int}(P)\) of P throughout this paper. The outward normal derivative is denoted by \(\frac{\partial \;\bullet }{\partial \mathbf{n} _P}=\mathbf{n} _P\cdot \nabla \bullet \) for the exterior unit normal vector \(\mathbf{n} _P\) along the boundary \(\partial P\) of the domain P.

2 First-order virtual element method on a polygonal mesh

This section describes the class of admissible partitions of \(\Omega \) into polygonal domains and the lowest-order nonconforming virtual element method for the problem (1.1) [3, 17].

2.1 Polygonal meshes

A polygonal domain P in this paper is a non-void compact simply-connected set P with polygonal boundary \(\partial P \) so that \(\text {int}(P)\) is a Lipschitz domain. The polygonal boundary \(\partial P\) is a simple closed polygon described by a finite sequence of distinct points. The set \({\mathcal {N}}(\partial P)=\{z_1,z_2,\dots ,z_J\}\) of nodes of a polygon P is enumerated with \(z_{J+1}:=z_1\) such that \(E(j):=\text {conv}\{z_j,z_{j+1}\}\) defines an edge and all J edges cover the boundary \(\partial P=E(1)\cup \dots \cup E(J)\) with an intersection \(E(j)\cap E(j+1)=\{z_{j+1}\}\) for \(j=1,\dots ,J-1\) and \(E(J)\cap E(1)=\{z_1\}\) with \(\text {dist}(E(j),E(k)) >0\) for all non-neighboring edges \(E(j)\) and \(E(k)\).

Let \(\mathbb {T}\) be a family of partitions of \(\overline{\Omega }\) into polygonal domains, which satisfies the conditions (M1)–(M2) with a universal positive constant \(\rho \) (Fig. 1).

  1. (M1)

    Admissibility. Any two distinct polygonal domains P and \(P'\) in \(\mathcal {T}\in \mathbb {T}\) are disjoint or share a finite number of edges or vertices.

  2. (M2)

    Mesh regularity. Every polygonal domain P of diameter \(h_P\) is star-shaped with respect to every point of a ball of radius greater than or equal to \(\rho h_P\) and every edge E of P has a length |E| greater than or equal to \(\rho h_P\).
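The edge-length part of (M2) is easy to test for a given polygon: the diameter \(h_P\) is attained between two vertices, so the ratio of the shortest edge length to \(h_P\) is an upper bound for any admissible \(\rho \). A minimal sketch, assuming a polygon given by its list of vertices in consecutive order; the names `diameter` and `edge_ratio` are hypothetical, and the star-shapedness part of (M2) is not checked here.

```python
import math
from itertools import combinations


def diameter(vertices):
    """Diameter h_P of a polygon: the largest distance between two vertices."""
    return max(math.dist(p, q) for p, q in combinations(vertices, 2))


def edge_ratio(vertices):
    """Shortest edge length divided by h_P (hypothetical helper).

    Only the edge-length condition |E| >= rho * h_P of (M2) is covered;
    the star-shapedness condition is not verified.
    """
    n = len(vertices)
    shortest = min(
        math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n)
    )
    return shortest / diameter(vertices)


unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
ratio = edge_ratio(unit_square)  # shortest edge 1, diameter sqrt(2)
```

A polygon with a very short edge (for instance after inserting a hanging node close to a vertex) yields a small ratio and therefore forces a small \(\rho \).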

Fig. 1 Polygonal domains \(P_1\) and \(P_2\) share one edge, while \(P_1\) and \(P_4\) share three edges.

Here and throughout this paper, \(h_\mathcal {T}|_P:=h_P\) denotes the piecewise constant mesh-size and \(\mathbb {T}(\delta ):=\{\mathcal {T}\in \mathbb {T} : h_{\text {max}}\le \delta \le 1\}\) with the maximum diameter \(h_{\text {max}}\) of the polygonal domains in \(\mathcal {T}\) denotes the subclass of partitions of \(\overline{\Omega }\) into polygonal domains of maximal mesh-size \(\le \delta \). Let |P| denote the area of polygonal domain P and |E| denote the length of an edge E. With a fixed orientation to a polygonal domain P, assign the outer unit normal \(\mathbf{n} _{P}\) along the boundary \(\partial P\) and \(\mathbf{n} _E:=\mathbf{n} _P|_{E}\) for an edge E of P. Let \(\mathcal {E}\) (resp. \(\widehat{\mathcal {E}}\)) denote the set of edges E of \(\mathcal {T}\) (resp. of \(\widehat{{\mathcal {T}}}\)) and \(\mathcal {E}(P)\) denote the set of edges of polygonal domain \(P\in \mathcal {T}\). For a polygonal domain P, define

$$\begin{aligned} \text {mid}(P):=\frac{1}{|P|}\int _P x\,dx\quad \text {and}\quad \text {mid}(\partial P):=\frac{1}{|\partial P|}\int _{\partial P}x\, ds. \end{aligned}$$
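Both centers are computable from the vertices alone. A minimal sketch, assuming a simple polygon given by its vertices in consecutive order; the function names `mid_area` and `mid_boundary` are hypothetical. The first uses the standard shoelace formula for the centroid, the second weights the edge midpoints by the edge lengths.

```python
import math


def mid_area(vertices):
    """mid(P): area centroid of a simple polygon via the shoelace formula."""
    n = len(vertices)
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def mid_boundary(vertices):
    """mid(dP): barycenter of the boundary, a length-weighted mean of edge midpoints."""
    n = len(vertices)
    length = sx = sy = 0.0
    for i in range(n):
        p, q = vertices[i], vertices[(i + 1) % n]
        e = math.dist(p, q)
        length += e
        sx += e * 0.5 * (p[0] + q[0])
        sy += e * 0.5 * (p[1] + q[1])
    return sx / length, sy / length


triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
center_area = mid_area(triangle)
center_boundary = mid_boundary(triangle)
```

For this right triangle the two centers differ, which shows that \(\text {mid}(P)\) and \(\text {mid}(\partial P)\) do not coincide in general.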

Let \(\mathcal {P}_k({\mathcal {T}}):=\{v\in L^2(\Omega ):\forall P\in \mathcal {T}\quad v|_{P}\in \mathcal {P}_k(P)\}\) for \(k\in \mathbb {N}_0\) and \(\Pi _k\) denote the piecewise \(L^2\) projection onto \(\mathcal {P}_k({\mathcal {T}})\). The notation \(\Pi _k\) hides its dependence on \(\mathcal {T}\), and \(\Pi _k\) applies componentwise to vectors. Given a decomposition \({\mathcal {T}}\in \mathbb {T}\) of \(\Omega \) and a function \(f\in L^2(\Omega )\), its oscillation reads

$$\begin{aligned}&\mathrm {osc}_k(f,P):= \Vert h_P(1-\Pi _k)f\Vert _{L^2(P)}\quad \text {and}\\&\mathrm {osc}_k(f,{\mathcal {T}}):=\left( \sum _{P\in {\mathcal {T}}}\Vert h_P(1-\Pi _k)f\Vert _{L^2(P)}^2\right) ^{\displaystyle 1/2} \end{aligned}$$

with \(\mathrm {osc}(f,\bullet ):=\mathrm {osc}_0(f,\bullet )\).

Remark 1

(consequence of mesh regularity assumption) There exists an interior node c in the sub-triangulation \(\widehat{{\mathcal {T}}}(P):=\{T(E)=\text {conv}(c,E): E\in \mathcal {E}(P)\}\) of a polygonal domain P with \(h_{T(E)}\le h_P\le C_{\text {sr}}h_{T(E)}\) as illustrated in Fig. 2. Each polygonal domain P can be divided into triangles so that the resulting sub-triangulation \(\widehat{{\mathcal {T}}}|_P:=\widehat{{\mathcal {T}}}(P)\) of \(\mathcal {T}\) is shape-regular. The minimum angle in the sub-triangulation solely depends on \(\rho \) [13, Sec. 2.1].

Fig. 2 (a) Polygon P and (b) its sub-triangulation \(\widehat{{\mathcal {T}}}(P)\)

Lemma 2.1

(Poincaré–Friedrichs inequality) There exists a positive constant \(C_\mathrm {PF}\), that depends solely on \(\rho \), such that

$$\begin{aligned} \Vert f\Vert _{L^2(P)}\le C_\mathrm {PF}h_P|f|_{1,P} \end{aligned}$$
(2.1)

holds for any \(f\in H^1(P)\) with \( \sum _{j\in J}\int _{E(j)} f\,ds=0\) for a nonempty subset \(J\subseteq \{1,\dots ,m\}\) of indices in the notation \(\partial P=E(1)\cup \dots \cup E(m)\) of Fig. 2. The constant \(C_\mathrm {PF}\) depends exclusively on the number \(m:=|\mathcal {E}(P)|\) of the edges in the polygonal domain P and the quotient of the maximal area divided by the minimal area of a triangle in the triangulation \(\widehat{{\mathcal {T}}}(P)\).

Some comments on \(C_\mathrm {PF}\) for anisotropic meshes are in order before the proof gives an explicit expression for \(C_\mathrm {PF}\).

Example 2.1

Consider a rectangle P with a large aspect ratio divided into four congruent sub-triangles all with vertex \(c=\text {mid}(P)\). Then, \(m=4\) and the quotient of the maximal area divided by the minimal area of a triangle in the criss-cross triangulation \(\widehat{{\mathcal {T}}}(P)\) is one. Hence \(C_\mathrm {PF}\le 1.4231\) (from the proof below) is independent of the aspect ratio of P.

Proof of Lemma 2.1

The case \(J=\{1,\dots ,m\}\) with \(f\in H^1(P)\) and \(\int _{\partial P}f\,ds=0\) is well known, cf., e.g., [13, Sec. 2.1.5], and follows from the Bramble-Hilbert lemma [14, Lemma 4.3.8] and the trace inequality [13, Sec. 2.1.1]. The remaining part of the proof shows the inequality (2.1) for the case \(J\subseteq \{1,\dots ,m\}\). The polygonal domain P and its triangulation \(\widehat{{\mathcal {T}}}(P)\) from Fig. 2 have the center c and the nodes \(z_1,\dots ,z_m\) for the \(m:=|\mathcal {E}(P)|=|\widehat{{\mathcal {T}}}(P)|\) edges \(E(1),\dots ,E(m)\) and the triangles \(T(1),\dots ,T(m)\) with \(T(j)=T(E(j))=\text {conv}\{c,E(j)\}=\text {conv}\{c,z_j,z_{j+1}\}\) for \(j=1,\dots ,m\). Here and throughout this proof, all indices are understood modulo m, e.g., \(z_{0}=z_m\). The proof uses the trace identity

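$$\begin{aligned} \frac{1}{|E(j)|}\int _{E(j)}f\,ds=\frac{1}{|T(j)|}\int _{T(j)}f\,dx+\frac{1}{2|T(j)|}\int _{T(j)}(x-c)\cdot \nabla f\,dx \end{aligned}$$
(2.2)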

for \(f\in H^1(P)\) as in the lemma. This follows from an integration by parts and the observation that \((x-c)\cdot \mathbf{n} _F = 0\) on \(F\in \mathcal {E}(T(j))\backslash E(j)\) and the height \((x-c)\cdot \mathbf{n} _{E(j)}= \frac{2|T(j)|}{|E(j)|}\) of the edge E(j) in the triangle T(j), for \(x\in E(j)\); cf. [24, Lemma 2.1] or [25, Lemma 2.6] for the remaining details. Another version of the trace identity (2.2) concerns \(\text {conv}\{z_j,c\}=: F(j)=\partial T(j-1)\cap \partial T(j)\) and reads

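$$\begin{aligned} \frac{1}{|F(j)|}\int _{F(j)}f\,ds&=\frac{1}{|T(j-1)|}\int _{T(j-1)}f\,dx+\frac{1}{2|T(j-1)|}\int _{T(j-1)}(x-z_{j-1})\cdot \nabla f\,dx\\&=\frac{1}{|T(j)|}\int _{T(j)}f\,dx+\frac{1}{2|T(j)|}\int _{T(j)}(x-z_{j+1})\cdot \nabla f\,dx \end{aligned}$$
(2.3)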

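in \(T(j-1)\) and T(j). The three trace identities in (2.2)–(2.3) are rewritten with the following abbreviations, for \(j=1,\dots ,m\),

$$\begin{aligned} x_j&:=\frac{1}{|E(j)|}\int _{E(j)}f\,ds,\quad f_j:=\frac{1}{|T(j)|}\int _{T(j)}f\,dx,\quad a_j:=\frac{1}{2|T(j)|}\int _{T(j)}(x-c)\cdot \nabla f\,dx,\\ b_j&:=\frac{1}{2|T(j)|}\int _{T(j)}(x-z_{j})\cdot \nabla f\,dx,\quad c_j:=\frac{1}{2|T(j)|}\int _{T(j)}(x-z_{j+1})\cdot \nabla f\,dx. \end{aligned}$$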

Let \( t_{\text {min}}=\min _{T\in \widehat{{\mathcal {T}}}(P)}|T|\) and \( t_{\text {max}}=\max _{T\in \widehat{{\mathcal {T}}}(P)}|T|\) abbreviate the minimal and maximal area of a triangle in \(\widehat{{\mathcal {T}}}(P)\) and let \(\widehat{\Pi }_0f\in \mathcal {P}_0(\widehat{{\mathcal {T}}}(P))\) denote the piecewise integral means of f with respect to the triangulation \(\widehat{{\mathcal {T}}}(P)\). The Poincaré inequality in a triangle with the constant \(C_\text {P}:=1/j_{1,1}\) and the first positive root \(j_{1,1} \approx 3.8317\) of the Bessel function \(J_1\) from [24, Thm. 2.1] allows for

$$\begin{aligned} \Vert f-\widehat{\Pi }_0f\Vert _{L^2(T(j))}\le C_\text {P}h_{T(j)} |f|_{1,T(j)}\quad \text {for}\;j=1,\dots ,m. \end{aligned}$$

Hence \(\Vert f-\widehat{\Pi }_0f\Vert _{L^2(P)}\le C_\text {P}h_{P} |f|_{1,P}\). This and the Pythagoras theorem (with \(f-\widehat{\Pi }_0f\perp \mathcal {P}_0(\widehat{{\mathcal {T}}}(P))\) in \(L^2(P)\)) show

$$\begin{aligned} \Vert f\Vert ^2_{L^2(P)}=\Vert \widehat{\Pi }_0f\Vert ^2_{L^2(P)}+\Vert f-\widehat{\Pi }_0f\Vert ^2_{L^2(P)}\le \Vert \widehat{\Pi }_0f\Vert ^2_{L^2(P)}+C_\text {P}^2h_P^2|f|^2_{1,P}. \end{aligned}$$
(2.4)

It remains to bound the term \(\Vert \widehat{\Pi }_0f\Vert ^2_{L^2(P)}\). The assumption on f reads \(\sum _{j\in J}\int _{E(j)}f\,ds=\sum _{j\in J}|E(j)|x_j=0\) for a subset \(J\subseteq \{1,\dots ,m\}\) so that \(0\in \text {conv}\{|E(1)|x_1,\dots ,|E(m)|x_m\}\). It follows \(0\in \text {conv}\{x_1,\dots ,x_m\}\) and it is known that this implies

$$\begin{aligned} \sum _{k=1}^{m}x_k^2\le {\mathcal {M}}\sum _{k=1}^{m}(x_{k}-x_{k-1})^2 \end{aligned}$$
(2.5)

for a constant \({\mathcal {M}} = \frac{1}{2(1-\cos (\pi /m))}\) that depends exclusively on m [25, Lemma 4.2]. Recall (2.2) in the form \(x_j=f_j+a_j\) to deduce from a triangle inequality and (2.5) that

$$\begin{aligned} \frac{1}{2}\sum _{j=1}^{m}f_j^2\le \sum _{k=1}^m x_k^2+\sum _{\ell =1}^{m}a_\ell ^2\le {\mathcal {M}} \sum _{k=1}^{m}(x_{k}-x_{k-1})^2+\sum _{\ell =1}^{m}a_\ell ^2. \end{aligned}$$

This shows that

$$\begin{aligned} t_{\text {max}}^{-1}\Vert \widehat{\Pi }_0f\Vert ^2_{L^2(P)} =t_{\text {max}}^{-1}\sum _{j=1}^{m}|T(j)|f_j^2\le \sum _{j=1}^{m}f_j^2\le 2{\mathcal {M}}\sum _{k=1}^{m}(x_{k}-x_{k-1})^2+2\sum _{\ell =1}^{m}a_\ell ^2. \end{aligned}$$

Recall (2.2)–(2.3) in the form \(f_{j}-f_{j-1}=b_{j-1}-c_j\) and \(x_{j}-x_{j-1}=f_{j}-f_{j-1}+a_{j}-a_{j-1}=b_{j-1}-a_{j-1}+a_{j}-c_{j}\) for all \(j=1,\dots ,m\). This and the Cauchy–Schwarz inequality imply the first two estimates in

with the definition of \(h_P\) and \(t_{\text {min}}\) in the end. The inequality \(\int _{T(j)}|x-c|^2\;dx \le \frac{1}{2}h^2_{T(j)}|T(j)|\) [25, Lemma 2.7] and the Cauchy–Schwarz inequality show, for \(j=1,\dots ,m\), that

$$\begin{aligned} |a_j|\le 2^{-3/2}h_{T(j)}|T(j)|^{-1/2}|f|_{1,T(j)}\le 2^{-3/2}h_Pt_{\text {min}}^{-1/2}|f|_{1,T(j)}. \end{aligned}$$

The combination of the previous three displayed estimates results in

$$\begin{aligned} 4h_P^{-2}(t_{\text {min}}/t_{\text {max}}) \Vert \widehat{\Pi }_0f\Vert ^2_{L^2(P)}&\le 2{\mathcal {M}}\sum _{k=1}^{m}|f|^2_{1,T(k-1)\cup T(k)}+\sum _{\ell =1}^{m}|f|^2_{1,T(\ell )}\\&=(4{\mathcal {M}}+1)|f|^2_{1,P}. \end{aligned}$$

This and (2.4) conclude the proof with the constant \(C_\mathrm {PF}^2 = ({\mathcal {M}}+1/4)(t_{\text {max}}/t_{\text {min}})+C_\text {P}^{2}\). \(\square \)
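The explicit expression for \(C_\mathrm {PF}\) at the end of the proof can be evaluated directly. A minimal sketch, assuming the formula \(C_\mathrm {PF}^2 = ({\mathcal {M}}+1/4)(t_{\text {max}}/t_{\text {min}})+C_\text {P}^{2}\) with \({\mathcal {M}} = \frac{1}{2(1-\cos (\pi /m))}\) and \(C_\text {P}=1/j_{1,1}\); the function name `poincare_friedrichs_constant` and its argument names are hypothetical, and the numerical value of \(j_{1,1}\) is hard-coded rather than computed.

```python
import math

# First positive root of the Bessel function J_1 (approx. 3.8317 in the text).
J11 = 3.8317059702


def poincare_friedrichs_constant(m, area_ratio):
    """Evaluate C_PF for m edges and area_ratio = t_max / t_min.

    Hypothetical helper that only evaluates the closed formula
    C_PF^2 = (M + 1/4) * (t_max / t_min) + C_P^2 from the proof.
    """
    big_m = 1.0 / (2.0 * (1.0 - math.cos(math.pi / m)))
    c_p = 1.0 / J11
    return math.sqrt((big_m + 0.25) * area_ratio + c_p ** 2)


# Rectangle with criss-cross triangulation: m = 4, area ratio 1.
c_pf_rectangle = poincare_friedrichs_constant(4, 1.0)
```

For \(m=4\) and area ratio one this reproduces the bound \(C_\mathrm {PF}\le 1.4231\) of Example 2.1, independently of the aspect ratio of the rectangle.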

In the nonconforming VEM, the finite-dimensional space \(V_h\) is a subset of the piecewise Sobolev space

$$\begin{aligned} H^1(\mathcal {T}):=\{v \in L^2(\Omega ): \forall P \in \mathcal {T}\quad v|_P \in H^1(P)\}\equiv \prod _{P\in \mathcal {T}}H^1(P). \end{aligned}$$

The piecewise \(H^1\) seminorm (piecewise with respect to \(\mathcal {T}\) hidden in the notation for brevity) reads

$$\begin{aligned} |v_h|_{1,\text {pw}}:=\bigg (\sum _{P\in \mathcal {T}}| v_h|_{1,P}^2\bigg )^{1/2}\quad \text {for any}\; v_h\in H^1(\mathcal {T}). \end{aligned}$$

2.2 Local virtual element space

The first nonconforming virtual element space [3] is a subspace of harmonic functions with edgewise constant Neumann boundary values on each polygon. The extended nonconforming virtual element space [1, 17] reads

$$\begin{aligned} \widehat{V}_h(P):=\bigg \{ v_h \in H^{1}(P):\ \Delta v_h\in \mathcal {P}_1(P)\quad \text {and}\quad \forall E \in \mathcal {E}(P)\quad {\frac{\partial v_h}{\partial \mathbf{n} _P}}\Big |_{E} \in \mathcal {P}_0(E) \bigg \}. \end{aligned}$$
(2.6)

Definition 2.2

(Ritz projection) Let \(\Pi ^\nabla _1\) be the Ritz projection from \( H^1(P)\) onto the affine functions \(\mathcal {P}_1(P)\) in the \(H^1\) seminorm defined, for \(v_h\in H^1(P)\), by

$$\begin{aligned}&(\nabla \Pi ^\nabla _1 v_h -\nabla v_h, \nabla \chi )_{L^2(P)} = 0\quad \text {for all}\; \chi \in \mathcal {P}_1(P) \nonumber \\&\quad \text {and}\quad \int _{\partial P} \Pi ^\nabla _1 v_h \,ds= \int _{\partial P} v_h\,ds. \end{aligned}$$
(2.7)

Remark 2

(integral mean) For \(P\in \mathcal {T}\) and \(f\in H^1(P)\), \(\nabla \Pi ^\nabla _1f=\Pi _0\nabla f\). (This follows from (2.7.a) and the definition of the \(L^2\) projection operator \(\Pi _0\) (acting componentwise) onto the piecewise constants \(\mathcal {P}_0(P;\mathbb {R}^2)\).)

Remark 3

(representation of \(\Pi ^\nabla _1\)) For \(P\in \mathcal {T}\) and \(f\in H^1(P)\), the Ritz projection \(\Pi ^\nabla _1f\) reads

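$$\begin{aligned} (\Pi ^\nabla _1f)(x)=\frac{1}{|P|}\Big (\int _{\partial P}f\,\mathbf{n} _P\,ds\Big )\cdot \big (x-\text {mid}(\partial P)\big )+\frac{1}{|\partial P|}\int _{\partial P}f\,ds\quad \text {for}\; x\in P. \end{aligned}$$
(2.8)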

(The proof of (2.8) consists in the verification of (2.7): The equation (2.7.a) follows from Remark 2 with an integration by parts. The equation (2.7.b) follows from the definition of \(\text {mid}(\partial P)\) as the barycenter of \(\partial P\).)
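In the spirit of Remark 3, the Ritz projection is computable from the edge integral means of f alone: by Remark 2 and an integration by parts, its gradient is \(|P|^{-1}\sum _{E\in \mathcal {E}(P)}\mathbf{n} _E\int _E f\,ds\), and (2.7.b) fixes the additive constant. A minimal sketch, assuming a polygon with counterclockwise vertices and given edge means; the function name `ritz_projection` and the test data are hypothetical.

```python
import math


def ritz_projection(vertices, edge_means):
    """Return the affine function x -> (Pi^nabla_1 f)(x) as a Python callable.

    vertices: counterclockwise list of (x, y) (assumed orientation);
    edge_means[i]: integral mean of f on the edge vertices[i] -> vertices[i+1].
    Hypothetical helper, not the authors' implementation.
    """
    n = len(vertices)
    area = perimeter = gx = gy = bx = by = mean_sum = 0.0
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        length = math.hypot(x1 - x0, y1 - y0)
        area += 0.5 * (x0 * y1 - x1 * y0)
        perimeter += length
        # |E| * n_E = (y1 - y0, -(x1 - x0)) is the scaled outer normal.
        gx += edge_means[i] * (y1 - y0)
        gy -= edge_means[i] * (x1 - x0)
        # Accumulate mid(dP) and the boundary mean of f.
        bx += length * 0.5 * (x0 + x1)
        by += length * 0.5 * (y0 + y1)
        mean_sum += length * edge_means[i]
    gx, gy = gx / area, gy / area
    bx, by = bx / perimeter, by / perimeter
    boundary_mean = mean_sum / perimeter
    return lambda x, y: gx * (x - bx) + gy * (y - by) + boundary_mean


# Edge means of f(x, y) = 1 + 2x + 3y on the unit square (values at edge midpoints).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
p1 = ritz_projection(square, [2.0, 4.5, 5.0, 2.5])
```

Since the projection reproduces affine functions, `p1` coincides with \(1+2x+3y\) in this example.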

The enhanced virtual element spaces [1, 17] are designed with a computable \(L^2\) projection \(\Pi _1\) onto \(\mathcal {P}_1(\mathcal {T})\). The resulting local discrete space under consideration throughout this paper reads

$$\begin{aligned} V_h(P):=\Big \{ v_h \in \widehat{V}_h(P):\ v_h - \Pi ^\nabla _1 v_h \perp \mathcal {P}_1(P) \quad \text {in}\; L^2(P)\Big \}. \end{aligned}$$
(2.9)

The point in the selection of \(V_h(P)\) is that the Ritz projection \(\Pi ^\nabla _1 v_h\) coincides with the \(L^2\) projection \(\Pi _1 v_h\) for all \(v_h\in V_h(P)\). The degrees of freedom on P are given by

$$\begin{aligned} \text {dof}_E(v)=\frac{1}{|E|}\int _E v\,ds \quad \text {for all}\; E\in \mathcal {E}(P)\;\text {and}\; v\in V_h(P). \end{aligned}$$
(2.10)
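For a given function, the degrees of freedom (2.10) are edge integral means and can be approximated by one-dimensional quadrature. A minimal sketch, assuming a two-point Gauss-Legendre rule on each edge (exact for polynomials up to degree three along the edge); the names `edge_mean` and `degrees_of_freedom` are hypothetical.

```python
import math


def edge_mean(f, p, q):
    """Approximate (1/|E|) * int_E f ds on the edge from p to q.

    Uses the two-point Gauss-Legendre rule mapped to [0, 1];
    the quadrature choice is an assumption of this sketch.
    """
    shift = 0.5 / math.sqrt(3.0)
    total = 0.0
    for t in (0.5 - shift, 0.5 + shift):
        x = p[0] + t * (q[0] - p[0])
        y = p[1] + t * (q[1] - p[1])
        total += 0.5 * f(x, y)  # both weights equal 1/2 on [0, 1]
    return total


def degrees_of_freedom(f, vertices):
    """List of dof_E(f) for the consecutive edges of a polygon."""
    n = len(vertices)
    return [edge_mean(f, vertices[i], vertices[(i + 1) % n]) for i in range(n)]


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dofs = degrees_of_freedom(lambda x, y: 1.0 + 2.0 * x + 3.0 * y, square)
```

For an affine function the edge mean equals the value at the edge midpoint, so the rule is exact in this example.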

Proposition 2.3

(a) The vector space \(\widehat{V}_h(P)\) from (2.6) is of dimension \(3+|\mathcal {E}( P)|\). (b) \(V_h(P)\) from (2.9) is of dimension \(|\mathcal {E}(P)|\) and the triplet \((P,V_h(P),\text {dof}_E:E\in \mathcal {E}(P))\) is a finite element in the sense of Ciarlet [28].

Proof

Let \(E(1),\dots ,E(m)\) be an enumeration of the edges \(\mathcal {E}(P)\) of the polygonal domain P in a consecutive way as depicted in Fig. 2a and define \(W(P):=\mathcal {P}_1(P)\times \mathcal {P}_0(E{(1)})\times \dots \times \mathcal {P}_0(E{(m)})\). Recall \(\widehat{V}_h(P)\) from (2.6) and identify the quotient space \( \widehat{V}_h(P)/\mathbb {R}\equiv \left\{ f\in \widehat{V}_h(P):\right. \left. \int _{\partial P}f\,ds=0\right\} \) with all functions in \(\widehat{V}_h(P)\) having zero integral over the boundary \(\partial P\) of P. Since the space \(\widehat{V}_h(P)\) consists of functions with an affine Laplacian and edgewise constant Neumann data, the map

$$\begin{aligned} S:\widehat{V}_h(P)/\mathbb {R}\rightarrow W(P),\quad \quad f\mapsto \left( -\Delta f,\frac{\partial f}{\partial \mathbf{n} _P}\Big |_{E{(1)}},\dots ,\frac{\partial f}{\partial \mathbf{n} _P}\Big |_{E{(m)}}\right) \end{aligned}$$

is well-defined and linear. The compatibility conditions for the existence of a solution of a Laplacian problem with Neumann data show that the image of S is equal to

$$\begin{aligned} \mathcal {R}(S)=\left\{ (f_1,g_1,\dots ,g_m)\in W(P):\int _P f_1dx+\sum _{j=1}^m g_j|E(j)|=0\right\} . \end{aligned}$$

(The proof of this identity assumes the compatible data \((f_1,g_1,\dots ,g_m)\) from the set on the right-hand side and solves the Neumann problem with a unique solution \(\widehat{u}\) in \(\widehat{V}_h(P)/\mathbb {R} \) and \(S\widehat{u}=(f_1,g_1,\dots ,g_m)\).) It is known that the Neumann problem has a unique solution up to an additive constant and so S is a bijection onto \(\mathcal {R}(S)\) and the dimension \(m+2\) of \(\widehat{V}_h(P)/\mathbb {R}\) is that of \(\mathcal {R}(S)\). In particular, the dimension of \(\widehat{V}_h(P)\) is \(m+3\). This proves (a).

Let \(\Lambda _0,\Lambda _1,\Lambda _2: H^1(P)\rightarrow \mathbb {R}\) be linear functionals

$$\begin{aligned} \Lambda _0f:=\Pi _0f,\quad \Lambda _jf:={\mathcal {M}}_j((\Pi ^\nabla _1-\Pi _1)f) \end{aligned}$$

with \({\mathcal {M}}_jf:=\Pi _0((x_j-c_j)f)\) for \(j=1,2\) and \(f\in H^1(P)\) that determine an affine function \(p_1\in \mathcal {P}_1(P)\) such that \((P,\mathcal {P}_1(P),(\Lambda _0,\Lambda _1,\Lambda _2))\) is a finite element in the sense of Ciarlet. For any edge \(E(j)\in \mathcal {E}(P)\), define \(\Lambda _{j+2}f:=\frac{1}{|E(j)|}\int _{E(j)}f\,ds\) as integral mean of the traces of f in \(H^1(P)\) on E(j). It is elementary to see that \(\Lambda _0,\dots ,\Lambda _{m+2}\) are linearly independent: If f in \(\widehat{V}_h(P)\) belongs to the kernel of all the linear functionals, then \(\Pi ^\nabla _1f=0\) from (2.8) with \(\Lambda _jf=0\) for each \(j=3,\dots ,2+m\). Since \(\Lambda _jf=0\) for \(j=1,2\), the identities \(\Pi _0((x_j-c_j)(\Pi ^\nabla _1-\Pi _1)f)=0\) and \(\Pi ^\nabla _1f=0\) (together with \(\Lambda _0f=\Pi _0f=0\)) imply \(\Pi _1f=0\). An integration by parts leads to

$$\begin{aligned} \Vert \nabla f\Vert _{L^2(P)}^2=(-\Delta f,f)_{L^2(P)}+\Big (f,\frac{\partial f}{\partial \mathbf{n} _P}\Big )_{L^2(\partial P)}=0. \end{aligned}$$

This and the vanishing integral means of \(f\) show \(f\equiv 0\). Consequently, the intersection \(\cap _{j=0}^{m+2}\text {Ker}(\Lambda _j)\) of all kernels Ker\((\Lambda _0),\dots ,\text {Ker}(\Lambda _{m+2})\) is trivial so that the functionals \(\Lambda _0,\dots ,\Lambda _{m+2}\) are linearly independent. Since the number of the linear functionals is equal to the dimension of \(\widehat{V}_h(P)\), \((P,\widehat{V}_h(P),\{\Lambda _0,\dots ,\Lambda _{m+2}\})\) is a finite element in the sense of Ciarlet and there exists a nodal basis \(\psi _0,\dots ,\psi _{m+2}\) of \(\widehat{V}_h(P)\) with

$$\begin{aligned} \Lambda _j(\psi _k)=\delta _{jk}\quad \text {for all}\;j,k =0,\dots ,m+2. \end{aligned}$$

The linearly independent functions \(\psi _3,\dots ,\psi _{m+2}\) belong to \(V_h(P)\) and so dim\((V_h(P))\ge m\). Since \(V_h(P)\subset \widehat{V}_h(P)\) and three linearly independent conditions \((1 -\Pi ^\nabla _1) v_h \perp \mathcal {P}_1(P)\) in \(L^2(P)\) are imposed on \(\widehat{V}_h(P)\) to define \(V_h(P)\), dim\((V_h(P)) \le m\). This shows that dim\((V_h(P)) = m\) and hence, the linear functionals \(\text {dof}_E\) for \(E\in \mathcal {E}(P)\) form a dual basis of \(V_h(P)\). This concludes the proof of (b). \(\square \)

Remark 4

(stability of \(L^2\) projection) The \(L^2\) projection \(\Pi _k\) for \(k=0 ,1\) is \(H^1\) and \(L^2\) stable in \(V_h(P)\), in the sense that any \(v_h\) in \(V_h(P)\) satisfies

$$\begin{aligned} \Vert \Pi _kv_h\Vert _{L^2(P)}\le \Vert v_h\Vert _{L^2(P)}\; \text {and}\;\Vert \nabla (\Pi _kv_h)\Vert _{L^2(P)}\le \Vert \nabla v_h\Vert _{L^2(P)}. \end{aligned}$$
(2.11)

(The first inequality follows from the definition of \(\Pi _k\). The orthogonality in (2.9) and the definition of \(\Pi _1\) imply that the Ritz projection \(\Pi ^\nabla _1\) and the \(L^2\) projection \(\Pi _1\) coincide on the space \(V_h(P)\) for \(P\in \mathcal {T}\). This with the definition of the Ritz projection \(\Pi ^\nabla _1\) verifies the second inequality.)

Definition 2.4

(Fractional order Sobolev space [14]) Let \(\alpha :=(\alpha _1,\alpha _2)\) denote a multi-index with \(\alpha _j\in \mathbb {N}_0\) for \(j=1,2\) and \(|\alpha |:=\alpha _1+\alpha _2.\) For a real number m with \(0<m<1\), define

$$\begin{aligned} H^{1+m}(\omega ):=\left\{ v\in H^1(\omega ):\frac{|v^{\alpha }(x)-v^{\alpha }(y)|}{|x-y|^{(1+m)}}\in L^2(\omega \times \omega )\quad \text {for all}\;|\alpha |=1\right\} \end{aligned}$$

with \(v^\alpha \) as the partial derivative of v of order \(\alpha \). Define the seminorm \(|\cdot |_{1+m}\) and Sobolev-Slobodeckij norm \(\Vert \cdot \Vert _{1+m}\) by

$$\begin{aligned} |v|_{1+m,\omega }^2{=}\sum _{|\alpha |=1}\int _{\omega }\int _{\omega } \frac{{|v^{\alpha }(x)-v^{\alpha }(y)|}^2}{|x-y|^{2(1+m)}}\,dx\,dy \quad \text {and}\quad \Vert v\Vert _{1+m,\omega }^2{=}\Vert v\Vert ^2_{1,\omega }{+}|v|_{1{+}m,\omega }^2. \end{aligned}$$

Proposition 2.5

(approximation by polynomials [29, Thm. 6.1]) Under the assumption (M2), there exists a positive constant \(C_{\mathrm {apx}}\) (depending on \(\rho \) and on the polynomial degree k) such that, for every \(v\in H^{m}(P)\), the \(L^2\) projection \(\Pi _k\) on \(\mathcal {P}_k(P)\) for \(k\in \mathbb {N}_0\) satisfies

$$\begin{aligned} \Vert v-\Pi _kv\Vert _{L^2(P)}+h_P|v-\Pi _kv|_{1,P}\le C_{\mathrm {apx}}h_P^{m}|v|_{m,P}\quad \text {for}\;1\le m\le k+1. \end{aligned}$$
(2.12)

2.3 Global virtual element space

Define the global nonconforming virtual element space, for any \(\mathcal {T}\in \mathbb {T}\), by

$$\begin{aligned} V_h:=\left\{ v_h \in H^1(\mathcal {T}): \forall P \in \mathcal {T}\quad v_h|_P \in V_h(P)\quad \text {and}\quad \forall E\in \mathcal {E}\quad \int _{E}[v_h]_E\,ds=0\right\} . \end{aligned}$$
(2.13)

Let \([\cdot ]_E\) denote the jump across an edge \(E\in \mathcal {E}\): For two neighboring polygonal domains \(P^+\) and \(P^-\) sharing a common edge \(E\in \mathcal {E}(P^+)\cap \mathcal {E}(P^-)\), \([v_h]_E:=v_{h|P^{+}}-v_{h|P^{-}}\), where \(P^+\) denotes the adjacent polygonal domain with \(\mathbf{n} _{P^+|E}=\mathbf{n} _E\) and \(P^-\) denotes the adjacent polygonal domain with \(\mathbf{n} _{P^-|E}=-\mathbf{n} _E\). If \(E\subset \partial \Omega \) is a boundary edge, then \([v_h]_E:=v_h|_E\).

Example 2.2

If each polygonal domain P is a triangle, then the finite-dimensional space \(V_h\) coincides with the Crouzeix–Raviart (CR) finite element space. (Since the dimension of the vector space \(V_h(P)\) is three and \(\mathcal {P}_1(P)\subset V_h(P)\), \(V_h(P)=\mathcal {P}_1(P)\) for \(P\in \mathcal {T}\).)
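On a triangle, the nodal basis dual to the edge means (2.10) is explicit: the function associated with the edge opposite to the vertex \(z_k\) is \(1-2\lambda _k\) with the barycentric coordinate \(\lambda _k\). A minimal sketch that checks this duality at the edge midpoints (where an affine function attains its edge mean); the names `barycentric` and `cr_basis` and the sample triangle are hypothetical.

```python
def barycentric(tri, x, y):
    """Barycentric coordinates (l0, l1, l2) of (x, y) in the triangle tri."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    l1 = ((x - x0) * (y2 - y0) - (x2 - x0) * (y - y0)) / det
    l2 = ((x1 - x0) * (y - y0) - (x - x0) * (y1 - y0)) / det
    return 1.0 - l1 - l2, l1, l2


def cr_basis(tri, k, x, y):
    """Affine basis function 1 - 2 * lambda_k for the edge opposite vertex k."""
    return 1.0 - 2.0 * barycentric(tri, x, y)[k]


# Sample triangle (illustrative data) and midpoints of the edges opposite vertex 0, 1, 2.
tri = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
midpoints = [
    tuple(0.5 * (tri[(k + 1) % 3][d] + tri[(k + 2) % 3][d]) for d in range(2))
    for k in range(3)
]
# Matrix of dof values: entry [k][j] is the mean of basis function k on edge j.
dof_matrix = [[cr_basis(tri, k, *midpoints[j]) for j in range(3)] for k in range(3)]
```

The matrix `dof_matrix` is the identity up to rounding, which is the duality \(\text {dof}_F(\psi _E)=\delta _{EF}\) in the triangular case.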

Lemma 2.6

There exists a universal constant \( C_\mathrm {F}\) (that depends only on \(\rho \) from (M2)) such that, for all \({\mathcal {T}}\in \mathbb {T}\), any \( v_h\in V_h\) from (2.13) satisfies

$$\begin{aligned} \Vert v_h\Vert _{L^2(\Omega )}\le C_{\mathrm {F}}|v_h|_{1,\text {pw}}. \end{aligned}$$
(2.14)

Proof

Recall from Remark 1 that \(\widehat{{\mathcal {T}}}\) is a shape-regular sub-triangulation of \(\mathcal {T}\) into triangles. Since \(V_h\subset H^1(\widehat{{\mathcal {T}}})\) and the Friedrichs inequality holds for all functions in \( H^1(\widehat{{\mathcal {T}}})\) [14, Thm. 10.6.16], there exists a positive constant \(C_{\text {F}}\) such that the (first) inequality holds in

$$\begin{aligned} \Vert v_h\Vert _{L^2(\Omega )}\le C_{\text {F}}\left( \sum _{T\in \widehat{{\mathcal {T}}}}\Vert \nabla v_h\Vert _{L^2(T)}^2\right) ^{1/2}= C_{\text {F}}|v_h|_{1,\text {pw}}. \end{aligned}$$

The (second) equality follows for \(v_h\in H^1(P)\) with \(P\in \mathcal {T}\). \(\square \)

Lemma 2.6 implies that the seminorm \(|\cdot |_{1,\text {pw}}\) is equivalent to the norm \(\Vert \cdot \Vert _{1,\text {pw}}:=\big (\Vert \cdot \Vert ^2_{L^2(\Omega )}+|\cdot |^2_{1,\text {pw}}\big )^{1/2}\) in \(V_h\) with mesh-size independent equivalence constants.

2.4 Interpolation

Definition 2.7

(interpolation operator) Let \((\psi _E : E\in \mathcal {E})\) be the nodal basis of \(V_h\) defined by \(\text {dof}_E(\psi _E)=1\) and \(\text {dof}_{F}(\psi _E)=0\) for all other edges \(F\in \mathcal {E}\setminus \{E\}\). The global interpolation operator \(I_h:H^1_0(\Omega )\rightarrow V_h\) reads

$$\begin{aligned} I_hv:=\sum _{E\in \mathcal {E}}\Big (\frac{1}{|E|}\int _E v\,ds\Big )\psi _E\quad \text {for}\; v\in V. \end{aligned}$$

Since a Sobolev function \(v\in V\) has traces and the jumps \([v]_E\) vanish across any edge \(E\in \mathcal {E}\), the interpolation operator \(I_h\) is well-defined. Recall \(\rho \) from (M2), \(C_{\mathrm {PF}}\) from Lemma 2.1, and \(C_{\text {apx}}\) from Proposition 2.5.

Theorem 2.8

(interpolation error)

  1. (a)

    There exists a positive constant \(C_{\mathrm {Itn}}\) (depending on \(\rho \)) such that any \(v\in H^1(P)\) and its interpolation \(I_hv\in V_h(P)\) satisfy

    $$\begin{aligned} \Vert \nabla I_hv\Vert _{L^2(P)}\le C_{\mathrm {Itn}}\Vert \nabla v\Vert _{L^2(P)}. \end{aligned}$$
  2. (b)

    Any \(P\in \mathcal {T}\in \mathbb {T}\) and \(v\in H^1(P)\) satisfy \(|v-I_hv|_{1,P}\le (1+C_{\mathrm {Itn}})\Vert (1-\Pi _0)\nabla v\Vert _{L^2(P)}\) and

    $$\begin{aligned} h_P^{-1} \Vert (1-\Pi _1I_h)v\Vert _{L^2(P)}+ |(1-\Pi _1I_h)v|_{1,P}\le (1+C_\mathrm {PF})\Vert (1-\Pi _0)\nabla v\Vert _{L^2(P)}. \end{aligned}$$
  3. (c)

    The positive constant \(C_\mathrm {I}:=C_{\mathrm {apx}}(1+C_{\mathrm {Itn}})(1+C_{\mathrm {PF}})\), any \(0<\sigma \le 1\), and any \(v\in H^{1+\sigma }(P)\) with the local interpolation \(I_hv|_P\in V_h(P)\) satisfy

    $$\begin{aligned} \Vert v-I_hv\Vert _{L^2(P)}+h_P|v-I_hv|_{1,P}\le C_\mathrm {I}h^{1+\sigma }_P|v|_{1+\sigma ,P}. \end{aligned}$$
    (2.15)

Proof of (a)

The boundedness of the interpolation operator in \(V_h(P)\) is mentioned in [17] with a soft proof in its appendix. The subsequent analysis clarifies that \(C_{\mathrm {Itn}}\) depends exclusively on the parameter \(\rho \) in (M2). The elementary arguments apply to more general situations, in particular to 3D. Given \(I_hv\in V_h(P)\), \(q_1:=-\Delta I_hv\in \mathcal {P}_1(P)\) is affine and \(\int _E(v-I_hv)\,ds=0\). Since \(\frac{\partial I_hv}{\partial \mathbf{n} _P}\) is edgewise constant, this shows \(\int _E{\frac{\partial I_hv}{\partial \mathbf{n} _P}}|_E(v-I_hv)\,ds=0\) for all \(E\in \mathcal {E}(P)\) and so \(\big \langle \frac{\partial I_hv}{\partial \mathbf{n} _P},v-I_hv\big \rangle _{\partial P}=0\). An integration by parts leads to

$$\begin{aligned} (\nabla I_hv,\nabla (I_hv-v))_{L^2(P)}=(q_1,I_hv-v)_{L^2(P)}=(q_1,\Pi ^\nabla _1I_hv-v)_{L^2(P)} \end{aligned}$$

with \(q_1\in \mathcal {P}_1(P)\) and \(\Pi _1v_h=\Pi ^\nabla _1v_h\) for \(v_h\in V_h(P)\) in the last step. Consequently,

$$\begin{aligned} \Vert \nabla I_hv\Vert _{L^2(P)}^2&=(\nabla I_hv,\nabla (I_hv-v))_{L^2(P)}+(\nabla I_hv,\nabla v)_{L^2(P)}\nonumber \\&=(q_1,\Pi ^\nabla _1 I_hv-v)_{L^2(P)}+(\nabla I_hv,\nabla v)_{L^2(P)}\nonumber \\&\le \Vert q_1\Vert _{L^2(P)}\Vert v-\Pi ^\nabla _1I_hv\Vert _{L^2(P)}+\Vert \nabla I_hv\Vert _{L^2(P)}\Vert \nabla v\Vert _{L^2(P)} \end{aligned}$$
(2.16)

with the Cauchy–Schwarz inequality in the last step. Remarks 2 and 3 on the Ritz projection, and the definition of \(I_h\) show

$$\begin{aligned} \Pi _0\nabla v&= \nabla \Pi ^\nabla _1 v=|P|^{-1}\int _{\partial P}v\,\mathbf{n} _P\,ds \nonumber \\&=|P|^{-1}\int _{\partial P}I_hv \mathbf{n} _P\,ds=\Pi _0 \nabla I_hv=\nabla \Pi ^\nabla _1 I_hv. \end{aligned}$$
(2.17)

The function \(f:=v-\Pi ^\nabla _1I_hv\in H^1(P)\) satisfies \(\int _{\partial P}f\,ds=\int _{\partial P}(v-I_hv)\,ds=0\) and the Poincaré–Friedrichs inequality from Lemma 2.1.a shows

$$\begin{aligned} \Vert v-\Pi ^\nabla _1I_hv\Vert _{L^2(P)}\le C_{\text {PF}}h_P\Vert \nabla (v-\Pi ^\nabla _1I_hv)\Vert _{L^2(P)}=C_{\text {PF}}h_P\Vert (1-\Pi _0)\nabla v\Vert _{L^2(P)} \end{aligned}$$
(2.18)

with (2.17) in the last step. Let \(\phi _c\in S^1_0(\widehat{{\mathcal {T}}}(P)):=\{w\in C^0(P):w|_{T(E)}\in \mathcal {P}_1(T(E))\quad \text {for all}\;E\in \mathcal {E}(P)\}\) denote the piecewise linear nodal basis function of the interior node c with respect to the triangulation \(\widehat{{\mathcal {T}}}(P)=\{T(E): E\in \mathcal {E}(P)\}\) (cf. Fig. 2b for an illustration of \(\widehat{{\mathcal {T}}}(P)\)). An inverse estimate

$$\begin{aligned} \Vert f_1\Vert _{L^2(T(E))}\le C_1\Vert \phi _c^{1/2}f_1\Vert _{L^2(T(E))}\quad \text {for all}\;f_1\in \mathcal {P}_1(\widehat{{\mathcal {T}}}(P)) \end{aligned}$$

on the triangle \(T(E):=\text {conv}(E\cup \{c\})\) holds with the universal constant \(C_1\). A constructive proof computes the mass matrices for T with and without the weight \(\phi _c\) to verify that the universal constant \(C_1\) does not depend on the shape of the triangle T(E). This implies

$$\begin{aligned} C_1^{-1}\Vert q_1\Vert _{L^2(P)}^2\le (\phi _cq_1,q_1)_{L^2(P)}=(-\Delta I_hv,\phi _cq_1)=(\nabla I_hv,\nabla (\phi _cq_1))_{L^2(P)} \end{aligned}$$
(2.19)

with an integration by parts for \(\phi _c q_1\in H^1_0(P)\) and \(I_hv\) in the last step. The mesh-size independent constant \(C_2\) in the standard inverse estimate

$$\begin{aligned} h_{T(E)}\Vert \nabla q_2\Vert _{L^2(T(E))}\le C_2\Vert q_2\Vert _{L^2(T(E))}\quad \text {for all}\;q_2\in \mathcal {P}_2(T(E)) \end{aligned}$$

depends merely on the angles in the triangle \(T(E), E\in \mathcal {E}(P),\) and so exclusively on \(\rho \). With \(C^{-1}_{\text {sr}}h_P\le h_{T(E)}\) from Remark 1, this shows

$$\begin{aligned} C_2^{-1}C_{\text {sr}}^{-1}h_P\Vert \nabla \phi _cq_1\Vert _{L^2(P)}\le \Vert \phi _cq_1\Vert _{L^2(P)}\le \Vert q_1\Vert _{L^2(P)}. \end{aligned}$$

This and (2.19) lead to

$$\begin{aligned} \Vert q_1\Vert _{L^2(P)}\le C_1C_2C_{\text {sr}}h_P^{-1}\Vert \nabla I_hv\Vert _{L^2(P)}. \end{aligned}$$
(2.20)

The combination with (2.16)–(2.18) proves

$$\begin{aligned} \Vert \nabla I_hv\Vert _{L^2(P)}^2&\le (C_1C_2C_{\text {sr}}C_{\text {PF}}\Vert (1-\Pi _0)\nabla v\Vert _{L^2(P)}+\Vert \nabla v\Vert _{L^2(P)})\Vert \nabla I_hv\Vert _{L^2(P)}\\&\le (1+C_1C_2C_{\text {sr}}C_{\text {PF}})\Vert \nabla v\Vert _{L^2(P)}\Vert \nabla I_hv\Vert _{L^2(P)}. \end{aligned}$$

\(\square \)
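
The constructive argument behind the weighted inverse estimate in the proof of (a) can be sketched as follows. The sketch assumes the standard integration formula for barycentric coordinates on a triangle T and orders the local basis \(\lambda _1,\lambda _2,\lambda _3\) of \(\mathcal {P}_1(T)\) so that \(\lambda _3=\phi _c|_T\); the matrix names \(M\) and \(M_\phi \) are introduced here for illustration only.

$$\begin{aligned} \int _T\lambda _1^{a}\lambda _2^{b}\lambda _3^{c}\,dx&=2|T|\,\frac{a!\,b!\,c!}{(a+b+c+2)!},\\ M&:=\Big (\int _T\lambda _j\lambda _k\,dx\Big )_{j,k=1}^3=\frac{|T|}{12}\begin{pmatrix}2&1&1\\ 1&2&1\\ 1&1&2\end{pmatrix},\\ M_\phi&:=\Big (\int _T\lambda _3\lambda _j\lambda _k\,dx\Big )_{j,k=1}^3=\frac{|T|}{60}\begin{pmatrix}2&1&2\\ 1&2&2\\ 2&2&6\end{pmatrix}. \end{aligned}$$

Both matrices scale with \(|T|\), so the generalised eigenvalues \(\lambda \) of \(M_\phi x=\lambda Mx\) depend neither on the size nor on the shape of T. A direct calculation gives \(\lambda \in \{(4-\sqrt{6})/10,\,1/5,\,(4+\sqrt{6})/10\}\), and the smallest eigenvalue leads to

$$\begin{aligned} \Vert f_1\Vert ^2_{L^2(T)}\le \frac{10}{4-\sqrt{6}}\,\Vert \phi _c^{1/2}f_1\Vert ^2_{L^2(T)}=(4+\sqrt{6})\,\Vert \phi _c^{1/2}f_1\Vert ^2_{L^2(T)}\quad \text {for all}\;f_1\in \mathcal {P}_1(T). \end{aligned}$$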

Proof of (b)

The identity (2.17) reads \(\Pi _0\nabla (1-I_h)v=0\) and the triangle inequality results in

$$\begin{aligned} |v-I_hv|_{1,P}&=\Vert (1-\Pi _0)\nabla (1-I_h)v\Vert _{L^2(P)} \nonumber \\&\le \Vert (1-\Pi _0)\nabla v\Vert _{L^2(P)}+\Vert (1-\Pi _0)\nabla I_hv\Vert _{L^2(P)}. \end{aligned}$$
(2.21)

Since \(I_h\) is the identity in \(\mathcal {P}_1(P)\), it follows \( (1-\Pi _0)\nabla I_hv=(1-\Pi _0)\nabla I_h(v-\Pi ^\nabla _1v).\) This and the boundedness of the interpolation operator \(I_h\) lead to

$$\begin{aligned} \Vert (1-\Pi _0)\nabla I_hv\Vert _{L^2(P)}&\le \Vert \nabla I_h(1-\Pi ^\nabla _1)v\Vert _{L^2(P)}\nonumber \\&\le C_{\mathrm {Itn}}\Vert \nabla (1-\Pi ^\nabla _1)v\Vert _{L^2(P)}=C_{\mathrm {Itn}}\Vert (1-\Pi _0)\nabla v\Vert _{L^2(P)} \end{aligned}$$
(2.22)

with Remark 2 in the last step. The combination of (2.21) and (2.22) proves the first part of (b).

The identity \(|(1-\Pi _1I_h)v|_{1,P}=\Vert (1-\Pi _0)\nabla v\Vert _{L^2(P)}\) follows from (2.17). Since \(\Pi _1=\Pi ^\nabla _1\) in \(V_h\) and \(\int _{\partial P}v\,ds=\int _{\partial P}I_hv\,ds=\int _{\partial P}\Pi ^\nabla _1I_hv\,ds\), the Poincaré–Friedrichs inequality

$$\begin{aligned} \Vert (1-\Pi _1I_h)v\Vert _{L^2(P)}\le C_{\text {PF}}h_P|(1-\Pi _1I_h)v|_{1,P} \end{aligned}$$

follows from Lemma 2.1.a. This concludes the proof of (b). \(\square \)

Proof of (c)

This is an immediate consequence of the part (b) with (2.12) and the Poincaré–Friedrichs inequality for \(v-I_hv\) (from above) in Lemma 2.1.a. \(\square \)
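
For the reader's convenience, the chain of estimates behind (c) can be written out as follows. This is a sketch under the assumption that (2.12) is applied in the form \(\Vert (1-\Pi _0)\nabla v\Vert _{L^2(P)}\le C_{\text {apx}}h_P^{\sigma }|v|_{1+\sigma ,P}\) and that the Poincaré–Friedrichs inequality applies to \(v-I_hv\) because \(\int _{\partial P}(v-I_hv)\,ds=0\).

$$\begin{aligned} \Vert v-I_hv\Vert _{L^2(P)}+h_P|v-I_hv|_{1,P}&\le (1+C_{\mathrm {PF}})\,h_P|v-I_hv|_{1,P}\\&\le (1+C_{\mathrm {PF}})(1+C_{\mathrm {Itn}})\,h_P\Vert (1-\Pi _0)\nabla v\Vert _{L^2(P)}\\&\le C_{\mathrm {apx}}(1+C_{\mathrm {Itn}})(1+C_{\mathrm {PF}})\,h_P^{1+\sigma }|v|_{1+\sigma ,P}. \end{aligned}$$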

3 Preliminary estimates

This section formulates the discrete problem along with the properties of the discrete bilinear form such as boundedness and a Gårding-type inequality.

3.1 The discrete problem

Denote the restriction of the bilinear forms \(a(\cdot ,\cdot ), b(\cdot ,\cdot )\) and \(c(\cdot ,\cdot )\) on a polygonal domain \(P\in \mathcal {T}\) by \(a^P(\cdot ,\cdot ), b^P(\cdot ,\cdot )\) and \(c^P(\cdot ,\cdot )\). The corresponding local discrete bilinear forms are defined for \(u_h, v_h\in V_h(P)\) by

$$\begin{aligned} a_h^P(u_h,v_h)&:= (\mathbf{A} \nabla \Pi _1 u_h, \nabla \Pi _1 v_h)_{L^2(P)}+S^P((1 - \Pi _1)u_h,(1 - \Pi _1)v_h) , \end{aligned}$$
(3.1)
$$\begin{aligned} b_h^P(u_h,v_h)&:=\ (\Pi _1 u_h,\mathbf{b} \cdot \nabla \Pi _1v_h)_{L^2(P)}, \end{aligned}$$
(3.2)
$$\begin{aligned} c_h^P(u_h,v_h)&:= (\gamma \Pi _1 u_h,\Pi _1 v_h)_{L^2(P)}, \end{aligned}$$
(3.3)
$$\begin{aligned} B_h^P(u_h,v_h)&:=a_h^P(u_h,v_h)+b_h^P(u_h,v_h)+c_h^P(u_h,v_h). \end{aligned}$$
(3.4)

Choose the stability term \(S^P(u_h,v_h)\) as a symmetric positive definite bilinear form on \(V_h(P)\times V_h(P)\) that satisfies, for a positive constant \(C_s\) independent of P and \(h_P\),

$$\begin{aligned} C_s^{-1}a^P(v_h,v_h) \le S^P(v_h,v_h) \le C_s a^P(v_h,v_h) \quad \text {for all}\,\, v_h\in V_h(P) \,\,\text {with}\;\Pi _1v_h=0. \end{aligned}$$
(3.5)

For some positive constant approximation \(\overline{\mathbf{A }}_P\) of \(\mathbf{A} \) over P and the number \(N_P:=|\mathcal {E}(P)|\) of the degrees of freedom (2.10) of \(V_h(P)\), a standard example of a stabilization term from [4, 36, Sec. 4.3] with a scaling coefficient \(\overline{\mathbf{A }}_P\) reads

$$\begin{aligned} S^P(v_h,w_h):=\overline{\mathbf{A }}_P\sum _{r=1}^{N_P} \text {dof}_r(v_h)\text {dof}_r(w_h) \quad \text {for all}\; v_h, w_h\in V_h. \end{aligned}$$
(3.6)

Note that an approximation \(\overline{\mathbf{A }}_P\) is a positive real number (not a matrix) and can be chosen as \(\sqrt{a_0a_1}\) with the positive constants \(a_0\) and \(a_1\) from (A2). For \(f\in L^2(\Omega )\) and \(v_h\in V_h\), define the right-hand side functional \(f_h\) on \(V_h\) by

$$\begin{aligned} (f_h,v_h)_{L^2(P)}&:=( f, \Pi _1v_h)_{L^2(P)}. \end{aligned}$$
(3.7)

The sum over all the polygonal domains \(P\in \mathcal {T}\) reads

$$\begin{aligned} a_h(u_h,v_h)&:=\sum _{P\in \mathcal {T}} a_h^P(u_h,v_h), b_h(u_h,v_h):=\sum _{P\in \mathcal {T}}b_h^P(u_h,v_h),\\ c_h(u_h,v_h)&:=\sum _{P\in \mathcal {T}} c_h^P(u_h,v_h), s_h(u_h,v_h):=\sum _{P\in \mathcal {T}}S^P((1-\Pi _1)u_h,(1-\Pi _1)v_h),\\ B_h(u_h,v_h)&:=\sum _{P\in \mathcal {T}}B_h^P(u_h,v_h), (f_h,v_h)_{L^2(\Omega )}:=\sum _{P\in \mathcal {T}}(f_h,v_h)_{L^2(P)} \quad \text {for all}\; u_h, v_h\in V_h. \end{aligned}$$

The discrete problem seeks \(u_h\in V_h\) such that

$$\begin{aligned} B_h(u_h,v_h)=(f_h,v_h)_{L^2(\Omega )}\quad \text {for all}\; v_h\in V_h. \end{aligned}$$
(3.8)
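
As a consistency check with Example 2.2, consider a triangle P. The sketch below assumes that \(\Pi _1\) acts as the identity on \(\mathcal {P}_1(P)\) (this is the property \((1-\Pi _1)\Pi _1v=0\) used in Sect. 3.3), so that \((1-\Pi _1)u_h=0\) for all \(u_h\in V_h(P)=\mathcal {P}_1(P)\):

$$\begin{aligned} a_h^P(u_h,v_h)&=(\mathbf{A} \nabla u_h,\nabla v_h)_{L^2(P)}+S^P(0,0)=(\mathbf{A} \nabla u_h,\nabla v_h)_{L^2(P)},\\ b_h^P(u_h,v_h)&=(u_h,\mathbf{b} \cdot \nabla v_h)_{L^2(P)},\qquad c_h^P(u_h,v_h)=(\gamma u_h,v_h)_{L^2(P)},\\ (f_h,v_h)_{L^2(P)}&=(f,v_h)_{L^2(P)}. \end{aligned}$$

Under this assumption the stabilization term drops out on triangles, and (3.8) on a triangulation into triangles reduces to the Crouzeix–Raviart discretisation with the piecewise defined bilinear form and the unmodified right-hand side.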

Remark 5

(polygonal mesh with small edges) The conditions (M1)–(M2) are well established and apply throughout the paper. The sub-triangulation \(\widehat{{\mathcal {T}}}\) may not be shape-regular without the edge condition \(|E|\ge \rho h_P\) for an edge \(E\in \mathcal {E}(P)\) and \(P\in \mathcal {T}\), but it satisfies the maximal angle condition, and the arguments employed in the proof of [8, Lemma 6.3] can be applied to show (2.20) in Theorem 2.8.a. For more general star-shaped polygonal domains with short edges, the recent anisotropic analysis [8, 15, 18] indicates that the stabilization term has to be modified as well to avoid a logarithmic factor in the optimal error estimates.

3.2 Properties of the discrete bilinear form

The following proposition provides two main properties of the discrete bilinear form \(B_h\).

Proposition 3.1

There exist positive universal constants \(M, \alpha \) and a universal nonnegative constant \(\beta \) depending on the coefficients \(\mathbf{A} ,\mathbf{b} ,\gamma \) such that

  1. (a)

    Boundedness: \(|B_h(u_h,v_h)| \le M|u_h|_{1,\text {pw}}|v_h|_{1,\text {pw}} \quad \text {for all}\,\, u_h,v_h \in V_h.\)

  2. (b)

    Gårding-type inequality: \(\alpha |v_h|^2_{1,\text {pw}}-\beta \Vert v_h\Vert ^2_{L^2(\Omega )}\le B_h(v_h,v_h) \quad \text {for all}\,\, v_h\in V_h.\)

Proof of (a)

The upper bound of the coefficients from the assumption (A1), the Cauchy–Schwarz inequality, the stability (2.11) of \(\Pi _1\), and the definition (3.5) of the stabilization term imply the boundedness of \(B_h\) with \(M:=(1+C_s)\Vert \mathbf{A} \Vert _{\infty }+C_\mathrm {F}\Vert \mathbf{b} \Vert _{\infty }+C_\mathrm {F}^2\Vert \gamma \Vert _{\infty }\). The details of the proof follow as in [6, Lemma 5.2] with the constant \(C_\mathrm {F}\) from Lemma 2.6. \(\square \)

Proof of (b)

The first step shows that \(a_h(\cdot ,\cdot )\) is coercive. For \(v_h \in V_h(P)\), \(\Pi _1v_h=\Pi ^\nabla _1v_h\) and \(\nabla \Pi _1v_h\perp \nabla (v_h-\Pi ^\nabla _1v_h)\) in \(L^2(P;\mathbb {R}^2)\). This orthogonality, the assumption (A2), and the definition of the stability term (3.5) with the constant \(C_s^{-1}\le 1\) imply for \(\alpha _0=a_0C_s^{-1}\) that

$$\begin{aligned}&\alpha _0|v_h|_{1,\text {pw}}^2\le a_0\Vert \nabla _\text {pw}\Pi _1 v_h\Vert ^2_{L^2(\Omega )}+a_0C_s^{-1}\Vert \nabla _\text {pw}(1-\Pi _1) v_h\Vert ^2_{L^2(\Omega )}\nonumber \\&\quad \le (\mathbf{A} \nabla _\text {pw}\Pi _1v_h,\nabla _\text {pw}\Pi _1v_h)_{L^2(\Omega )}{+}C_s^{-1} (\mathbf{A} \nabla _\text {pw}(1-\Pi _1) v_h,\nabla _\text {pw}(1{-}\Pi _1)v_h)_{L^2(\Omega )}\nonumber \\&\quad \le (\mathbf{A} \nabla _\text {pw}\Pi _1v_h,\nabla _\text {pw}\Pi _1v_h)_{L^2(\Omega )} {+}s_h((1-\Pi _1)v_h,(1-\Pi _1)v_h)\,{=}\,a_h(v_h,v_h). \end{aligned}$$
(3.9)

The Cauchy–Schwarz inequality, (2.11), and the Young inequality lead to

$$\begin{aligned}&|b_h(v_h,v_h)+c_h(v_h,v_h)|\nonumber \\&\quad \le \Vert \mathbf{b} \Vert _\infty \Vert \Pi _1 v_h\Vert _{L^2(\Omega )}\Vert \nabla _\text {pw}\Pi _1 v_h\Vert _{L^2(\Omega )}+\Vert \gamma \Vert _\infty \Vert \Pi _1 v_h\Vert _{L^2(\Omega )}^2\nonumber \\&\quad \le \Vert \mathbf{b} \Vert _\infty \Vert v_h\Vert _{L^2(\Omega )}|v_h|_{1,\text {pw}}+\Vert \gamma \Vert _\infty \Vert v_h\Vert _{L^2(\Omega )}^2\nonumber \\&\quad \le \frac{\Vert \mathbf{b} \Vert ^2_\infty }{2\alpha _0}\Vert v_h\Vert _{L^2(\Omega )}^2+\frac{\alpha _0}{2}|v_h|^2_{1,\text {pw}}+\Vert \gamma \Vert _\infty \Vert v_h\Vert _{L^2(\Omega )}^2. \end{aligned}$$
(3.10)

The combination of (3.9)–(3.10) proves

$$\begin{aligned} \frac{\alpha _0}{2}|v_h|^2_{1,\text {pw}}-\left( \frac{\Vert \mathbf{b} \Vert ^2_\infty }{2\alpha _0}+\Vert \gamma \Vert _\infty \right) \Vert v_h\Vert ^2_{L^2(\Omega )}\le B_h(v_h,v_h). \end{aligned}$$

This concludes the proof of (b) with \(\alpha =\frac{\alpha _0}{2}\) and \(\beta =\frac{\Vert \mathbf{b} \Vert ^2_\infty }{2\alpha _0}+\Vert \gamma \Vert _{\infty }\). \(\square \)
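
The last step in (3.10) and its combination with (3.9) can be spelled out as follows. The only tool is the Young inequality \(ab\le \frac{a^2}{2\varepsilon }+\frac{\varepsilon b^2}{2}\) with the choice \(\varepsilon =\alpha _0\); the symbol \(\varepsilon \) is introduced here for illustration.

$$\begin{aligned} \Vert \mathbf{b} \Vert _\infty \Vert v_h\Vert _{L^2(\Omega )}|v_h|_{1,\text {pw}}&\le \frac{\Vert \mathbf{b} \Vert ^2_\infty }{2\alpha _0}\Vert v_h\Vert _{L^2(\Omega )}^2+\frac{\alpha _0}{2}|v_h|^2_{1,\text {pw}},\\ B_h(v_h,v_h)&\ge a_h(v_h,v_h)-|b_h(v_h,v_h)+c_h(v_h,v_h)|\\&\ge \alpha _0|v_h|^2_{1,\text {pw}}-\frac{\alpha _0}{2}|v_h|^2_{1,\text {pw}}-\left( \frac{\Vert \mathbf{b} \Vert ^2_\infty }{2\alpha _0}+\Vert \gamma \Vert _\infty \right) \Vert v_h\Vert ^2_{L^2(\Omega )}. \end{aligned}$$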

Remark 6

(\(\Vert \cdot \Vert _h\approx |\cdot |_{1,\text {pw}}\)) The discrete space \(V_h\) of the nonconforming VEM is endowed with the natural norm \(\Vert \cdot \Vert _h := a_h(\cdot ,\cdot )^{1/2}\) induced by the scalar product \(a_h\). The boundedness of \(a_h\) is proven in (a), while (3.9) shows the converse estimate in the equivalence \(\Vert \cdot \Vert _h\approx |\cdot |_{1,\text {pw}}\) in \(V_h\), namely

$$\begin{aligned} \alpha _0|v_h|^2_{1,\text {pw}}\le a_h(v_h,v_h)\le \Vert \mathbf{A} \Vert _\infty (1+C_s)|v_h|_{1,\text {pw}}^2\quad \text {for all}\; v_h\in V_h. \end{aligned}$$

3.3 Consistency error

This subsection discusses the consistency error between the continuous bilinear form B and the corresponding discrete bilinear form \(B_h\). Recall the definition \(B^P(\cdot ,\cdot )\equiv a^P(\cdot ,\cdot )+b^P(\cdot ,\cdot )+c^P(\cdot ,\cdot )\) and \(B_h^P(\cdot ,\cdot )\equiv a_h^P(\cdot ,\cdot )+b_h^P(\cdot ,\cdot )+c_h^P(\cdot ,\cdot )\) for a polygonal domain \(P\in \mathcal {T}\) from Sect. 2.1.

Lemma 3.2

(consistency)

  1. (a)

    There exists a positive constant \(C_{\text {cst}}\) (depending only on \(\rho \)) such that any \(v\in H^1(\Omega )\) and \(w_h\in V_h\) satisfy

    $$\begin{aligned} B^P(\Pi _1 v,w_h)-B_h^P(\Pi _1 v,w_h)\le C_{\mathrm {cst}}\,h_P\Vert v\Vert _{1,P}|w_h|_{1,P}\quad \text {for all}\; P\in \mathcal {T}. \end{aligned}$$
    (3.11)
  2. (b)

    Any \(f\in L^2(\Omega )\) and \(f_h:=\Pi _1f\) satisfy

    $$\begin{aligned} \Vert f-f_h\Vert _{V_h^*}:=\sup _{0\ne v_h\in V_h}\frac{(f-f_h,v_h)}{\Vert v_h\Vert _{1,\text {pw}}}\le C_\mathrm {PF}\, \mathrm {osc}_1(f,\mathcal {T}). \end{aligned}$$
    (3.12)

Proof

Observe that \(S^P((1-\Pi _1)\Pi _1 v,(1-\Pi _1)w_h)=0\) follows from \((1-\Pi _1)\Pi _1 v=0\). The definitions of \(B^P\) and \(B_h^P\) show

$$\begin{aligned} B^P(\Pi _1 v,w_h)-B_h^P(\Pi _1 v,w_h)=: T_1+T_2+T_3. \end{aligned}$$
(3.13)

The term \(T_1\) in (3.13) is defined as the difference of the contributions from \(a^P\) and \(a^P_h\). Their definitions prove the equality (at the end of the first line below) and the definition of \(\Pi _1\) proves the next equality in

$$\begin{aligned} T_1&:=a^P(\Pi _1 v,w_h)-a_h^P(\Pi _1 v,w_h) =(\mathbf{A} \nabla \Pi _1 v,\nabla (1-\Pi _1) w_h)_{L^2(P)}\\&=((\mathbf{A} -\Pi _0\mathbf{A} )(\nabla \Pi _1 v),\nabla (1-\Pi _1) w_h)_{L^2(P)}\le h_P |\mathbf{A} |_{1,\infty }|v|_{1,P}|w_h|_{1,P}. \end{aligned}$$

The last inequality follows from the Cauchy–Schwarz inequality, the Lipschitz continuity of \(\mathbf{A} \), and the stabilities \(\Vert \nabla \Pi _1v_h\Vert _{L^2(P)}\le \Vert \nabla v_h\Vert _{L^2(P)}\) and \(\Vert \nabla (1-\Pi _1)w_h\Vert _{L^2(P)}\le \Vert \nabla w_h\Vert _{L^2(P)}\) from Remark 4. Similar arguments apply to \(T_2\) from the differences of \(b^P\) and \(b^P_h\), and \(T_3\) from those of \(c^P\) and \(c_h^P\) in (3.13). This leads to

$$\begin{aligned} T_2&:=b^P(\Pi _1v,w_h)-b_h^P(\Pi _1 v,w_h)\\&=((\mathbf{b} -\Pi _0\mathbf{b} )\Pi _1v,\nabla (1-\Pi _1)w_h)_{L^2(P)}\\&\quad +((\Pi _0\mathbf{b} )(1-\Pi _0)(\Pi _1v),\nabla (1-\Pi _1)w_h)_{L^2(P)}\\&\le (|\mathbf{b} |_{1,\infty }+C_{\mathrm {apx}}\Vert \mathbf{b} \Vert _{\infty })h_P\Vert v\Vert _{1,P}|w_h|_{1,P},\\ T_3&:=c^P(\Pi _1v,w_h)-c_h^P(\Pi _1v,w_h)=(\gamma \Pi _1v,(1-\Pi _1)w_h)_{L^2(P)}\\&\le C_{\mathrm {PF}}\, \Vert \gamma \Vert _{\infty }h_P\Vert v\Vert _{L^2(P)}|w_h|_{1,P}. \end{aligned}$$

The inequality for the last step in \(T_2\) follows from the Cauchy–Schwarz inequality, the Lipschitz continuity of \(\mathbf{b} \), the estimate \(\Vert (1-\Pi _0)\Pi _1v\Vert _{L^2(P)}\le \Vert (1-\Pi _0)v\Vert _{L^2(P)}\le C_{\text {apx}}h_P|v|_{1,P}\) from (2.12), and the above stabilities \(\Vert \nabla \Pi _1v_h\Vert _{L^2(P)}\le \Vert \nabla v_h\Vert _{L^2(P)}\) and \(\Vert \nabla (1-\Pi _1)w_h\Vert _{L^2(P)}\le \Vert \nabla w_h\Vert _{L^2(P)}\). The inequality for the last step in \(T_3\) follows from the Cauchy–Schwarz inequality, \(\Vert \Pi _1v\Vert _{L^2(P)}\) \(\le \Vert v\Vert _{L^2(P)}\) from (2.11) and the Poincaré–Friedrichs inequality in Lemma 2.1.a for \(w_h-\Pi _1w_h\) with \(\int _{\partial P}(w_h-\Pi _1w_h)\,ds=0\) from \(\Pi _1=\Pi ^\nabla _1\) in \(V_h\). The combination of the above estimates shows (3.11). The proof of (3.12) adapts the arguments in the above analysis of \(T_3\) and the definition of \(\mathrm {osc}_1(f,\mathcal {T})\) in Sect. 2.1 for the proof of

$$\begin{aligned} (f-f_h,w_h)_{L^2(P)} =(f-\Pi _1 f,w_h-\Pi _1 w_h)_{L^2(P)}\le C_{\text {PF}}|w_h|_{1,P}\, \mathrm {osc}_1(f,P). \end{aligned}$$

This concludes the proof. \(\square \)
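
The second equality in the estimate of \(T_1\) deserves one more line. The following sketch assumes the identity \(\Pi _0\nabla w=\nabla \Pi ^\nabla _1w\) from (2.17) and \(\Pi _1=\Pi ^\nabla _1\) in \(V_h(P)\). The gradient \(\nabla (1-\Pi _1)w_h\) then has a vanishing integral mean over P and is \(L^2\)-orthogonal to the constant vector \((\Pi _0\mathbf{A} )\nabla \Pi _1v\); the remaining factor is controlled by the Lipschitz bound quoted in the proof.

$$\begin{aligned} \Pi _0\nabla (1-\Pi _1)w_h&=\nabla \Pi ^\nabla _1w_h-\nabla \Pi _1w_h=0,\\ ((\Pi _0\mathbf{A} )\nabla \Pi _1v,\nabla (1-\Pi _1)w_h)_{L^2(P)}&=0,\\ \Vert \mathbf{A} -\Pi _0\mathbf{A} \Vert _{L^\infty (P)}&\le h_P|\mathbf{A} |_{1,\infty }. \end{aligned}$$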

3.4 Nonconformity error

Enrichment operators play a vital role in the analysis of nonconforming finite element methods [12]. For any \(v_h\in V_h,\) the objective is to find a corresponding function \(Jv_h\in H_0^1(\Omega )\). The idea is to map the VEM nonconforming space into the Crouzeix-Raviart finite element space

$$\begin{aligned} \text {CR}_0^1(\widehat{{\mathcal {T}}}):= \{ v\in {\mathcal {P}}_1(\widehat{{\mathcal {T}}}):&\forall \; E\in \widehat{{\mathcal {E}}}\quad v \text {is continuous at mid}(E)\quad \text {and}\\ &\forall \; E\in {\mathcal {E}}(\partial \Omega )\quad v(\text {mid}(E))=0\} \end{aligned}$$

with respect to the shape-regular triangulation \(\widehat{{\mathcal {T}}}\) from Remark 1. Let \(\psi _E\) be the edge-oriented basis functions of CR\(_0^1(\widehat{{\mathcal {T}}})\) with \(\psi _E(\text {mid} E)=1\) and \(\psi _E(\text {mid} F)=0\) for all other edges \(F\in \widehat{{\mathcal {E}}}\setminus \{E\}.\) Define the interpolation operator \(I_{\text {CR}} : V_h\rightarrow \text {CR}_0^1(\widehat{{\mathcal {T}}})\), for \(v_h\in V_h\), by

$$\begin{aligned} I_{\text {CR}}v_h:=\sum _{F\in \widehat{{\mathcal {E}}}}\Big (\frac{1}{|F|}\int _Fv_h\,ds\Big )\psi _F. \end{aligned}$$
(3.14)

The definition of \(V_h\) implies \(\int _F[v_h]\,ds=0\) for \(v_h\in V_h\) and for all \(F\in \mathcal {E}\). Since \(v_h|_{P}\in H^1(P)\), it follows \(\int _F[v_h]\,ds=0\) for all \(F\in \widehat{{\mathcal {E}}}\setminus \mathcal {E}\). This shows \(\int _{F} v_{h|T^{\pm }}\,ds\) is unique for all edges \(F=\partial T^+\cap \partial T^-\in \widehat{{\mathcal {E}}}\) and, consequently, \(I_{\text {CR}}v_h\) is well-defined (independent of the choice of traces selected in the evaluation of \(\frac{1}{|F|}\int _Fv_h\,ds\)). The approximation property of \(I_{\text {CR}}\) on each \(T\in \widehat{{\mathcal {T}}}\) reads

$$\begin{aligned} h_T^{-1}\Vert v_h-I_{\text {CR}}v_h\Vert _{L^2(T)}+|v_h-I_{\text {CR}}v_h|_{1,T}\le 2|v_h|_{1,T} \end{aligned}$$
(3.15)

(cf. [23, Thm 2.1] or [21, Thm 4] for explicit constants). Define an enrichment operator \(E_h: \text {CR}_0^1(\widehat{{\mathcal {T}}})\rightarrow H_0^1(\Omega )\) by averaging the function values at each interior vertex z, that is,

$$\begin{aligned} E_hv_{\text {CR}}(z)=\frac{1}{|\widehat{{\mathcal {T}}}(z)|}\sum _{T\in \widehat{{\mathcal {T}}}(z)}{v_{\text {CR}}}|_{T}(z) \end{aligned}$$
(3.16)

and zero on boundary vertices. In (3.16) the set \(\widehat{{\mathcal {T}}}(z):=\{T\in \widehat{{\mathcal {T}}}| z\in T\}\) of neighboring triangles has the cardinality \(|\widehat{{\mathcal {T}}}(z)|\ge 3\).

The following lemma describes the construction of a modified companion operator \(J:V_h\rightarrow H_0^1(\Omega )\), which is a right-inverse of the interpolation operator \(I_h\) from Definition 2.7.

Lemma 3.3

(conforming companion operator) There exists a linear map \(J:V_h\rightarrow H^1_0(\Omega )\) and a universal constant \(C_\mathrm {J}\lesssim 1\) such that any \(v_h\in V_h\) satisfies \(I_hJv_h=v_h\) and

  1. (a)

    \(\displaystyle \int _EJv_h\,ds=\int _Ev_h\,ds\) for any edge \(E\in \widehat{{\mathcal {E}}},\)

  2. (b)

    \(\displaystyle \nabla _\text {pw}(v_h-Jv_h)\perp \mathcal {P}_0(\mathcal {T};\mathbb {R}^2)\) in \(L^2(\Omega ;\mathbb {R}^2),\)

  3. (c)

    \(\displaystyle v_h-Jv_h\perp \mathcal {P}_1(\mathcal {T})\) in \(L^2(\Omega ),\)

  4. (d)

    \(\Vert h_\mathcal {T}^{-1}(v_h-Jv_h)\Vert _{L^2(\Omega )}+|v_h-Jv_h|_{1,\text {pw}}\le C_\mathrm {J}|v_h|_{1,\text {pw}}.\)

Design of J in Lemma 3.3

Given \(v_h\in V_h\), let \(v_{\text {CR}}:=I_{\text {CR}}v_h\in \text {CR}^1_0(\widehat{{\mathcal {T}}})\). There exists an operator \(J':\text {CR}_0^1(\widehat{{\mathcal {T}}})\rightarrow H_0^1(\Omega )\) from [22, Prop. 2.3] such that any \(v_\text {CR}\in \text {CR}_0^1(\widehat{{\mathcal {T}}})\) satisfies

  1. (a’)

    \(\displaystyle \int _EJ'v_{\text {CR}}\,ds=\int _Ev_{\text {CR}}\,ds\) for any edge \(E\in \widehat{{\mathcal {E}}},\)

  2. (b’)

    \(\displaystyle \int _{P}\nabla _{\text {pw}}(v_{\text {CR}}-J'v_{\text {CR}})\,dx=0\) for all \(P\in \mathcal {T}\),

  3. (c’)

    \(\displaystyle \Vert h_{\widehat{{\mathcal {T}}}}^{-1}(v_{\text {CR}}-J'v_{\text {CR}})\Vert _{L^2(\Omega )}+|v_{\text {CR}}-J'v_{\text {CR}}|_{1,\text {pw}}\le C_\mathrm {J'}\min _{v\in H^1_0(\Omega )}|v_{\text {CR}}-v|_{1,\text {pw}}\)

with a universal constant \(C_\mathrm {J'}\) from [25]. Set \(v:=J'I_{\text {CR}}v_h\in V:=H^1_0(\Omega )\). Recall that \(\widehat{{\mathcal {T}}}(P)\) is a shape-regular triangulation of P into a finite number of triangles. For each \(T\in \widehat{{\mathcal {T}}}(P)\), let \(b_T\in W_0^{1,\infty }(T)\) denote the cubic bubble-function \(27\lambda _1\lambda _2\lambda _3\) for the barycentric co-ordinates \(\lambda _1, \lambda _2, \lambda _3\in \mathcal {P}_1(T)\) of T with \(|T|^{-1}\int _Tb_T\,dx=9/20\) and \(\Vert \nabla b_T\Vert _{L^2(T)}\lesssim h_T^{-1}|T|^{1/2}\approx 1.\) Let \(b_T\) be extended by zero outside T and, for \(P\in \mathcal {T}\), define

$$\begin{aligned} b_P:=\frac{20}{9}\sum _{T\in \widehat{{\mathcal {T}}}(P)}b_T\in W_0^{1,\infty }(P)\subset W_0^{1,\infty }(\Omega ) \end{aligned}$$
(3.17)

with \(|P|^{-1}\int _Pb_P\,dx=1\) and \(\Vert \nabla b_P\Vert _{L^2(P)}\lesssim h_P^{-1}|P|^{1/2}\approx 1\). Let \(v_P\in \mathcal {P}_1(\mathcal {T})\) be the Riesz representation of the linear functional \(\mathcal {P}_1(\mathcal {T})\rightarrow \mathbb {R}\) defined by \(w_1\mapsto (v_h-v,w_1)_{L^2(\Omega )}\) for \(w_1\in \mathcal {P}_1(\mathcal {T})\) in the Hilbert space \(\mathcal {P}_1(\mathcal {T})\) endowed with the weighted \(L^2\) scalar product \((b_P\bullet ,\bullet )_{L^2(P)}\). Hence \(v_P\) exists uniquely and satisfies \(\Pi _1(v_h-v) = \Pi _1(b_Pv_P)\). Given the bubble-functions \((b_P:P\in \mathcal {T})\) from (3.17) and the above functions \((v_P:P\in \mathcal {T})\) for \(v_h\in V_h\), define

$$\begin{aligned} Jv_h:=v+\sum _{P\in \mathcal {T}}v_Pb_P\in V. \end{aligned}$$
(3.18)

\(\square \)
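
The normalisation factor \(\frac{20}{9}\) in (3.17) can be checked by a short computation. The sketch below assumes the standard integration formula \(\int _T\lambda _1\lambda _2\lambda _3\,dx=|T|/60\) for the barycentric coordinates on a triangle T.

$$\begin{aligned} \frac{1}{|T|}\int _Tb_T\,dx&=\frac{27}{|T|}\int _T\lambda _1\lambda _2\lambda _3\,dx=\frac{27}{60}=\frac{9}{20},\\ \frac{1}{|P|}\int _Pb_P\,dx&=\frac{20}{9\,|P|}\sum _{T\in \widehat{{\mathcal {T}}}(P)}\frac{9}{20}\,|T|=1. \end{aligned}$$

Hence the scaled bubble-function \(b_P\) has integral mean one over each polygonal domain P.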

Proof of (a)

Since \(b_P\) vanishes at any \(x\in E\in \mathcal {E}\), it follows for any \(E\in \widehat{\mathcal {E}}\) that

$$\begin{aligned} \int _EJv_h\,ds=\int _Ev\,ds=\int _EJ'v_{\text {CR}}\,ds=\int _Ev_{\text {CR}}\,ds=\int _Ev_h\,ds, \end{aligned}$$

where the definition of \(v=J'v_{\text {CR}}\), (a\('\)), and \(v_{\text {CR}}=I_{\text {CR}}v_h\) lead to the second, third, and fourth equality. This proves (a). \(\square \)

Proof of (b)

An integration by parts and (a) show, for all \(v_h\in V_h\) with \(Jv_h\) from (3.18), that

$$\begin{aligned} \int _P\nabla Jv_h\,dx&=\int _{\partial P}Jv_h\mathbf{n} _P\,ds=\sum _{E\in \mathcal {E}(P)}\Big (\int _EJv_h\mathbf{n} _E\,ds\Big )\\&=\sum _{E\in \mathcal {E}(P)}\Big (\int _Ev_h\mathbf{n} _E\,ds\Big )=\int _P\nabla v_h\,dx. \end{aligned}$$

Since this holds for all \(P\in \mathcal {T}\), it proves (b). \(\square \)

Proof of (c)

This is equivalent to \(\Pi _1v_h=\Pi _1Jv_h\), which is guaranteed by the design of J in (3.18). \(\square \)

Proof of (d)

This relies on the definition of J in (3.18) and \(J'\) with (c\('\)). Since (a) allows for \(\int _{\partial P}(v_h-Jv_h)\,ds=0\), the Poincaré–Friedrichs inequality from Lemma 2.1.a implies

$$\begin{aligned} h_P^{-1}\Vert v_h-Jv_h\Vert _{L^2(P)} \le C_{\text {PF}}|v_h-Jv_h|_{1,P}. \end{aligned}$$

Hence it remains to prove \( |v_h-Jv_h|_{1,\text {pw}}\lesssim |v_h|_{1,\text {pw}}.\) Triangle inequalities with \(v_h, Jv_h, v=J'v_{\text {CR}}\) and \(v_{\text {CR}} =I_{\text {CR}} v_h\) show the first and second inequality in

$$\begin{aligned} |v_h-Jv_h|_{1,\text {pw}}-|v-Jv_h|_{1,\text {pw}}&\le |v-v_h|_{1,\text {pw}}\nonumber \\&\le |v_h-I_{\text {CR}}v_h|_{1,\text {pw}}+|v_{\text {CR}}-J'v_{\text {CR}}|_{1,\text {pw}}\nonumber \\&\le (1+C_\mathrm {J'})|v_h|_{1,\text {pw}} \end{aligned}$$
(3.19)

with (c\('\)) for \(|v_{\text {CR}}|_{1,\text {pw}}=\Vert \Pi _0\nabla _{\text {pw}}v_h\Vert _{L^2(\Omega )}\le \Vert \nabla _{\text {pw}}v_h\Vert _{L^2(\Omega )}=|v_h|_{1,\text {pw}}\) in the last step. The equivalence of norms in the finite-dimensional space \(\mathcal {P}_1(P)\) assures the existence of a positive constant \(C_b\), independent of \(h_P\), such that any \(\chi \in \mathcal {P}_1(P)\) satisfies the inverse inequalities

$$\begin{aligned} C_b^{-1}\Vert \chi \Vert ^2_{L^2(P)}&\le (b_P,\chi ^2)_{L^2(P)}\le C_b\Vert \chi \Vert ^2_{L^2(P)}, \end{aligned}$$
(3.20)
$$\begin{aligned} C_b^{-1}\Vert \chi \Vert _{L^2(P)}&\le \Vert b_P\chi \Vert _{L^2(P)}+h_P\Vert \nabla (b_P\chi )\Vert _{L^2(P)}\le C_b\Vert \chi \Vert _{L^2(P)}. \end{aligned}$$
(3.21)

These estimates are completely standard on shape-regular triangles [2, p. 27] or [37]; so they hold on each \(T\in \widehat{{\mathcal {T}}}\) and, by definition of \(b_P\), their sum is (3.20)–(3.21). The analysis of the term \(|v-Jv_h|_{1,\text {pw}}\) starts with one \(P\in \mathcal {T}\) and (3.18) for

$$\begin{aligned} |v-Jv_h|_{1,P}=|v_Pb_P|_{1,P}\le C_bh_P^{-1}\Vert v_P\Vert _{L^2(P)} \end{aligned}$$
(3.22)

with (3.21) in the last step. The estimate (3.20) leads to the first inequality in

$$\begin{aligned} C_b^{-1}\Vert v_P\Vert ^2_{L^2(P)}&\le (b_Pv_P,v_P)_{L^2(P)}=(v_h-v,v_P)_{L^2(P)} \\&\le \Vert v_h-v\Vert _{L^2(P)}\Vert v_P\Vert _{L^2(P)}. \end{aligned}$$

The equality results from \(\Pi _1(v_h-v)=\Pi _1(v_Pb_P)\) and \(v_P\in \mathcal {P}_1(\mathcal {T})\), while the last step is the Cauchy–Schwarz inequality. Consequently, \(\Vert v_P\Vert _{L^2(P)}\le C_b\Vert v_h-v\Vert _{L^2(P)}\). This and (3.22) show

$$\begin{aligned} |v-Jv_h|_{1,\text {pw}}\le C_b^2 \Vert h^{-1}_{\mathcal {T}}(v-v_h)\Vert _{L^2(\Omega )}\le C_b^2C_\mathrm {PF}|v-v_h|_{1,\text {pw}} \end{aligned}$$

with \(\int _{\partial P}(v-v_h)\,ds=0\) from (a) and hence the Poincaré–Friedrichs inequality for \(v-v_h\) from Lemma 2.1.a in the last step. Recall \(|v-v_h|_{1,\text {pw}}\lesssim |v_h|_{1,\text {pw}}\) from (3.19) to conclude \(|v-Jv_h|_{1,\text {pw}}\lesssim |v_h|_{1,\text {pw}}\) from the previous displayed inequality. This concludes the proof of (d). \(\square \)
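
Collecting the constants in the proof of (d) gives one admissible, not necessarily optimal, value of \(C_\mathrm {J}\). This bookkeeping is a sketch that merely combines (3.19), the last displayed inequality, and the Poincaré–Friedrichs estimate for \(v_h-Jv_h\) from the beginning of the proof.

$$\begin{aligned} |v_h-Jv_h|_{1,\text {pw}}&\le |v_h-v|_{1,\text {pw}}+|v-Jv_h|_{1,\text {pw}}\le (1+C_b^2C_\mathrm {PF})(1+C_\mathrm {J'})|v_h|_{1,\text {pw}},\\ \Vert h_\mathcal {T}^{-1}(v_h-Jv_h)\Vert _{L^2(\Omega )}+|v_h-Jv_h|_{1,\text {pw}}&\le (1+C_\mathrm {PF})(1+C_b^2C_\mathrm {PF})(1+C_\mathrm {J'})|v_h|_{1,\text {pw}}. \end{aligned}$$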

Proof

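(Proof of \(I_hJ = \text {id}\;\text { in}\; V_h\)) Definition 2.7 and Lemma 3.3.a show, for all \(v_h\in V_h\), that

$$\begin{aligned} I_hJv_h=\sum _{E\in \mathcal {E}}\Big (\frac{1}{|E|}\int _EJv_h\,ds\Big )\psi _E=\sum _{E\in \mathcal {E}}\Big (\frac{1}{|E|}\int _Ev_h\,ds\Big )\psi _E=v_h. \end{aligned}$$

This concludes the proof of Lemma 3.3. \(\square \)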

Since \(V_h\) is not a subset of \(H^1_0(\Omega )\) in general, the substitution of discrete function \(v_h\) in the weak formulation leads to a nonconformity error.

Lemma 3.4

(nonconformity error) There exist positive universal constants \(C_{\mathrm {NC}}, C^*_{\mathrm {NC}}\) (depending on the coefficients \(\mathbf{A} ,\mathbf{b} \) and the universal constants \(\rho , \sigma \)) such that all \(f,g\in L^2(\Omega )\) and all \(\mathcal {T}\in \mathbb {T}(\delta )\) (with the assumption \(h_\text {max}\le \delta \le 1\)) satisfy (a) and (b).

(a) The solution \(u\in H^{1+\sigma }(\Omega )\cap H^1_0(\Omega )\) to (1.1) satisfies

$$\begin{aligned} \sup _{0\ne v_h\in V_h}\frac{| B_\text {pw}(u,v_h)-(f,v_h)_{L^2(\Omega )}|}{\Vert v_h\Vert _{1,\text {pw}}}\le C_{\mathrm {NC}}h_{\text {max}}^{\sigma }\Vert f\Vert _{L^2(\Omega )}. \end{aligned}$$
(3.23)

(b) The solution \(\Phi \in H^{1+\sigma }(\Omega )\cap H^1_0(\Omega )\) to the dual problem (1.4) satisfies

$$\begin{aligned} \sup _{0\ne v_h\in V_h}\frac{| B_\text {pw}(v_h,\Phi )-(g,v_h)_{L^2(\Omega )}|}{\Vert v_h\Vert _{1,\text {pw}}}\le C^*_{\mathrm {NC}}h_{\text {max}}^{\sigma }\Vert g\Vert _{L^2(\Omega )}. \end{aligned}$$
(3.24)

Proof of (a)

Given \(v_h\in V_h\), define \(Jv_h\in V\) and the piecewise averages \(\overline{\mathbf{A }}:=\Pi _0(\mathbf{A} ), \overline{\mathbf{b }}:=\Pi _0(\mathbf{b} )\), and \(\overline{\gamma }:=\Pi _0(\gamma )\) of the coefficients \(\mathbf{A} , \mathbf{b} \), and \(\gamma \). The test function \(v:=Jv_h\in V\) in the weak formulation (1.8) has the extra properties from Lemma 3.3, and these give rise to the oscillation terms in the further analysis. Abbreviate \(\varvec{\sigma }:=\mathbf{A} \nabla u+\mathbf{b} u\). The weak formulation (1.8), Lemma 3.3.b–c, and the Cauchy–Schwarz inequality reveal that

$$\begin{aligned}&B_\text {pw}(u,v_h)-(f,v_h)_{L^2(\Omega )}=B_\text {pw}(u,v_h-Jv_h)-(f,v_h-Jv_h)_{L^2(\Omega )}\nonumber \\&\qquad \le \Vert \varvec{\sigma }-\Pi _0\varvec{\sigma }\Vert _{L^2(\Omega )}\Vert \nabla _\text {pw}(1-J)v_h\Vert _{L^2(\Omega )}\nonumber \\&\qquad +\Vert h_\mathcal {T}(1-\Pi _1)(f-\gamma u)\Vert _{L^2(\Omega )}\Vert h_{\mathcal {T}}^{-1}(1-J)v_h\Vert _{L^2(\Omega )}. \end{aligned}$$
(3.25)

The first term on the right-hand side of (3.25) involves the factor

$$\begin{aligned} \Vert \varvec{\sigma }-\Pi _0\varvec{\sigma }\Vert _{L^2(\Omega )}&\le \Vert \mathbf{A} \nabla u- \Pi _0(\mathbf{A} \nabla u)\Vert _{L^2(\Omega )}+\Vert \mathbf{b} u-\Pi _0(\mathbf{b} u)\Vert _{L^2(\Omega )}\\ {}&\le \Vert (\mathbf{A} -\overline{\mathbf{A }})\nabla u+\overline{\mathbf{A }}(1-\Pi _0)\nabla u\Vert _{L^2(\Omega )}\\&\quad +\Vert (\mathbf{b} -\overline{\mathbf{b }})u+\overline{\mathbf{b }}(1-\Pi _0)u\Vert _{L^2(\Omega )}\\&\le \Big ( h_{\text {max}}(|\mathbf{A} |_{1,\infty }+|\mathbf{b} |_{1,\infty })+C_{\text {apx}}(h_{\text {max}}^\sigma \Vert \mathbf{A} \Vert _\infty +h_{\text {max}}\Vert \mathbf{b} \Vert _\infty )\Big )\\&\quad \Vert u\Vert _{1+\sigma ,\Omega }. \end{aligned}$$

The last inequality follows from the Lipschitz continuity of the coefficients \(\mathbf{A} \) and \(\mathbf{b} \), and the estimate (2.12). Lemma 3.3.d leads to the estimates \(\Vert \nabla _\text {pw}(1-J)v_h\Vert _{L^2(\Omega )}\le C_J|v_h|_{1,\text {pw}}\) and

$$\begin{aligned} \Vert h_\mathcal {T}(1-\Pi _1)(f-\gamma u)\Vert _{L^2(\Omega )}\Vert h_{\mathcal {T}}^{-1}(1-J)v_h\Vert _{L^2(\Omega )}\le \mathrm {osc}_1(f-\gamma u,\mathcal {T})C_J|v_h|_{1,\text {pw}}. \end{aligned}$$

The substitution of the previous estimates in (3.25) with \(h_{\mathrm {max}}\le 1\) (from \(\delta \le 1\) by assumption) and the regularity (1.5) show

$$\begin{aligned} B_\text {pw}(u,v_h)-(f,v_h)\le C_{\mathrm {NC}}h_{\text {max}}^{\sigma }\Vert f\Vert _{L^2(\Omega )}\Vert v_h\Vert _{1,\text {pw}} \end{aligned}$$

with \(C_{\mathrm {NC}}:=C_J\Big (( |\mathbf{A} |_{1,\infty }+|\mathbf{b} |_{1,\infty }+C_{\text {apx}}(\Vert \mathbf{A} \Vert _\infty +\Vert \mathbf{b} \Vert _\infty )+\Vert \gamma \Vert _\infty )C_{\text {reg}}+1\Big )\). This concludes the proof of Lemma 3.4.a. \(\square \)

Proof of (b)

The solution \(\Phi \in V\) to (1.4) satisfies \(B(v,\Phi )=(g,v)_{L^2(\Omega )}\) for all \(v\in V.\) This implies

$$\begin{aligned} B_\text {pw}(v_h,\Phi )-(g,v_h)_{L^2(\Omega )}= B_\text {pw}(v_h-Jv_h,\Phi )-(g,v_h-Jv_h)_{L^2(\Omega )}. \end{aligned}$$

The arguments in the proof of (a) lead to the bound (3.24) with

$$\begin{aligned} C^*_{\mathrm {NC}}:=C_J\Big ((|\mathbf{A} |_{1,\infty }+C_{\text {apx}}\Vert \mathbf{A} \Vert _{\infty }+\Vert \mathbf{b} \Vert _{\infty }+\Vert \gamma \Vert _{\infty })C^*_{\text {reg}}+1\Big ). \end{aligned}$$

The remaining analogous details are omitted in the proof of Lemma 3.4.b for brevity. \(\square \)

4 A priori error analysis

This section focuses on the stability, existence, and uniqueness of the discrete solution \(u_h\). The a priori error analysis uses the discrete inf-sup condition.

4.1 Existence and uniqueness of the discrete solution

Theorem 4.1

(stability) There exist positive constants \(\delta \le 1\) and \(C_{\mathrm {stab}}\) (depending on \(\alpha , \beta , \sigma , \rho ,\) and \(C_\mathrm {F}\)) such that, for all \(\mathcal {T}\in \mathbb {T}(\delta )\) and for all \(f\in L^2(\Omega )\), the discrete problem (3.8) has a unique solution \(u_h\in V_h\) and

$$\begin{aligned} |u_h|_{1,\text {pw}}\le C_{\mathrm {stab}}\Vert f_h\Vert _{V_h^{*}}. \end{aligned}$$

Proof

In the first part of the proof, suppose there exists some solution \(u_h\in V_h\) to the discrete problem (3.8) for some \(f\in L^2(\Omega )\). (This is certainly true for \(f\equiv 0 \equiv u_h\); the case of general pairs \((f,u_h)\) is revisited at the end of the proof and leads to the uniqueness of discrete solutions.) Since \(u_h\) satisfies the Gårding-type inequality in Proposition 3.1.b,

$$\begin{aligned} \alpha |u_h|_{1,\text {pw}}^2&\le \beta \Vert u_h\Vert ^2_{L^2(\Omega )}+B_h(u_h,u_h) = \beta \Vert u_h\Vert ^2_{L^2(\Omega )}+(f_h,u_h)_{L^2(\Omega )}. \end{aligned}$$

This, (2.14), and the definition of the dual norm in (3.12) lead to

$$\begin{aligned} \alpha |u_h|_{1,\text {pw}}\le \beta C_{\text {F}}\Vert u_h\Vert _{L^2(\Omega )}+\Vert f_h\Vert _{V_h^{*}}. \end{aligned}$$
(4.1)

Given \(g:=u_h\in L^2(\Omega )\), let \(\Phi \in V\cap H^{1+\sigma }(\Omega )\) solve the dual problem \({\mathcal {L}}^*\Phi =g\) and let \(I_h\Phi \in V_h\) be the interpolation of \(\Phi \) from Sect. 2.4. Elementary algebra shows

$$\begin{aligned} \Vert u_h\Vert ^2_{L^2(\Omega )}&=\Big ((g,u_h)_{L^2(\Omega )}-B_{\text {pw}}(u_h,\Phi )\Big )+B_{\text {pw}}(u_h,\Phi -I_h\Phi )\nonumber \\&\quad +\Big (B_{\text {pw}}(u_h,I_h\Phi )-B_h(u_h,I_h\Phi )\Big )+(f_h,I_h\Phi )_{L^2(\Omega )}. \end{aligned}$$
(4.2)

Rewrite a part of the third term corresponding to diffusion on the right-hand side of (4.2) as

$$\begin{aligned}&a^P(u_h,I_h\Phi )-a_h^P(u_h,I_h\Phi ) =(\mathbf{A} \nabla u_h,\nabla (1-\Pi _1)I_h\Phi )_{L^2(P)} \\&\quad +(\nabla (1-\Pi _1) u_h,(\mathbf{A} -\Pi _0\mathbf{A} )(\nabla \Pi _1 I_h\Phi ))_{L^2(P)}\\&\quad -S^P\big ((1-\Pi _1)u_h,(1-\Pi _1)I_h\Phi \big ). \end{aligned}$$

The Cauchy–Schwarz inequality in the semi-scalar product \(S^P(\bullet ,\bullet )\), and (3.5) with the upper bound \(\Vert \mathbf{A} \Vert _\infty \) for the coefficient \(\mathbf{A} \) in \(a^P(\bullet ,\bullet )\) lead to the estimate

$$\begin{aligned} C_s^{-1}&S^P\big ((1-\Pi _1)u_h,(1-\Pi _1)I_h\Phi \big ) \le |(1-\Pi _1)u_h|_{1,P}|(1-\Pi _1)I_h\Phi |_{1,P}\nonumber \\&\qquad \le \Vert \mathbf{A} \Vert _{\infty }|u_h|_{1,P}\Big (\Vert \nabla (I_h\Phi -\Phi )\Vert _{L^2(P)}+\Vert \nabla (1-\Pi _1I_h)\Phi \Vert _{L^2(P)}\Big )\nonumber \\&\qquad \le \Vert \mathbf{A} \Vert _{\infty }C_{\text {apx}} \Big (2+C_{\mathrm {PF}}+C_{\text {Itn}}\Big )h_P^\sigma |u_h|_{1,P}|\Phi |_{1+\sigma ,P} \end{aligned}$$
(4.3)

with Theorem 2.8.b followed by (2.12) in the final step. This and Theorem 2.8 imply that

$$\begin{aligned}&|a^P(u_h,I_h\Phi )-a_h^P(u_h,I_h\Phi )|\le h_P^\sigma |u_h|_{1,P}\Vert \Phi \Vert _{1+\sigma ,P}\\&\quad \times \Big (\Vert \mathbf{A} \Vert _\infty C_{\text {apx}}(2+C_{\mathrm {PF}}+C_{\text {Itn}})(1+C_s)+|\mathbf{A} |_{1,\infty } C_{\text {Itn}}\Big ). \end{aligned}$$

The terms \(b^P-b_h^P\) and \(c^P-c_h^P\) are controlled by

$$\begin{aligned}&|b^P(u_h,I_h\Phi )-b_h^P(u_h,I_h\Phi )|+|c^P(u_h,I_h\Phi )-c_h^P(u_h,I_h\Phi )|\\&\quad \le h_P^{\sigma }\Vert \Phi \Vert _{1+\sigma ,P}\big (\Vert \mathbf{b} \Vert _{\infty }(C_{\text {apx}}(2+C_{\mathrm {PF}}+C_{\text {Itn}})\Vert u_h\Vert _{L^2(P)}+C_{\mathrm {Itn}}C_{\mathrm {PF}}|u_h|_{1,P})\nonumber \\&+\Vert \gamma \Vert _{\infty }C_{\mathrm {PF}}(C_{\text {Itn}}\Vert u_h\Vert _{L^2(P)}+|u_h|_{1,P})\big ). \end{aligned}$$

The combination of the previous four displayed estimates with Lemma 2.6 leads to an estimate on each polygonal domain \(P\in \mathcal {T}\). The sum over all \(P \in \mathcal {T} \) reads

$$\begin{aligned} B_\text {pw}(u_h,I_h\Phi )-B_h(u_h,I_h\Phi )\le C_dh_{\text {max}}^{\sigma }|u_h|_{1,\text {pw}}\Vert \Phi \Vert _{1+\sigma ,\Omega } \end{aligned}$$
(4.4)

with a universal constant \(C_d\). The bound for (4.2) results from Lemma 3.4.b for the first term, the boundedness of \(B_{\text {pw}}\) (with a universal constant \(M_b:=\Vert \mathbf{A} \Vert _{\infty }+C_{\mathrm {F}}\Vert \mathbf{b} \Vert _{\infty }+C_{\mathrm {F}}^2\Vert \gamma \Vert _{\infty }\)) and (2.15) for the second term, (4.4) for the third term, and Theorem 2.8.a for the last term on the right-hand side of (4.2). This shows

$$\begin{aligned} \Vert u_h\Vert ^2_{L^2(\Omega )}&\le \Big (C^*_{\mathrm {NC}}+C_\text {I}M_b+C_d\Big )h_{\text {max}}^{\sigma }|u_h|_{1,\text {pw}}\Vert \Phi \Vert _{1+\sigma ,\Omega }+C_{\text {Itn}}\Vert f_h\Vert _{V_h^{*}}\Vert \Phi \Vert _{1,\Omega }. \end{aligned}$$

This and the regularity estimate (1.5) lead with \(C_3:=C^*_{\mathrm {NC}}+C_\text {I}M_b+C_d\) to

$$\begin{aligned} \Vert u_h\Vert _{L^2(\Omega )}\le C_3\,C^*_{\text {reg}}h_{\text {max}}^\sigma |u_h|_{1,\text {pw}}+C_{\text {Itn}}\Vert f_h\Vert _{V_h^{*}}. \end{aligned}$$

The substitution of this in (4.1) proves

$$\begin{aligned} \alpha |u_h|_{1,\text {pw}}\le \beta C_{\text {F}}C_3C^*_{\text {reg}}h_{\text {max}}^\sigma |u_h|_{1,\text {pw}}+(\beta C_{\text {F}}C_{\text {Itn}}+1)\Vert f_h\Vert _{V_h^{*}}. \end{aligned}$$
(4.5)

For all \(0<h_{\text {max}}\le \delta :=(\frac{\alpha }{2\beta C_{\text {F}}C_3C^*_{\text {reg}}})^{1/\sigma }\), the constant \(\overline{c}=(1-\frac{\beta }{\alpha }C_{\text {F}}C_3C^*_{\text {reg}}h_{\text {max}}^\sigma )\) is positive and \(C_{\text {stab}}:=\frac{\beta C_{\text {F}}C_{\text {Itn}}+1}{\alpha -\beta C_{\mathrm {F}}C_3C^*_{\text {reg}}h_0^{\sigma }}\) is well-defined. This leads in (4.5) to

$$\begin{aligned} |u_h|_{1,\text {pw}}\le C_{\text {stab}}\Vert f_h\Vert _{V_h^{*}}. \end{aligned}$$
(4.6)
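In more detail, and as a sketch of the absorption argument under the stated choice of \(\delta \): any \(h_{\text {max}}\le \delta \) satisfies \(\beta C_{\text {F}}C_3C^*_{\text {reg}}h_{\text {max}}^\sigma \le \alpha /2\), so the first term on the right-hand side of (4.5) can be absorbed into the left-hand side,

$$\begin{aligned} \frac{\alpha }{2}|u_h|_{1,\text {pw}}\le \big (\alpha -\beta C_{\text {F}}C_3C^*_{\text {reg}}h_{\text {max}}^\sigma \big )|u_h|_{1,\text {pw}}\le (\beta C_{\text {F}}C_{\text {Itn}}+1)\Vert f_h\Vert _{V_h^{*}}. \end{aligned}$$

In particular, the bound (4.6) holds with the mesh-independent value \(2(\beta C_{\text {F}}C_{\text {Itn}}+1)/\alpha \) in place of \(C_{\text {stab}}\).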

In the last part of the proof, suppose \(f_h\equiv 0\) and let \(u_h\) be any solution to the resulting homogeneous linear discrete system. The stability result (4.6) proves \(u_h\equiv 0\). Hence, the linear system of equations (3.8) has a unique solution and the coefficient matrix is regular. This proves that there exists a unique solution \(u_h\) to (3.8) for any right-hand side \(f_h\in V_h^*\). The combination of this with (4.6) concludes the proof. \(\square \)

An immediate consequence of Theorem 4.1 is the following discrete inf-sup estimate.

Theorem 4.2

(discrete inf-sup) There exist \(0<\delta \le 1\) and \(\overline{\beta }_0>0\) such that, for all \(\mathcal {T}\in \mathbb {T}(\delta )\),

$$\begin{aligned} \overline{\beta }_0\le \inf _{0\ne u_h\in V_h}\sup _{0\ne v_h\in V_h}\frac{B_h(u_h,v_h)}{|u_h|_{1,\text {pw}}|v_h|_{1,\text {pw}}}. \end{aligned}$$
(4.7)

Proof

Define the operator \({\mathcal {L}}_h:V_h\rightarrow V_h^* ,\) \(v_h\mapsto B_h(v_h,\bullet )\). The stability Theorem 4.1 can be interpreted as follows: For any \(f_h\in V_h^*\) there exists \(u_h\in V_h\) such that \({\mathcal {L}}_hu_h=f_h\) and

$$\begin{aligned} \overline{\beta }_0|u_h|_{1,\text {pw}}\le \Vert f_h\Vert _{V_h^{*}}=\sup _{0\ne v_h\in V_h}\frac{(f_h,v_h)}{|v_h|_{1,\text {pw}}}=\sup _{0\ne v_h\in V_h}\frac{B_h(u_h,v_h)}{|v_h|_{1,\text {pw}}}. \end{aligned}$$

The discrete problem \(B_h(u_h,\bullet )=(f_h,\bullet )\) has a unique solution in \(V_h\). Therefore, \(f_h\) and \(u_h\) are in one-to-one correspondence and the last displayed estimate holds for any \(u_h\in V_h\). The infimum over \(u_h\in V_h\) therein proves (4.7) with \(\overline{\beta }_0=C_{\text {stab}}^{-1}\). \(\square \)

4.2 A priori error estimates

This subsection establishes the error estimate in the energy norm \(|\cdot |_{1,\text {pw}}\) and in the \(L^2\) norm. The discrete inf-sup condition allows for an error estimate in the \(H^1\) norm and an Aubin–Nitsche duality argument leads to an error estimate in the \(L^2\) norm.

Recall that \(u\in H^1_0(\Omega )\) is the unique solution of (1.8) and \(u_h\in V_h\) is the unique solution of (3.8). Recall the definition of the bilinear form \(s_h(\cdot ,\cdot )\) from Sect. 3.1 and define the induced seminorm \(|v_h|_\mathrm {s}:= s_h(v_h,v_h)^{1/2}\) for \(v_h\in V_h\) as a part of the norm \(\Vert \cdot \Vert _h\) from Remark 6.

Theorem 4.3

(error estimate) Set \(\varvec{\sigma }:=\mathbf{A} \nabla u+\mathbf{b} u\in H(\text {div},\Omega )\). There exist positive constants \(C_4, C_5,\) and \(\delta \) such that, for all \(\mathcal {T}\in \mathbb {T}(\delta )\), the discrete problem (3.8) has a unique solution \(u_h\in V_h\) and

$$\begin{aligned}&|u-u_h|_{1,\text {pw}}+|u-\Pi _1u_h|_{1,\text {pw}}+h_{\mathrm {max}}^{-\sigma }(\Vert u-u_h\Vert _{L^2(\Omega )}+\Vert u-\Pi _1u_h\Vert _{L^2(\Omega )})\nonumber \\&\qquad +|u_h|_\mathrm {s}+|I_hu-u_h|_\mathrm {s}\nonumber \\&\quad \le C_4\Big ( \Vert (1-\Pi _0) \varvec{\sigma }\Vert _{L^2(\Omega )}+\Vert (1-\Pi _0)\nabla u\Vert _{L^2(\Omega )}+\mathrm {osc}_1(f-\gamma u,\mathcal {T})\Big )\nonumber \\&\quad \le C_5h_{\mathrm {max}}^\sigma \Vert f\Vert _{L^2(\Omega )}. \end{aligned}$$
(4.8)

Proof

Step 1 (initialization). Let \(I_hu \in V_h\) be the interpolation of u from Definition 2.7. The discrete inf-sup condition (4.7) for \(I_hu-u_h\in V_h\) leads to some \(v_h\in V_h\) with \(|v_h|_{1,\text {pw}}\le 1\) such that

$$\begin{aligned} \overline{\beta }_0|I_hu-u_h|_{1,\text {pw}}= B_h(I_hu-u_h,v_h). \end{aligned}$$

Step 2 (error estimate for \(|u-u_h|_{1,\text {pw}}\)). Rewrite the last equation with the continuous and the discrete problem (1.8) and (3.8) as

$$\begin{aligned} \overline{\beta }_0|I_hu-u_h|_{1,\text {pw}}=B_h(I_hu,v_h)-B(u,v)+(f,v)_{L^2(\Omega )}-(f_h,v_h)_{L^2(\Omega )}. \end{aligned}$$

This equality is rewritten with the definition of \(B(u,v)\) in (1.7), the definition of \(B_h(I_hu,v_h)\) in Sect. 3.1, and with \(f_h=\Pi _1f\). Recall \(v:=Jv_h\in V\) from Lemma 3.3 and recall \(\nabla _{\text {pw}}\Pi _1 I_hu=\Pi _0\nabla u\) from (2.17). This results in

$$\begin{aligned} \text {LHS}&:= \overline{\beta }_0|I_hu-u_h|_{1,\text {pw}}-s_h((1-\Pi _1)I_hu,(1-\Pi _1)v_h)\\&=(\mathbf{A} \Pi _0\nabla u+\mathbf{b} \Pi _1I_hu,\nabla _{\text {pw}}\Pi _1 v_h)_{L^2(\Omega )}+(\gamma \Pi _1I_hu,\Pi _1v_h)_{L^2(\Omega )}\\&\quad -(\varvec{\sigma },\nabla v)_{L^2(\Omega )}+(f-\gamma u,v)_{L^2(\Omega )}-(f,\Pi _1v_h)_{L^2(\Omega )}. \end{aligned}$$

Abbreviate \(w:=v-\Pi _1v_h\) and observe the orthogonalities \(\nabla _{\text {pw}} w\perp \mathcal {P}_0(\mathcal {T};\mathbb {R}^2)\) in \(L^2(\Omega ;\mathbb {R}^2)\) and \(w\perp \mathcal {P}_1(\mathcal {T})\) in \(L^2(\Omega )\) from Lemma 3.3.b-c and the definition of \(\Pi _1\) with \(\Pi _1=\Pi ^\nabla _1\) in \(V_h\). Lemma 3.3.d, the bound \(|(1-\Pi ^\nabla _1)v_h|_{1,\text {pw}}\le |v_h|_{1,\text {pw}}\le 1\), and the Poincaré–Friedrichs inequality for \(v_h-\Pi ^\nabla _1v_h\) from Lemma 2.1.a lead to

$$\begin{aligned} |w|_{1,\text {pw}}&\le |v-v_h|_{1,\text {pw}}+|v_h-\Pi _1v_h|_{1,\text {pw}}\le C_\mathrm {J}+1, \end{aligned}$$
(4.9)
$$\begin{aligned} \Vert h_{\mathcal {T}}^{-1}w\Vert _{L^2(\Omega )}&\le \Vert h_{\mathcal {T}}^{-1}(v-v_h)\Vert _{L^2(\Omega )}+\Vert h_{\mathcal {T}}^{-1}(v_h-\Pi _1v_h)\Vert _{L^2(\Omega )}\le C_\mathrm {J}+C_\mathrm {PF}. \end{aligned}$$
(4.10)

Elementary algebra and the above orthogonalities prove that

$$\begin{aligned} \text {LHS}= & {} ((\mathbf{A} -\Pi _0\mathbf{A} )(\Pi _0-1)\nabla u+\mathbf{b} (\Pi _1I_hu-u),\nabla _{\text {pw}}\Pi _1 v_h)_{L^2(\Omega )}\nonumber \\&-((1-\Pi _0)\varvec{\sigma },\nabla _{\text {pw}}w)_{L^2(\Omega )}+(\gamma (\Pi _1I_hu-u),\Pi _1v_h)_{L^2(\Omega )}\nonumber \\&+(h_\mathcal {T}(1-\Pi _1)(f-\gamma u),h_{\mathcal {T}}^{-1}w)_{L^2(\Omega )}\nonumber \\\le & {} \Big (|\mathbf{A} |_{1,\infty }+(1+C_{\mathrm {PF}})(\Vert \mathbf{b} \Vert _{\infty }+C_{\mathrm {F}}\Vert \gamma \Vert _{\infty })\Big )h_{\text {max}}\Vert (1-\Pi _0)\nabla u\Vert _{L^2(\Omega )}\nonumber \\&+(C_\mathrm {J}+1)\Vert (1-\Pi _0) \varvec{\sigma }\Vert _{L^2(\Omega )}+(C_\mathrm {J}+C_{\text {PF}})\mathrm {osc}_1(f-\gamma u,\mathcal {T}) \end{aligned}$$
(4.11)

with the Lipschitz continuity of \(\mathbf{A} \), Theorem 2.8.b, the stabilities of \(\Pi _1\) from (2.11), and (4.9)–(4.10) in the last step. The definition of the stability term (3.5) and Theorem 2.8.b lead to

$$\begin{aligned}&C_s^{-1}s_h((1-\Pi _1)I_hu,(1-\Pi _1)v_h)\nonumber \\&\quad \le \Vert \mathbf{A} \Vert _{\infty }|(1-\Pi _1)I_hu|_{1,\text {pw}}|(1-\Pi _1)v_h|_{1,\text {pw}}\nonumber \\ {}&\quad \le \Vert \mathbf{A} \Vert _{\infty } (|I_hu-u|_{1,\text {pw}}+|u-\Pi _1I_hu|_{1,\text {pw}})|v_h|_{1,\text {pw}}\nonumber \\ {}&\quad \le \Vert \mathbf{A} \Vert _{\infty }(2+C_{\text {Itn}}+C_{\mathrm {PF}}) \Vert (1-\Pi _0)\nabla u\Vert _{L^2(\Omega )}|v_h|_{1,\text {pw}}. \end{aligned}$$
(4.12)

The triangle inequality, the bound (2.15) for the term \(|u-I_hu|_{1,\text {pw}}\), and (4.11)–(4.12) for the term \(|I_hu-u_h|_{1,\text {pw}}\) conclude the proof of (4.8) for the term \(|u-u_h|_{1,{\text {pw}}}\).

Step 3 (duality argument). To prove the bound for \(u-u_h\) in the \(L^2\) norm with a duality technique, let \(g:=I_hu-u_h\in L^2(\Omega )\). The solution \(\Phi \in H^1_0(\Omega )\cap H^{1+\sigma }(\Omega )\) to the dual problem (1.4) satisfies the elliptic regularity (1.5),

$$\begin{aligned} \Vert \Phi \Vert _{1+\sigma ,\Omega }\le C^*_{\text {reg}}\Vert I_hu-u_h\Vert _{L^2(\Omega )}. \end{aligned}$$
(4.13)

Step 4 (error estimate for \(\Vert u-u_h\Vert _{L^2(\Omega )}\)). Let \(I_h\Phi \in V_h\) be the interpolation of \(\Phi \) from Definition 2.7. Elementary algebra reveals the identity

$$\begin{aligned} \Vert g\Vert ^2_{L^2(\Omega )}&=((g,g)_{L^2(\Omega )}-B_{\text {pw}}(g,\Phi ))+B_{\text {pw}}(g,\Phi -I_h\Phi )\nonumber \\&\quad +(B_{\text {pw}}(g,I_h\Phi )-B_h(g,I_h\Phi ))+B_h(g,I_h\Phi ). \end{aligned}$$
(4.14)

The bound (4.4) with g as the first argument shows

$$\begin{aligned} B_\text {pw}(g,I_h\Phi )-B_h(g,I_h\Phi )\le C_dh_{\text {max}}^{\sigma }|g|_{1,\text {pw}}\Vert \Phi \Vert _{1+\sigma ,\Omega }. \end{aligned}$$

This controls the third term in (4.14), Lemma 3.4.b controls the first term, and the boundedness of \(B_\text {pw}\) together with the interpolation error estimate (2.15) controls the second term on the right-hand side of (4.14). This results in

$$\begin{aligned} \Vert I_hu-u_h\Vert ^2_{L^2(\Omega )}\le (C^*_{\mathrm {NC}}+C_\mathrm {I}M_b+C_d)h_{\mathrm {max}}^{\sigma }|g|_{1,\text {pw}}\Vert \Phi \Vert _{1+\sigma ,\Omega }+B_h(g,I_h\Phi ). \end{aligned}$$
(4.15)

It remains to bound \(B_h(g,I_h\Phi )\). The continuous and the discrete problem (1.8) and (3.8) imply

$$\begin{aligned} B_h(g,I_h\Phi )=B_h(I_hu,I_h\Phi )-B(u,\Phi )+(f,\Phi )_{L^2(\Omega )}-(f_h,I_h\Phi )_{L^2(\Omega )}. \end{aligned}$$

The definitions of \(B_h\) and \(\Pi _0\) lead to

$$\begin{aligned}&B_h(g,I_h\Phi )-s_h((1-\Pi _1)I_hu,(1-\Pi _1)I_h\Phi )\nonumber \\&\quad =((\mathbf{A} -\Pi _0\mathbf{A} )(\Pi _0-1)\nabla u+\mathbf{b} (\Pi _1 I_hu-u),\nabla _{\text {pw}}\Pi _1I_h\Phi )_{L^2(\Omega )}\nonumber \\&\qquad +(\gamma (\Pi _1I_hu-u),\Pi _1I_h\Phi )_{L^2(\Omega )}-((1-\Pi _0)\varvec{\sigma },\nabla _{\text {pw}}(1-\Pi _1I_h)\Phi )_{L^2(\Omega )}\nonumber \\&\qquad +(f-\gamma u,\Phi -\Pi _1I_h\Phi )_{L^2(\Omega )}. \end{aligned}$$
(4.16)

The bound for the stability term as in (4.12) is

$$\begin{aligned}&s_h((1-\Pi _1)I_hu,(1-\Pi _1)I_h\Phi )\nonumber \\&\quad \le C_s\Vert \mathbf{A} \Vert _{\infty }|(1-\Pi _1)I_hu|_{1,\text {pw}}|(1-\Pi _1)I_h\Phi |_{1,\text {pw}}\nonumber \\&\quad \le C_s\Vert \mathbf{A} \Vert _{\infty }(2+C_{\mathrm {Itn}}+C_{\mathrm {PF}})^2 C_{\text {apx}}h_{\text {max}}^{\sigma }\Vert (1-\Pi _0)\nabla u\Vert _{L^2(\Omega )}|\Phi |_{1+\sigma ,\Omega }. \end{aligned}$$
(4.17)

Step 5 (oscillation). The last term in (4.16) is of optimal order \(O(h_{\text {max}}^{1+\sigma })\), but the following arguments allow one to write it as an oscillation. Recall the bubble function \(b_{\mathcal {T}}|_P:=b_P\in H^1_0(P)\) from (3.17) extended by zero outside P. Given \(\Psi :=\Phi -\Pi _1I_h\Phi \), let \(\Psi _1\in \mathcal {P}_1(\mathcal {T})\) be the Riesz representation of the linear functional \(\mathcal {P}_1(\mathcal {T})\rightarrow \mathbb {R}\) defined by \(w_1\mapsto (\Psi ,w_1)_{L^2(\Omega )}\) in the Hilbert space \(\mathcal {P}_1(\mathcal {T})\) endowed with the weighted scalar product \((b_\mathcal {T}\bullet ,\bullet )_{L^2(\Omega )}\). That means \(\Pi _1(b_\mathcal {T}\Psi _1)=\Pi _1\Psi \). The identity \((f-\gamma u,b_{\mathcal {T}}\Psi _1)_{L^2(\Omega )}=(\varvec{\sigma },\nabla (b_\mathcal {T}\Psi _1))_{L^2(\Omega )}\) follows from (1.8) with the test function \(b_{\mathcal {T}}\Psi _1\in H^1_0(\Omega )\). The \(L^2\) orthogonalities \(\Psi -b_{\mathcal {T}}\Psi _1\perp \mathcal {P}_1(\mathcal {T})\) in \(L^2(\Omega )\) and \(\nabla ( b_{\mathcal {T}}\Psi _1)\perp \mathcal {P}_0(\mathcal {T};\mathbb {R}^2)\) in \(L^2(\Omega ;\mathbb {R}^2)\) allow the rewriting of the latter identity as

$$\begin{aligned} (f-\gamma u,\Psi )_{L^2(\Omega )}&=(h_\mathcal {T}(1-\Pi _1)(f-\gamma u),h_{\mathcal {T}}^{-1}(\Psi -b_{\mathcal {T}}\Psi _1))_{L^2(\Omega )}\nonumber \\&\quad +((1-\Pi _0)\varvec{\sigma },\nabla (b_\mathcal {T}\Psi _1))_{L^2(\Omega )}\nonumber \\&\le \mathrm {osc}_1(f-\gamma u,\mathcal {T})\Vert h_{\mathcal {T}}^{-1}(\Psi -b_{\mathcal {T}}\Psi _1)\Vert _{L^2(\Omega )}\nonumber \\&\quad +\Vert (1-\Pi _0)\varvec{\sigma }\Vert _{L^2(\Omega )}|b_\mathcal {T}\Psi _1|_{1,\text {pw}}. \end{aligned}$$
(4.18)

It remains to control the terms \(\Vert h_{\mathcal {T}}^{-1}(\Psi -b_{\mathcal {T}}\Psi _1)\Vert _{L^2(\Omega )}\) and \(|b_\mathcal {T}\Psi _1|_{1,\text {pw}}\). The definition of \(I_h\) and the definition of \(\Pi ^\nabla _1\) with \(\Pi _1=\Pi ^\nabla _1\) in \(V_h\) imply \(\int _{\partial P}\Psi \,ds=\int _{\partial P}(\Phi -\Pi _1I_h\Phi )\,ds=0\). This allows the application of the Poincaré–Friedrichs inequality from Lemma 2.1.a to \(\Psi \) on each \(P\in \mathcal {T}\) and shows

$$\begin{aligned} \Vert h_\mathcal {T}^{-1}\Psi \Vert _{L^2(\Omega )}\le C_\mathrm {PF}|\Psi |_{1,\text {pw}}\le C_\mathrm {PF}C_{\text {apx}} h_{\text {max}}^{\sigma } |\Phi |_{1+\sigma ,\Omega } \end{aligned}$$
(4.19)

with Theorem 2.8.b and (2.12) in the last inequality. Since \(b_P\Psi _1\in H^1_0(P)\) for \(P\in \mathcal {T}\), the Poincaré–Friedrichs inequality from Lemma 2.1.a leads to

$$\begin{aligned} \Vert h_P^{-1}(b_P\Psi _1)\Vert _{L^2(P)}\le C_\mathrm {PF}|b_P\Psi _1|_{1,P}. \end{aligned}$$
(4.20)

The first estimate in (3.20), the identity \(\Pi _1(b_\mathcal {T}\Psi _1)=\Pi _1\Psi \), and the Cauchy–Schwarz inequality imply

$$\begin{aligned} C_b^{-1}\Vert h_{P}^{-1}\Psi _1\Vert _{L^2(P)}^2&\le \Vert h_{P}^{-1}b_{P}^{1/2}\Psi _1\Vert _{L^2(P)}^2=(h_{P}^{-1}\Psi _1,h_{P}^{-1}\Psi )_{L^2(P)}\\&\le \Vert h_{P}^{-1}\Psi _1\Vert _{L^2(P)}\Vert h_{P}^{-1}\Psi \Vert _{L^2(P)}. \end{aligned}$$

This proves \(\Vert h_{P}^{-1}\Psi _1\Vert _{L^2(P)}\le C_b \Vert h_{P}^{-1}\Psi \Vert _{L^2(P)}\). The second estimate in (3.21) followed by the first estimate in (3.20) leads to the first inequality and the arguments as above lead to the second inequality in

$$\begin{aligned} C_{b}^{-3/2} |b_P\Psi _1|_{1,P}&\le \Vert h_{P}^{-1}b_{P}^{1/2}\Psi _1\Vert _{L^2(P)}\le \Vert h_{P}^{-1}\Psi _1\Vert _{L^2(P)}^{1/2}\Vert h_{P}^{-1}\Psi \Vert _{L^2(P)}^{1/2}\\&\le C_{b}^{1/2}\Vert h_{P}^{-1}\Psi \Vert _{L^2(P)} \end{aligned}$$

with \(\Vert h_{P}^{-1}\Psi _1\Vert _{L^2(P)}^{1/2}\le C_b^{1/2} \Vert h_{P}^{-1}\Psi \Vert _{L^2(P)}^{1/2}\) from above in the last step. The combination of the previous displayed estimate and (4.18)–(4.20) results with \(C_6 := C_\mathrm {PF}C_{\text {apx}}(1+C_b^2(1+C_\mathrm {PF}))\) in

$$\begin{aligned} (f-\gamma u,\Psi )_{L^2(\Omega )}\le C_6 (\mathrm {osc}_1(f-\gamma u,\mathcal {T})+\Vert (1-\Pi _0)\varvec{\sigma }\Vert _{L^2(\Omega )})h_{\text {max}}^{\sigma }|\Phi |_{1+\sigma ,\Omega }. \end{aligned}$$
(4.21)

Step 6 (continued proof of estimate for \(\Vert u-u_h\Vert _{L^2(\Omega )}\)). The estimate in Step 2 for \(|g|_{1,\text {pw}}\), (4.15)–(4.17), and (4.21) with the regularity (4.13) show

$$\begin{aligned}&\Vert I_hu-u_h\Vert _{L^2(\Omega )} \nonumber \\&\quad \lesssim h_{\mathrm {max}}^{\sigma }\Big (\Vert (1-\Pi _0)\nabla u\Vert _{L^2(\Omega )}+\Vert (1-\Pi _0)\varvec{\sigma }\Vert _{L^2(\Omega )}+\mathrm {osc}_1(f-\gamma u,\mathcal {T})\Big ). \end{aligned}$$
(4.22)

Rewrite the difference \(u-u_h= (u-I_hu)+(I_hu-u_h)\), and apply the triangle inequality with (2.15) for the first term

$$\begin{aligned} \Vert u-I_hu\Vert _{L^2(\Omega )}\le C_\mathrm {I}h_{\text {max}}^{1+\sigma }|u|_{1+\sigma ,\Omega }. \end{aligned}$$

This and (4.22) for the second term \(I_hu-u_h\) conclude the proof of the estimate for the term \(h_{\text {max}}^{-\sigma }\Vert u-u_h\Vert _{L^2(\Omega )}\) in (4.8).

Step 7 (stabilisation error \(|u_h|_\mathrm {s}\) and \(|I_hu-u_h|_\mathrm {s}\)). The triangle inequality and the upper bound of the stability term (3.5) lead to

$$\begin{aligned} |u_h|_\mathrm {s}\le |I_hu-u_h|_\mathrm {s}+|I_hu|_\mathrm {s}\le C_s^{1/2}\Vert \mathbf{A} \Vert _\infty ^{1/2}(|I_hu-u_h|_{1,\text {pw}}+|(1-\Pi _1)I_hu|_{1,\text {pw}}) \end{aligned}$$

with \(|(1-\Pi _1)(I_hu-u_h)|_{1,\text {pw}}\le |I_hu-u_h|_{1,\text {pw}}\) in the last inequality. The arguments as in (4.12) prove that \(|(1-\Pi _1)I_hu|_{1,\text {pw}}\le (2+C_{\text {Itn}}+C_\mathrm {PF})\Vert (1-\Pi _0)\nabla u\Vert _{L^2(\Omega )}\). This and the arguments in Step 2 for the estimate of \(|I_hu-u_h|_{1,\text {pw}}\) show the upper bound in (4.8) for the terms \(|u_h|_\mathrm {s}\) and \(|I_hu-u_h|_\mathrm {s}\).

Step 8 (error estimate for \(u-\Pi _1u_h\)). The VEM solution \(u_h\) is defined by the computed degrees of freedom given in (2.10), but the evaluation of the function itself requires expensive additional calculations. The latter are avoided if \(u_h\) is replaced by the Ritz projection \(\Pi _1u_h\) in the numerical experiments. The triangle inequality leads to

$$\begin{aligned} |u-\Pi _1u_h|_{1,\text {pw}}\le |u-u_h|_{1,\text {pw}}+|u_h-\Pi _1u_h|_{1,\text {pw}}. \end{aligned}$$
(4.23)

A lower bound of the stability term (3.5) and the assumption (A2) imply

$$\begin{aligned} |u_h-\Pi _1u_h|_{1,P}\le a_0^{-1/2} C_s^{1/2}S^P((1-\Pi _1)u_h,(1-\Pi _1)u_h)^{1/2}. \end{aligned}$$
(4.24)

This shows that the second term in (4.23) is bounded by \(|u_h|_\mathrm {s}\). Hence Step 2 and Step 7 prove the estimate for \(|u-\Pi _1u_h|_{1,\text {pw}}\). Since \(\int _{\partial P}(u_h-\Pi _1u_h)\,ds=0\) from the definition of \(\Pi ^\nabla _1\) and \(\Pi _1=\Pi ^\nabla _1\) in \(V_h\), the combination of the Poincaré–Friedrichs inequality for \(u_h-\Pi _1u_h\) from Lemma 2.1.a and (4.24) results in

$$\begin{aligned} C_{\mathrm {PF}}^{-1} a_0^{1/2}C_s^{-1/2}\Vert u_h-\Pi _1u_h\Vert _{L^2(P)}\le h_PS^P((1-\Pi _1)u_h,(1-\Pi _1)u_h)^{1/2}. \end{aligned}$$
(4.25)
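Spelled out, and assuming that Lemma 2.1.a is applied in the form \(\Vert v\Vert _{L^2(P)}\le C_{\mathrm {PF}}h_P|v|_{1,P}\) (the form used in (4.19)), the chain behind (4.25) reads

$$\begin{aligned} \Vert u_h-\Pi _1u_h\Vert _{L^2(P)}\le C_{\mathrm {PF}}h_P|u_h-\Pi _1u_h|_{1,P}\le C_{\mathrm {PF}}a_0^{-1/2}C_s^{1/2}h_PS^P((1-\Pi _1)u_h,(1-\Pi _1)u_h)^{1/2}. \end{aligned}$$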

The analogous arguments for \(\Vert u-\Pi _1u_h\Vert _{L^2(\Omega )}\), (4.25), and the estimate for \(|u_h|_\mathrm {s}\) prove the bound (4.8) for the term \(h_{\text {max}}^{-\sigma }\Vert u-\Pi _1u_h\Vert _{L^2(\Omega )}\). This concludes the proof of Theorem 4.3. \(\square \)

5 A posteriori error analysis

This section presents the reliability and efficiency of a residual-type a posteriori error estimator.

5.1 Residual-based explicit a posteriori error control

Recall that \(u_h\in V_h\) is the solution to the problem (3.8), and recall the definition of the jump \([\cdot ]_E\) along an edge \(E\in \mathcal {E}\) from Sect. 2. For any polygonal domain \(P\in \mathcal {T}\), set

$$\begin{aligned} \displaystyle \eta _P^2:=&h_P^2\Vert f-\gamma \Pi _1u_h\Vert _{L^2(P)}^2&{\text {(Volume residual),}}\\ \displaystyle \zeta _P^2:=&S^P((1-\Pi _1)u_h,(1-\Pi _1)u_h)&{\text {(Stabilization)}},\\ \displaystyle \Lambda _P^2:=&\Vert (1-\Pi _0)(\mathbf{A} \nabla \Pi _1u_h+\mathbf{b} \Pi _1u_h)\Vert _{L^2(P)}^2&{\text {(Inconsistency)}},\\ \displaystyle \Xi _{P}^2:=&\sum _{E\in \mathcal {E}(P)}|E|^{-1}\Vert [\Pi _1u_h]_E\Vert _{L^2(E)}^2&{\text {(Nonconformity)}}. \end{aligned}$$

These local quantities \(\bullet |_P\) form a family (\(\bullet |_P:P\in \mathcal {T}\)) over the index set \(\mathcal {T}\) and their Euclidean vector norm \(\bullet |_\mathcal {T}\) enters the upper error bound: \(\eta _{\mathcal {T}}:=(\sum _{P\in \mathcal {T}}\eta _P^2)^{1/2}\), \(\zeta _\mathcal {T}:=(\sum _{P\in \mathcal {T}}\zeta _P^2)^{1/2}\), \(\Lambda _\mathcal {T}:=(\sum _{P\in \mathcal {T}}\Lambda _P^2)^{1/2}\), and \(\Xi _\mathcal {T}:=(\sum _{P\in \mathcal {T}}\Xi _P^2)^{1/2}\). The following theorem provides an upper bound to the error \(u-u_h\) in the \(H^1\) and the \(L^2\) norm. Recall the elliptic regularity (1.5) with the index \(0<\sigma \le 1\), and recall the assumption \(h_\text {max}\le 1\) from Sect. 2.1.
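The passage from the local to the global quantities is a plain Euclidean norm over the index set \(\mathcal {T}\). A minimal sketch of this accumulation follows; the function name `global_estimators` and the input layout (one sequence of squared local contributions per estimator term) are assumptions made for illustration and are not taken from the implementation behind this paper.

```python
import math


def global_estimators(eta_sq, zeta_sq, lambda_sq, xi_sq):
    """Return the Euclidean vector norms (eta_T, zeta_T, Lambda_T, Xi_T).

    Each argument holds the squared local contribution (for example
    eta_P^2) for every polygon P of the mesh, listed in the same order.
    """
    n = len(eta_sq)
    if not (len(zeta_sq) == len(lambda_sq) == len(xi_sq) == n):
        raise ValueError("all inputs need exactly one entry per polygon")
    # Global quantity = square root of the sum of squared local quantities.
    return tuple(
        math.sqrt(math.fsum(values))
        for values in (eta_sq, zeta_sq, lambda_sq, xi_sq)
    )


# Hypothetical squared contributions on a mesh with two polygons.
eta_T, zeta_T, lambda_T, xi_T = global_estimators(
    [9.0, 16.0], [0.0, 4.0], [1.0, 0.0], [0.25, 0.0]
)
```

The local squared contributions are kept separate from the global norms because an adaptive algorithm would typically mark polygons based on the local values, whereas the global values enter the error bound.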

Theorem 5.1

(reliability) There exist positive constants \(C_{\text {rel}1}\) and \(C_{\text {rel}2}\) (both depending on \(\rho \)) such that

$$\begin{aligned} C_{\mathrm {rel}1}^{-2}|u-u_h|_{1,\text {pw}}^2\le \eta _\mathcal {T}^2+\zeta _\mathcal {T}^2+\Lambda _\mathcal {T}^2+\Xi _{\mathcal {T}}^2 \end{aligned}$$
(5.1)

and

$$\begin{aligned} \Vert u-u_h\Vert _{L^2(\Omega )}^2\le C_{\mathrm {rel}2}^{2} \sum _{P\in \mathcal {T}}\Big (h_P^{2\sigma }(\eta _P^2+\zeta _P^2+\Lambda _P^2+\Xi _{P}^2)\Big ). \end{aligned}$$
(5.2)

The proof of this theorem in Sect. 5.3 relies on a conforming companion operator elaborated in the next subsection. The upper bound in Theorem 5.1 is efficient in the following local sense, where \(\omega _E:=\text {int}(\cup \mathcal {T}(E))\) denotes the patch of an edge E and consists of the one or the two neighbouring polygons in the set \(\mathcal {T}(E):=\{P'\in \mathcal {T}:E\subset \partial P'\}\) that share E. Recall \(\varvec{\sigma }=\mathbf{A} \nabla u+\mathbf{b} u\) from Sect. 4.2 and the data-oscillation \(\mathrm {osc}_1(f,P):= \Vert h_P(1-\Pi _1)f\Vert _{L^2(P)}\) from Sect. 2.1.

Theorem 5.2

(local efficiency up to oscillation) The quantities \(\eta _P, \zeta _P, \Lambda _P,\) and \(\Xi _P\) from Theorem 5.1 satisfy

$$\begin{aligned} \zeta ^2_P&\lesssim |u-u_h|^2_{1,P}+|u-\Pi _1u_h|^2_{1,P}, \end{aligned}$$
(5.3)
$$\begin{aligned} \eta _P^2&\lesssim \Vert u-u_h\Vert ^2_{1,P}+|u-\Pi _1u_h|^2_{1,P}+\Vert (1-\Pi _0)\varvec{\sigma }\Vert _{L^2(P)}^2+\mathrm {osc}_1^2(f-\gamma u,P), \end{aligned}$$
(5.4)
$$\begin{aligned} \Lambda _P^2&\lesssim \Vert u-u_h\Vert ^2_{1,P}+|u-\Pi _1u_h|^2_{1,P}+\Vert (1-\Pi _0)\varvec{\sigma }\Vert _{L^2(P)}^2, \end{aligned}$$
(5.5)
$$\begin{aligned} \Xi _P^2&\lesssim \sum _{E\in \mathcal {E}(P)}\sum _{P'\in \mathcal {T}(E)}( \Vert u-u_h\Vert ^2_{1,P'}+|u-\Pi _1u_h|^2_{1,P'}). \end{aligned}$$
(5.6)

The proof of Theorem 5.2 follows in Sect. 5.4. The reliability and efficiency estimates in Theorems 5.1 and 5.2 lead to an equivalence up to the approximation term

$$\begin{aligned} \text {apx} := \Vert \varvec{\sigma }-\Pi _0\varvec{\sigma }\Vert _{L^2(\Omega )}+\mathrm {osc}_1(f-\gamma u,\mathcal {T}). \end{aligned}$$

Recall the definition of \(|u_h|_\mathrm {s}\) from Sect. 4.2. In this paper, the norm \(|\cdot |_{1,\text {pw}}\) in the nonconforming space \(V_h\) has been utilised for simplicity and one alternative is the norm \(\Vert \cdot \Vert _h\) from Remark 6 induced by \(a_h\). Then it appears natural to have the total error with the stabilisation term as

$$\begin{aligned} \text {total error}:= & {} |u-u_h|_{1,\text {pw}}+|u-\Pi _1u_h|_{1,\text {pw}}+h_{\text {max}}^{-\sigma }\Vert u-u_h\Vert _{L^2(\Omega )}\\&+h_{\text {max}}^{-\sigma }\Vert u-\Pi _1u_h\Vert _{L^2(\Omega )}+|u_h|_\mathrm {s}. \end{aligned}$$

The point is that Theorem 4.3 assures that total error \(+\) apx converges with the expected optimal convergence rate.
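A convergence rate of this kind is commonly checked in experiments by comparing errors on consecutive mesh levels. The sketch below computes such empirical rates; the function name `empirical_rates` is hypothetical and the data are synthetic (generated as \(h^{1/2}\) to mimic \(\sigma =1/2\)), not results from this paper.

```python
import math


def empirical_rates(mesh_sizes, errors):
    """Empirical convergence rates between consecutive mesh levels.

    If error ~ C * h**rate, then two levels k and k+1 give
    rate = log(e_k / e_{k+1}) / log(h_k / h_{k+1}).
    """
    if len(mesh_sizes) != len(errors):
        raise ValueError("need one error value per mesh size")
    return [
        math.log(errors[k] / errors[k + 1])
        / math.log(mesh_sizes[k] / mesh_sizes[k + 1])
        for k in range(len(errors) - 1)
    ]


# Synthetic illustration only: error = h**0.5 on uniformly refined meshes.
mesh_sizes = [2.0 ** (-k) for k in range(1, 5)]
errors = [h ** 0.5 for h in mesh_sizes]
rates = empirical_rates(mesh_sizes, errors)
```

For the synthetic data every computed rate is close to 0.5; with measured errors the rates typically approach the theoretical value only asymptotically.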

Corollary 5.3

(equivalence) The \(\mathrm {estimator} := \eta _\mathcal {T}+\zeta _\mathcal {T}+\Lambda _\mathcal {T}+\Xi _\mathcal {T}\approx \mathrm {total\; error} + \mathrm {apx}\).

Proof

Theorem 5.2 motivates apx and shows

$$\begin{aligned} \mathrm {estimator}\lesssim & {} \Vert u-u_h\Vert _{1,\text {pw}}+\Vert \varvec{\sigma }-\Pi _0\varvec{\sigma }\Vert _{L^2(\Omega )}+\mathrm {osc}_1(f-\gamma u,\mathcal {T})+|u_h|_\mathrm {s}\\\le & {} \mathrm {total\; error} + \mathrm {apx}. \end{aligned}$$

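This proves the first inequality \(\lesssim \) in the assertion. Theorem 5.1, the estimates in Sect. 5.3.3.1, and the definition of \(|u_h|_s\) show \(\text {total error} \lesssim \text {estimator}\). With \(\varvec{\sigma }_h:=\mathbf{A} \nabla _{\text {pw}}\Pi _1u_h+\mathbf{b} \Pi _1u_h\) as in the definition of \(\Lambda _P\), the best-approximation property of the \(L^2\) projection \(\Pi _0\) and the triangle inequality show for the first of the terms in apx that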

$$\begin{aligned}&\Vert \varvec{\sigma }-\Pi _0\varvec{\sigma }\Vert _{L^2(\Omega )}\\&\quad \le \Vert \varvec{\sigma }-\Pi _0\varvec{\sigma }_h\Vert _{L^2(\Omega )}\le \Vert \varvec{\sigma }-\varvec{\sigma }_h\Vert _{L^2(\Omega )}+\Vert (1-\Pi _0)\varvec{\sigma }_h\Vert _{L^2(\Omega )}. \end{aligned}$$

The definition of \(\varvec{\sigma }\) and \(\varvec{\sigma }_h\) plus the triangle and the Cauchy–Schwarz inequality show

$$\begin{aligned} \Vert \varvec{\sigma }-\varvec{\sigma }_h\Vert _{L^2(\Omega )}\le \Vert \mathbf{A} \Vert _\infty |u-\Pi _1u_h|_{1,\text {pw}}+\Vert \mathbf{b} \Vert _{\infty }\Vert u-\Pi _1u_h\Vert _{L^2(\Omega )}\lesssim \Vert u-\Pi _1u_h\Vert _{1,\text {pw}}. \end{aligned}$$

The upper bound is \(\lesssim \) estimator as mentioned above. Since the term \(\Vert (1-\Pi _0)\varvec{\sigma }_h\Vert _{L^2(\Omega )}= \Lambda _\mathcal {T}\) is a part of the estimator, \(\Vert (1-\Pi _0)\varvec{\sigma }\Vert _{L^2(\Omega )}\lesssim \mathrm {estimator}\). The other term in apx is

$$\begin{aligned} \mathrm {osc}_1(f-\gamma u,\mathcal {T})&\le \mathrm {osc}_1(f-\gamma \Pi _1u_h,\mathcal {T})+\Vert h_\mathcal {T}\gamma (u-\Pi _1u_h)\Vert _{L^2(\Omega )}\\ {}&\le \eta _\mathcal {T}+\Vert \gamma \Vert _\infty h_{\text {max}}\Vert u-\Pi _1u_h\Vert _{L^2(\Omega )}\lesssim \text {estimator}. \end{aligned}$$

\(\square \)

Section 5 establishes the a posteriori error analysis of the nonconforming VEM. Related results are known for the conforming VEM and the nonconforming FEM.

Remark 7

(comparison with nonconforming FEM) Theorem 5.1 generalizes a result for the nonconforming FEM in [19, Thm. 3.4] from triangulations into triangles to those into polygons (recall Example 2.2). The only difference is the extra stabilization term that can be dropped in the nonconforming FEM.

Remark 8

(comparison with conforming VEM) The volume residual, the inconsistency term, and the stabilization also arise in the a posteriori error estimator for the conforming VEM in [16, Thm. 13]. But it also includes an additional term with normal jumps compared to the estimator (5.1). The extra nonconformity term in this paper is caused by the nonconformity \(V_h\not \subset V\) in general.

5.2 Enrichment and conforming companion operator

The link from the nonconforming approximation \(u_h\in V_h\) to a global Sobolev function in \(H^1_0(\Omega )\) can be designed with the help of the underlying refinement \(\widehat{{\mathcal {T}}}\) of the triangulation \(\mathcal {T}\) (from Sect. 2). The interpolation \(I_{\text {CR}}:V+V_h\rightarrow \text {CR}^1_0(\widehat{{\mathcal {T}}})\) in the Crouzeix-Raviart finite element space \(\text {CR}^1_0(\widehat{{\mathcal {T}}})\) from Sect. 3.4 allows for a right-inverse \(J'\). A companion operator \(J'\circ I_{\text {CR}}:V_h\rightarrow H^1_0(\Omega )\) acts as displayed

figure a
figure b

Define an enrichment operator \(E_\text {pw}:\mathcal {P}_1(\widehat{{\mathcal {T}}})\rightarrow S^1_0(\widehat{{\mathcal {T}}})\) by averaging nodal values: For any vertex z in the refined triangulation \(\widehat{{\mathcal {T}}}\), let \(\widehat{{\mathcal {T}}}(z)=\{T\in \widehat{{\mathcal {T}}}:z\in T\}\) denote the set of \(|\widehat{{\mathcal {T}}}(z)|\ge 1\) many triangles that share the vertex z, and define

$$\begin{aligned} E_\text {pw}v_1(z)=\frac{1}{|\widehat{{\mathcal {T}}}(z)|}\sum _{T\in \widehat{{\mathcal {T}}}(z)}{v_1}|_{T}(z) \end{aligned}$$

for an interior vertex z (and zero for a boundary vertex z according to the homogeneous boundary conditions). This defines \(E_\text {pw}v_1\) at any vertex of a triangle T in \(\widehat{{\mathcal {T}}}\), and linear interpolation then defines \(E_\text {pw}v_1\) in \(T\in \widehat{{\mathcal {T}}}\), so that \(E_\text {pw}v_1\in S^1_0(\widehat{{\mathcal {T}}})\). Huang et al. [31] design an enrichment operator by an extension of [32] to polygonal domains, while we deduce it from a sub-triangulation. The following lemma provides an approximation property of the operator \(E_\text {pw}\).

Lemma 5.4

There exists a positive constant \(C_\mathrm {En}\) that depends only on the shape regularity of \(\widehat{{\mathcal {T}}}\) such that any \(v_1\in \mathcal {P}_1(\mathcal {T})\) satisfies

$$\begin{aligned} \Vert h_{\mathcal {T}}^{-1}(1-E_\text {pw}) v_1\Vert _{L^2(\Omega )}+ |(1-E_\text {pw}) v_1|_{1,\text {pw}}\le C_\mathrm {En}\left( \sum _{E\in {\mathcal {E}}}|E|^{-1}\Vert [v_1]_E\Vert _{L^2(E)}^2\right) ^{1/2}. \end{aligned}$$
(5.7)

Proof

There exists a positive constant \(C_\mathrm {En}\) independent of h and \(v_1\) [32, p. 2378] such that

$$\begin{aligned}&\Vert h_{\widehat{{\mathcal {T}}}}^{-1}(1-E_\text {pw}) v_1\Vert _{L^2(\Omega )}+\left( \sum _{T\in \widehat{{\mathcal {T}}}}\Vert \nabla (1-E_\text {pw}) v_1\Vert _{L^2(T)}^2\right) ^{1/2}\\&\quad \le C_\mathrm {En}\left( \sum _{E\in \widehat{\mathcal {E}}}|E|^{-1}\Vert [v_1]_E\Vert _{L^2(E)}^2\right) ^{1/2}. \end{aligned}$$

Note that any edge \(E\in \mathcal {E}\) is unrefined in the sub-triangulation \(\widehat{{\mathcal {T}}}\). Since \(v_{1|P}\in H^1(P)\) is continuous in each polygonal domain \(P\in \mathcal {T}\) and \(h_T\le h_P\) for all \(T\in \widehat{{\mathcal {T}}}(P)\), the above inequality reduces to (5.7). This concludes the proof. \(\square \)
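The nodal averaging that defines \(E_\text {pw}\) can be illustrated by a minimal Python sketch. The data layout (triangles as index triples, one local value per triangle vertex) and the function name `enrich_by_averaging` are assumptions made for this illustration only and are not taken from any implementation referenced in this paper.

```python
from collections import defaultdict


def enrich_by_averaging(triangles, local_values, boundary_vertices):
    """Average the nodal values of a piecewise affine function.

    triangles:         list of 3-tuples with the vertex indices of each triangle
    local_values:      list of 3-tuples; local_values[t][i] is the value of the
                       (possibly discontinuous) function restricted to triangle t
                       at the i-th vertex of that triangle
    boundary_vertices: set of vertex indices on the boundary
    Returns a dict mapping each vertex to the nodal value of the averaged,
    globally continuous piecewise affine function.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tri, vals in zip(triangles, local_values):
        for z, val in zip(tri, vals):
            sums[z] += val
            counts[z] += 1
    # Interior vertices: arithmetic mean over all triangles sharing the vertex.
    # Boundary vertices: zero, according to the homogeneous boundary condition.
    return {
        z: 0.0 if z in boundary_vertices else sums[z] / counts[z]
        for z in counts
    }


# A square split into four triangles around the interior vertex 4.
triangles = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]
local_values = [(5.0, 5.0, 1.0), (5.0, 5.0, 2.0), (5.0, 5.0, 3.0), (5.0, 5.0, 6.0)]
nodal = enrich_by_averaging(triangles, local_values, boundary_vertices={0, 1, 2, 3})
```

In this toy mesh the four local values 1, 2, 3, 6 at the interior vertex are replaced by their mean, and all corner values are set to zero.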

Recall the \(L^2\) projection \(\Pi _1\) onto the piecewise affine functions \(\mathcal {P}_1(\mathcal {T})\) from Sect. 2. An enrichment operator \(E_\text {pw}\circ \Pi _1:V_h\rightarrow H^1_0(\Omega )\) acts as displayed

figure c

5.3 Proof of Theorem 5.1

5.3.1 Reliable \(H^1\) error control

Define \(E_1u_h:=E_\text {pw}\Pi _1u_h\in H^1_0(\Omega )\) so that \(u-E_1u_h \in H^1_0(\Omega )\). The inf-sup condition (1.9) leads to some \(v\in H^1_0(\Omega )\) with \(\Vert v\Vert _{1,\Omega }\le 1\) and

$$\begin{aligned} \beta _0\Vert u-E_1u_h\Vert _{1,\Omega }&= B(u-E_1u_h,v)=((f,v)_{L^2(\Omega )}-B_{\text {pw}}(\Pi _1u_h,v))\nonumber \\&\quad +B_{\text {pw}}(\Pi _1u_h-E_1u_h,v) \end{aligned}$$
(5.8)

with \(B(u,v)=(f,v)\) from (1.8) and the piecewise version \(B_\text {pw}\) of B in the last step. The definition of \(B_h\) from Sect. 3.1 and the discrete problem (3.8) with \(v_h= I_hv\) imply

$$\begin{aligned}&B_{\text {pw}}(\Pi _1u_h,\Pi _1I_hv)+s_h((1-\Pi _1)u_h,(1-\Pi _1)I_hv)\nonumber \\&\quad =B_h(u_h,I_hv)=(f,\Pi _1I_hv)_{L^2(\Omega )}. \end{aligned}$$
(5.9)

Abbreviate \(w:=v-\Pi _1I_hv\) and \(\varvec{\sigma }_h:=\mathbf{A} \nabla _{\text {pw}}\Pi _1u_h+\mathbf{b} \Pi _1u_h\). This and (5.9) simplify

$$\begin{aligned}&(f,v)_{L^2(\Omega )}-B_{\text {pw}}(\Pi _1u_h,v)= (f,w)_{L^2(\Omega )}-B_{\text {pw}}(\Pi _1u_h,w)\nonumber \\&\qquad +s_h((1-\Pi _1)u_h,(1-\Pi _1)I_hv)\nonumber \\&\quad =(f-\gamma \Pi _1u_h,w)_{L^2(\Omega )}-((1-\Pi _0)\varvec{\sigma }_h,\nabla _{\text {pw}}w)_{L^2(\Omega )}\nonumber \\&\qquad +s_h((1-\Pi _1)u_h,(1-\Pi _1)I_hv) \end{aligned}$$
(5.10)

with \(\int _P\nabla w\,dx=0\) for any \(P\in \mathcal {T}\) from (2.17) in the last step. Recall the notation \(\eta _P, \Lambda _P\), and \(\zeta _P\) from Sect. 5.1. The Cauchy–Schwarz inequality and Theorem 2.8.b followed by \(\Vert (1-\Pi _0)\nabla v\Vert _{L^2(\Omega )}\le |v|_{1,\Omega }\le 1\) in the second step show

$$\begin{aligned} (f-\gamma \Pi _1u_h,w)_{L^2(P)}&\le \eta _Ph_P^{-1}\Vert w\Vert _{L^2(P)}\le (1+C_\mathrm {PF})\eta _P, \end{aligned}$$
(5.11)
$$\begin{aligned} ((1-\Pi _0)\varvec{\sigma }_h,\nabla w)_{L^2(P)}&\le \Lambda _P|w|_{1,P} \le (1+C_\mathrm {PF})\Lambda _P. \end{aligned}$$
(5.12)

The upper bound \(\Vert \mathbf{A} \Vert _\infty \) of the coefficient \(\mathbf{A} \), (3.5), and the Cauchy–Schwarz inequality for the stabilization term lead to the first inequality in

$$\begin{aligned}&C_s^{-1/2}S^P((1-\Pi _1)u_h,(1-\Pi _1)I_hv)\nonumber \\&\quad \le \Vert \mathbf{A} \Vert _\infty ^{1/2}S^P((1-\Pi _1)u_h,(1-\Pi _1)u_h)^{1/2} |(1-\Pi _1)I_hv|_{1,P} \nonumber \\&\quad \le \Vert \mathbf{A} \Vert _\infty ^{1/2}(2+C_{\mathrm {PF}}+C_{\text {Itn}})\zeta _P. \end{aligned}$$
(5.13)

The second inequality in (5.13) follows as in (4.3) and with \(\Vert (1-\Pi _0)\nabla v\Vert _{L^2(P)}\le 1\). Recall the boundedness constant \(M_b\) of \(B_\text {pw}\) from Subsection 4.1 and deduce from (5.7) and the definition of \(\Xi _\mathcal {T}\) from Sect. 5.1 that

$$\begin{aligned} B_{\text {pw}}(\Pi _1u_h-E_1u_h,v)\le M_b|\Pi _1u_h-E_1u_h|_{1,\text {pw}}\le M_bC_\mathrm {En}\Xi _\mathcal {T}. \end{aligned}$$
(5.14)

The substitution of (5.10)–(5.14) in (5.8) reveals that

$$\begin{aligned} \Vert u-E_1u_h\Vert _{1,\Omega } \le C_7(\eta _\mathcal {T}+\Lambda _\mathcal {T}+\zeta _\mathcal {T}+\Xi _\mathcal {T}) \end{aligned}$$
(5.15)

with \(\beta _0C_7=1+C_{\mathrm {PF}}+C_s^{1/2}\Vert \mathbf{A} \Vert _\infty ^{1/2}(2+C_{\mathrm {PF}}+C_{\text {Itn}})+M_bC_\mathrm {En}.\)

The combination of (4.24), (5.15) and (5.7) leads in the triangle inequality

$$\begin{aligned} |u-u_h|_{1,\text {pw}}\le |u-E_1u_h|_{1,\Omega }+|E_1u_h-\Pi _1u_h|_{1,\text {pw}}+|\Pi _1u_h-u_h|_{1,\text {pw}} \end{aligned}$$

to (5.1) with \(C_{\text {rel}1}/2=C_7+C_\mathrm {En}+a_0^{-1/2}C_s^{1/2}\).

5.3.2 Reliable \(L^2\) error control

Recall \(I_{\text {CR}}\) from (3.14) and \(J'\) from the proof of Lemma 3.3, and define \(E_2u_h:=J'I_{\text {CR}}u_h\in H^1_0(\Omega )\) from Subsection 5.2. Let \(\Psi \in H^1_0(\Omega )\cap H^{1+\sigma }(\Omega )\) solve the dual problem \(B(v,\Psi )=(u-E_2u_h,v)\) for all \(v\in V\) and recall (from (1.5)) the regularity estimate

$$\begin{aligned} \Vert \Psi \Vert _{1+\sigma ,\Omega }\le C^*_{\text {reg}}\Vert u-E_2u_h\Vert _{L^2(\Omega )}. \end{aligned}$$
(5.16)

The substitution of \(v:=u-E_2u_h\in V\) in the dual problem shows

$$\begin{aligned} \Vert u-E_2u_h\Vert ^2_{L^2(\Omega )}=B(u-E_2u_h,\Psi ). \end{aligned}$$

The algebra in (5.8)–(5.10) above leads with \(v=\Psi \) to the identity

$$\begin{aligned}&\Vert u-E_2u_h\Vert ^2_{L^2(\Omega )}-s_h((1-\Pi _1)u_h,(1-\Pi _1)I_h\Psi )\nonumber \\&\quad =(f-\gamma \Pi _1u_h,\Psi -\Pi _1I_h\Psi )_{L^2(\Omega )}-((1-\Pi _0)\varvec{\sigma }_h,\nabla _\text {pw}(\Psi -\Pi _1I_h\Psi ))_{L^2(\Omega )}\nonumber \\&\qquad +B_\text {pw}(\Pi _1u_h-E_2u_h,\Psi ). \end{aligned}$$
(5.17)

The definition of \(I_{\text {CR}}\) and \(J'\) proves the first and second equality in

$$\begin{aligned} \int _{E}u_h\,ds=\int _{E}I_{\text {CR}}u_h\,ds=\int _{E}E_2u_h\,ds\quad \text {for all}\;E\in \mathcal {E}. \end{aligned}$$

This and an integration by parts imply \(\int _P\nabla (u_h-E_2u_h)\,dx=0\) for all \(P\in \mathcal {T}\). Hence Definition 2.2 of the Ritz projection \(\Pi ^\nabla _1=\Pi _1\) in \(V_h\) shows \(\int _{P}\nabla (\Pi _1u_h-E_2u_h)\,dx=0\) for all \(P\in \mathcal {T}\). This \(L^2\) orthogonality \(\nabla _\text {pw}(\Pi _1u_h-E_2u_h)\perp \mathcal {P}_0(\mathcal {T};\mathbb {R}^2)\) and the definition of \(B_\text {pw}\) in the last term of (5.17) result with elementary algebra in

$$\begin{aligned} B_\text {pw}(\Pi _1u_h-E_2u_h,\Psi )&=((\mathbf{A} -\Pi _0\mathbf{A} )\nabla _\text {pw}(\Pi _1u_h-E_2u_h),\nabla \Psi )_{L^2(\Omega )}\nonumber \\&\quad +(\nabla _\text {pw}(\Pi _1u_h-E_2u_h),(\Pi _0\mathbf{A} )(1-\Pi _0)\nabla \Psi )_{L^2(\Omega )}\nonumber \\&\quad +(\Pi _1u_h-E_2u_h,\mathbf{b} \cdot \nabla \Psi +\gamma \Psi )_{L^2(\Omega )}. \end{aligned}$$
(5.18)
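The elementary algebra behind (5.18) may be sketched as follows, with the abbreviation \(w:=\Pi _1u_h-E_2u_h\) (introduced here only for brevity), the outer unit normal \(\mathbf{n} _P\) along \(\partial P\) (constant on each straight edge), and under the assumption that \(\mathbf{A} \) is symmetric, as the form of the second term in (5.18) suggests:

$$\begin{aligned} \int _P\nabla (u_h-E_2u_h)\,dx&=\sum _{E\in \mathcal {E}(P)}\mathbf{n} _P|_E\int _E(u_h-E_2u_h)\,ds=0,\\ (\mathbf{A} \nabla _\text {pw}w,\nabla \Psi )_{L^2(\Omega )}&=((\mathbf{A} -\Pi _0\mathbf{A} )\nabla _\text {pw}w,\nabla \Psi )_{L^2(\Omega )}+(\nabla _\text {pw}w,(\Pi _0\mathbf{A} )(1-\Pi _0)\nabla \Psi )_{L^2(\Omega )}\\&\quad +(\nabla _\text {pw}w,(\Pi _0\mathbf{A} )\Pi _0\nabla \Psi )_{L^2(\Omega )}. \end{aligned}$$

The last term vanishes because \((\Pi _0\mathbf{A} )\Pi _0\nabla \Psi \in \mathcal {P}_0(\mathcal {T};\mathbb {R}^2)\) and \(\nabla _\text {pw}w\perp \mathcal {P}_0(\mathcal {T};\mathbb {R}^2)\). The lower-order part \((w,\mathbf{b} \cdot \nabla \Psi +\gamma \Psi )_{L^2(\Omega )}\) of \(B_\text {pw}(w,\Psi )\) then completes the right-hand side of (5.18).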

The triangle inequality and (c) from the proof of Lemma 3.3 imply the first inequality in

$$\begin{aligned} |\Pi _1u_h-E_2u_h|_{1,\text {pw}}&\le |\Pi _1u_h-I_{\text {CR}}u_h|_{1,\text {pw}}+C_\mathrm {J'}\min _{v\in V}|I_{\text {CR}}u_h-v|_{1,\text {pw}}\nonumber \\&\le |\Pi _1u_h-I_{\text {CR}}u_h|_{1,\text {pw}}+C_\mathrm {J'}|I_{\text {CR}}u_h-E_1u_h|_{1,\text {pw}}\nonumber \\&\le |\Pi _1u_h-I_{\text {CR}}u_h|_{1,\text {pw}}+C_\mathrm {J'}(|I_{\text {CR}}u_h-\Pi _1u_h|_{1,\text {pw}}\nonumber \\&\quad +|\Pi _1u_h-E_1u_h|_{1,\text {pw}})\nonumber \\&\le (1+C_\mathrm {J'})|u_h-\Pi _1u_h|_{1,\text {pw}}+C_\mathrm {J'}|\Pi _1u_h-E_1u_h|_{1,\text {pw}}. \end{aligned}$$
(5.19)

The second estimate in (5.19) follows from \(E_1u_h\in V\), the third is a triangle inequality, and eventually \(|\Pi _1u_h-I_{\text {CR}}u_h|_{1,\text {pw}}\le |u_h-\Pi _1u_h|_{1,\text {pw}}\) results from the orthogonality \(\nabla _\text {pw}(u_h-I_{\text {CR}}u_h)\perp \mathcal {P}_0(\widehat{{\mathcal {T}}};\mathbb {R}^2)\) and \(\Pi _1u_h\in \mathcal {P}_1(\mathcal {T})\). The Cauchy–Schwarz inequality, the Lipschitz continuity of \(\mathbf{A} \), and the approximation estimate \(\Vert (1-\Pi _0)\nabla \Psi \Vert _{L^2(P)}\le C_{\text {apx}}h_P^\sigma |\Psi |_{1+\sigma ,P}\) in (5.18) lead to the first inequality in

$$\begin{aligned}&B_\text {pw}(\Pi _1u_h-E_2u_h,\Psi )\le \sum _{P\in \mathcal {T}}\Big ((h_P|\mathbf{A} |_{1,\infty }+\Vert \mathbf{A} \Vert _\infty C_{\text {apx}}h_P^\sigma )|\Pi _1u_h-E_2u_h|_{1,P}\nonumber \\&\qquad +\Vert \Pi _1u_h-E_2u_h\Vert _{L^2(P)}(\Vert \mathbf{b} \Vert _\infty +\Vert \gamma \Vert _\infty )\Big )\Vert \Psi \Vert _{1+\sigma ,P}\nonumber \\&\quad \le \sum _{P\in \mathcal {T}}\Big (h_P|\mathbf{A} |_{1,\infty }+\Vert \mathbf{A} \Vert _\infty C_{\text {apx}}h_P^\sigma +C_\mathrm {PF}(\Vert \mathbf{b} \Vert _{\infty }+\Vert \gamma \Vert _{\infty })h_P\Big )\nonumber \\&\qquad |\Pi _1u_h-E_2u_h|_{1,P}\Vert \Psi \Vert _{1+\sigma ,P}\nonumber \\&\quad \le C_8\sum _{P\in \mathcal {T}}h_P^\sigma ((1+C_\mathrm {J'})|u_h-\Pi _1u_h|_{1,P}\nonumber \\&\qquad +C_\mathrm {J'}|\Pi _1u_h-E_1u_h|_{1,P})\Vert \Psi \Vert _{1+\sigma ,P}. \end{aligned}$$
(5.20)

The second inequality in (5.20) follows from the Poincaré–Friedrichs inequality in Lemma 2.1.a for \(\Pi _1u_h-E_2u_h\) with \(\int _{\partial P}(\Pi _1u_h-E_2u_h)\,ds=0\) (from above); the constant \(C_8 := |\mathbf{A} |_{1,\infty }+C_{\text {apx}}\Vert \mathbf{A} \Vert _{\infty }+C_\mathrm {PF}(\Vert \mathbf{b} \Vert _{\infty }+\Vert \gamma \Vert _{\infty })\) results from (5.19) and \(h_P\le h_P^{\sigma }\) (recall \(h_\text {max}\le 1\)). Lemma 5.4 with \(v_1=\Pi _1u_h\) and (4.24) in (5.20) show

$$\begin{aligned}&B_\text {pw}(\Pi _1u_h-E_2u_h,\Psi ) \nonumber \\&\quad \le C_8\sum _{P\in \mathcal {T}}h_P^\sigma ((1+C_\mathrm {J'})a_0^{-1/2}C_s^{1/2}\zeta _P+C_\mathrm {J'}C_\mathrm {En}\Xi _P)\Vert \Psi \Vert _{1+\sigma ,P}. \end{aligned}$$
(5.21)

Rewrite (5.11)–(5.13) with \(w=\Psi -\Pi _1I_h\Psi \) and \(h_P^{-1}\Vert w\Vert _{L^2(P)}+|w|_{1,P}\le (1+C_\mathrm {PF})\Vert (1-\Pi _0)\nabla \Psi \Vert _{L^2(P)}\le C_{\text {apx}}(1+C_\mathrm {PF})h_P^\sigma |\Psi |_{1+\sigma ,P}\) from (2.12). This and (5.21) lead in (5.17) to

$$\begin{aligned} \Vert u-E_2u_h\Vert _{L^2(\Omega )}^2 \le C_9\sum _{P\in \mathcal {T}}h_P^\sigma (\eta _P+\zeta _P+\Lambda _P+\Xi _P)\Vert \Psi \Vert _{1+\sigma ,P} \end{aligned}$$

for \(C_9:=C_{\text {apx}}(1+C_\mathrm {PF}+C_s^{1/2}\Vert \mathbf{A} \Vert _\infty ^{1/2}(2+C_{\mathrm {PF}}+C_{\text {Itn}}))+C_8((1+C_\mathrm {J'})a_0^{-1/2}C_s^{1/2}+C_\mathrm {J'}C_\mathrm {En}).\) This and the regularity (5.16) result in

$$\begin{aligned} \Vert u-E_2u_h\Vert _{L^2(\Omega )}\le C_9C^*_{\text {reg}}\sum _{P\in \mathcal {T}}h_P^\sigma (\eta _P+\zeta _P+\Lambda _P+\Xi _P). \end{aligned}$$
(5.22)

The arguments in the proof of (5.20)–(5.21) also lead to

$$\begin{aligned} \Vert E_2u_h-\Pi _1u_h\Vert _{L^2(\Omega )}\le C_\mathrm {PF}((1+C_\mathrm {J'})a_0^{-1/2}C_s^{1/2}+C_\mathrm {J'}C_\mathrm {En})\sum _{P\in \mathcal {T}}h_P(\zeta _P+\Xi _P). \end{aligned}$$
(5.23)

The combination of (4.25), (5.22)–(5.23) and the triangle inequality

$$\begin{aligned} \Vert u-u_h\Vert _{L^2(\Omega )} \le \Vert u-E_2u_h\Vert _{L^2(\Omega )}+\Vert E_2u_h-\Pi _1u_h\Vert _{L^2(\Omega )}+\Vert \Pi _1u_h-u_h\Vert _{L^2(\Omega )} \end{aligned}$$

leads to (5.2) with \( C_{\text {rel}2}/2=C_9C^*_{\text {reg}}+C_\mathrm {PF}\big ((2+C_\mathrm {J'})a_0^{-1/2}C_s^{1/2}+C_\mathrm {J'}C_\mathrm {En}\big ).\) This concludes the proof of the \(L^2\) error estimate in Theorem 5.1.

5.3.3 Comments

5.3.3.1 Estimator for \({\varvec{u}}-{\varvec{\Pi }}_{\varvec{1}}{\varvec{u}}_{\varvec{h}}\)

The triangle inequality with (5.1) and (4.24) provides an upper bound for the \(H^1\) error

$$\begin{aligned} \frac{1}{2}|u-\Pi _1u_h|_{1,\text {pw}}^2 {\le } |u-u_h|_{1,\text {pw}}^2+|(1-\Pi _1)u_h|_{1,\text {pw}}^2 \le 2C_{\text {rel}1}^2(\eta _\mathcal {T}^2{+}\zeta _\mathcal {T}^2{+}\Lambda _\mathcal {T}^2{+}\Xi _\mathcal {T}^2). \end{aligned}$$

The same arguments for an upper bound of the \(L^2\) error in Theorem 5.1 show that

$$\begin{aligned} \frac{1}{2}\Vert u-\Pi _1u_h\Vert _{L^2(\Omega )}^2&\le \Vert u-u_h\Vert _{L^2(\Omega )}^2+\Vert (1-\Pi _1)u_h\Vert _{L^2(\Omega )}^2 \\&\le C_{\text {rel}2}^2\sum _{P\in \mathcal {T}}h_P^{2\sigma }(\eta _P^2+2\zeta _P^2+\Lambda _P^2+\Xi _{P}^2). \end{aligned}$$

The numerical experiments do not display \(C_{\text {rel}1}\) and \(C_{\text {rel}2}\), and directly compare the error \(H1e:=|u-\Pi _1u_h|_{1,\text {pw}}\) in the piecewise \(H^1\) norm and the error \(L2e:=\Vert u-\Pi _1u_h\Vert _{L^2(\Omega )}\) in the \(L^2\) norm with the upper bound \(H1\mu \) and \(L2\mu \) (see, e.g., Fig.  5).

5.3.3.2 Motivation and discussion of apx

We first argue that those extra terms have to be expected and utilize the abbreviations \(\varvec{\sigma }:= \mathbf{A} \nabla u+\mathbf{b} u\) and \(g := f-\gamma u\) for the exact solution \(u\in H^1_0(\Omega )\) to (1.8), which reads

$$\begin{aligned} (\varvec{\sigma },\nabla v)_{L^2(\Omega )} = (g,v)_{L^2(\Omega )}\quad \text {for all}\; v\in H^1_0(\Omega ). \end{aligned}$$
(5.24)

Recall the definition of \(s_h(\cdot ,\cdot )\) from Sect. 3.1. The discrete problem (3.8) with the discrete solution \(u_h\in V_h\) assumes the form

$$\begin{aligned} (\varvec{\sigma }_h,\nabla \Pi _1 v_h)_{L^2(\Omega )} {+} s_h((1{-}\Pi _1)u_h,(1-\Pi _1)v_h){=} (g_h,\Pi _1v_h)_{L^2(\Omega )}\ \text {for all}\; v_h\in V_h \end{aligned}$$
(5.25)

for \(\varvec{\sigma }_h:=\mathbf{A} \nabla \Pi _1u_h+\mathbf{b} \Pi _1u_h\), and \(g_h:=f-\gamma \Pi _1u_h\). Notice that \(\varvec{\sigma }_h\) and \(g_h \) may be replaced in (5.25) by \(\Pi _0\varvec{\sigma }_h\) and \(\Pi _1g_h\) because the test functions \(\nabla \Pi _1v_h\) and \(\Pi _1v_h\) belong to \(\mathcal {P}_0(\mathcal {T};\mathbb {R}^2)\) and \(\mathcal {P}_1(\mathcal {T})\) respectively. In other words, the discrete problems (3.8) and (5.25) do not see a difference of \(\varvec{\sigma }_h\) and \(g_h\) compared to \(\Pi _0\varvec{\sigma }_h\) and \(\Pi _1g_h\) and so the errors \(\varvec{\sigma }_h-\Pi _0\varvec{\sigma }_h\) and \(g_h-\Pi _1g_h\) may arise in a posteriori error control. This motivates the a posteriori error term \(\Vert \varvec{\sigma }_h-\Pi _0\varvec{\sigma }_h\Vert _{L^2(\Omega )}=\Lambda _\mathcal {T}\) as well as the approximation terms \(\varvec{\sigma }-\Pi _0\varvec{\sigma }\) and \(g-\Pi _1g\) on the continuous level. The natural norm for the dual variable \(\varvec{\sigma }\) is \(L^2\) and that of g is \(H^{-1}\) and hence their norms form the approximation term apx as defined in Sect. 5.1.

Example 5.1

(\(\mathbf{b} =0\)) The term \((1-\Pi _0)\varvec{\sigma }\) may not be visible in case of no advection \(\mathbf{b} =0\) at least if \(\mathbf{A} \) is piecewise constant. Suppose \(\mathbf{A} \in \mathcal {P}_0(\mathcal {T};\mathbb {R}^{2\times 2})\) and estimate

$$\begin{aligned} \Vert (1-\Pi _0)(\mathbf{A} \nabla u)\Vert _{L^2(\Omega )}\le \Vert \mathbf{A} \Vert _\infty \Vert (1-\Pi _0)\nabla u\Vert _{L^2(\Omega )}\lesssim |u-\Pi _1u_h|_{1,\text {pw}}. \end{aligned}$$

If \(\mathbf{A} \) is not constant, there are oscillation terms that can be treated properly in adaptive mesh-refining algorithms, e.g., in [27].
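The two steps of the displayed estimate for piecewise constant \(\mathbf{A} \) can be spelled out as a short worked computation. It only uses that \(\Pi _0\) commutes with the multiplication by a piecewise constant matrix and that \(\nabla _\text {pw}\Pi _1u_h\in \mathcal {P}_0(\mathcal {T};\mathbb {R}^2)\), so that the best-approximation property of \(\Pi _0\) applies on each \(P\in \mathcal {T}\):

$$\begin{aligned} (1-\Pi _0)(\mathbf{A} \nabla u)|_P&=\mathbf{A} |_P\,(1-\Pi _0)\nabla u|_P,\\ \Vert (1-\Pi _0)\nabla u\Vert _{L^2(P)}&=\min _{\mathbf{q} \in \mathbb {R}^2}\Vert \nabla u-\mathbf{q} \Vert _{L^2(P)}\le \Vert \nabla (u-\Pi _1u_h)\Vert _{L^2(P)}. \end{aligned}$$

The sum over all \(P\in \mathcal {T}\) shows that, in this particular case, the second estimate even holds with constant one.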

Example 5.2

(\(\gamma \) piecewise constant) While the data approximation term \(\mathrm {osc}_1(f,\mathcal {T})\) [10] is widely accepted as a part of the total error in the approximation of nonlinear problems, the term \(\mathrm {osc}_1(\gamma u,\mathcal {T})=\Vert \gamma h_\mathcal {T}(u-\Pi _1 u)\Vert _{L^2(\Omega )}\lesssim h_{\text {max}}^{1+\sigma }\Vert f\Vert _{L^2(\Omega )}\) is of higher order and may even be absorbed in the overall error analysis for a piecewise constant coefficient \(\gamma \in \mathcal {P}_0(\mathcal {T})\). In the general case \(\gamma \in L^\infty (\Omega )\backslash \mathcal {P}_0(\mathcal {T})\), however, \(\mathrm {osc}_1(\gamma u,\mathcal {T})\) leads in particular to terms with \(\Vert \gamma -\Pi _0\gamma \Vert _{L^\infty (\Omega )}\).

5.3.3.3 Higher-order nonconforming VEM

The analysis applied in Theorem 5.1 can be extended to the nonconforming VEM space of higher order \(k\in \mathbb {N}\) (see [17, Sec. 4] for the definition of discrete space). Since the projection operators \(\nabla \Pi _k^{\nabla }\) and \(\Pi _{k-1}\nabla \) are not the same for general k, and the first operator does not lead to optimal order of convergence for \(k\ge 3\), the discrete formulation uses \(\Pi _{k-1}\nabla \) (cf. [6, Rem. 4.3] for more details). The definition and approximation properties of the averaging operator \(E_\text {pw}\) extend to the operator \(E^k:\mathcal {P}_k(\widehat{{\mathcal {T}}})\rightarrow H^1_0(\Omega )\) (see [32, p. 2378] for a proof). The identity (5.9) does not hold in general, but algebraic calculations lead to

$$\begin{aligned} \eta _P^2&:=\, h_P^2\Vert f-\gamma \Pi _ku_h\Vert _{L^2(P)}^2,\quad \Lambda _P^2:=\, \Vert (1-\Pi _{k-1})(\mathbf{A} \Pi _{k-1}\nabla u_h+\mathbf{b} \Pi _k u_h)\Vert _{L^2(P)}^2\\ \zeta _P^2&:=S^P((1-\Pi _k)u_h,(1-\Pi _k)u_h),\qquad \Xi _P^2:=\sum _{E\in \mathcal {E}(P)}|E|^{-1}\Vert [\Pi _ku_h]_E\Vert ^2_{L^2(E)}. \end{aligned}$$

The analysis developed for the upper bound of \(L^2\) norm also extends to the general case. The model problem is chosen in 2D for the simplicity of the presentation. The results of this work can be extended to the three-dimensional case with appropriate modifications. The present analysis holds for any higher regularity index \(\sigma >0\) and avoids any trace inequality for higher derivatives. This is possible by a medius analysis in the form of companion operators [26].

5.3.3.4 Inhomogeneous boundary data

The error estimator for general Dirichlet condition \(u|_{\partial \Omega }=g\in H^{1/2}(\partial \Omega )\) can be obtained with some modifications of [33] in Theorem 5.1. The only difference is in the modified jump contributions of the boundary edges in the nonconformity term

$$\begin{aligned} \Xi _\mathcal {T}^2=\sum _{E\in \mathcal {E}(\Omega )}|E|^{-1}\Vert [\Pi _1u_h]\Vert _{L^2(E)}^2+\sum _{E\in \mathcal {E}(\partial \Omega )}|E|^{-1}\Vert g-\Pi _1u_h\Vert _{L^2(E)}^2. \end{aligned}$$
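For the lowest-order case the jump \([\Pi _1u_h]_E\) is affine along each edge, so every edge contribution to the nonconformity term can be evaluated in closed form: an affine function j on E with endpoint values a and b satisfies \(|E|^{-1}\Vert j\Vert _{L^2(E)}^2=(a^2+ab+b^2)/3\). The following Python sketch evaluates the sum on this basis. The function names are hypothetical, and the boundary mismatch \(g-\Pi _1u_h\) is assumed to be replaced by its affine interpolant at the two endpoints of each boundary edge, which is exact only for edgewise affine g.

```python
import math


def scaled_jump_sq(a, b):
    """Return |E|^{-1} * ||j||_{L^2(E)}^2 for an affine function j on an edge E
    with endpoint values a and b; the result does not depend on |E|."""
    return (a * a + a * b + b * b) / 3.0


def nonconformity_sq(interior_jumps, boundary_mismatches):
    """Sum the scaled squared jumps over interior and boundary edges.

    interior_jumps:      (a, b) endpoint values of the jump on interior edges
    boundary_mismatches: (a, b) endpoint values of g - Pi_1 u_h on boundary
                         edges (assumption: g is treated as affine per edge)
    """
    interior = sum(scaled_jump_sq(a, b) for a, b in interior_jumps)
    boundary = sum(scaled_jump_sq(a, b) for a, b in boundary_mismatches)
    return interior + boundary


# Illustrative values only, not data from the experiments in this paper.
xi_sq = nonconformity_sq([(1.0, -1.0), (0.5, 0.5)], [(0.0, 3.0)])
```

A jump with vanishing edge mean such as (1, -1) still contributes 1/3, which reflects that the lowest-order nonconforming functions are continuous only in the sense of edge averages.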

5.4 Proof of Theorem 5.2

Recall the notation \(\varvec{\sigma } =\mathbf{A} \nabla u+\mathbf{b} u\) and \(\varvec{\sigma }_h=\mathbf{A} \nabla \Pi _1u_h+\mathbf{b} \Pi _1u_h\) from Sect. 5.3.

Proof of (5.3)

The upper bound (3.5) for the stabilisation term and the triangle inequality show

$$\begin{aligned} \zeta _P^2\le C_s|(1-\Pi _1)u_h|^2_{1,P}\le 2C_s(|u-u_h|^2_{1,P}+|u-\Pi _1u_h|^2_{1,P}). \end{aligned}$$

This concludes the proof of (5.3). \(\square \)

Proof of (5.5)

The definition of \(\Lambda _P, \Pi _0\), and the triangle inequality lead to

$$\begin{aligned} \Lambda _P =\Vert \varvec{\sigma }_h-\Pi _0\varvec{\sigma }_h\Vert _{L^2(P)}&\le \Vert \varvec{\sigma }_h-\Pi _0\varvec{\sigma }\Vert _{L^2(P)}\nonumber \\&\le \Vert \mathbf{A} \nabla (\Pi _1u_h-u)+\mathbf{b} (\Pi _1u_h-u)\Vert _{L^2(P)}\nonumber \\&\quad +\Vert (1-\Pi _0)\varvec{\sigma }\Vert _{L^2(P)}. \end{aligned}$$
(5.26)

The upper bound \(\Vert \mathbf{A} \Vert _\infty \) and \(\Vert \mathbf{b} \Vert _\infty \) for the coefficients and the triangle inequality lead to

$$\begin{aligned}&\Lambda _P-\Vert (1-\Pi _0)\varvec{\sigma }\Vert _{L^2(P)}\le (\Vert \mathbf{A} \Vert _\infty +\Vert \mathbf{b} \Vert _\infty )\Vert \Pi _1u_h-u\Vert _{1,P}\nonumber \\&\quad \le (\Vert \mathbf{A} \Vert _\infty {+}\Vert \mathbf{b} \Vert _\infty )(\Vert u_h-\Pi _1u_h\Vert _{1,P}{+} \Vert u-u_h\Vert _{1,P}){\le } C_{10}(\zeta _P+\Vert u-u_h\Vert _{1,P}) \end{aligned}$$
(5.27)

with \(\Vert u_h-\Pi _1u_h\Vert _{1,P}\le (1+h_PC_\mathrm {PF})a_0^{-1/2}C_s^{1/2}\zeta _P\) from (4.24)–(4.25) and with \(C_{10}:=(\Vert \mathbf{A} \Vert _\infty +\Vert \mathbf{b} \Vert _\infty )((1+h_PC_\mathrm {PF})a_0^{-1/2}C_s^{1/2}+1)\). This followed by (5.3) concludes the proof of (5.5). \(\square \)

Recall the bubble-function \(b_{\mathcal {T}}|_P=b_P\) supported on a polygonal domain \(P\in \mathcal {T}\) from (3.17) as the sum of interior bubble-functions supported on each triangle \(T\in \widehat{{\mathcal {T}}}(P)\).

Proof of (5.4)

Rewrite the term

$$\begin{aligned} f-\gamma \Pi _1u_h = \Pi _1(f-\gamma \Pi _1u_h)+(1-\Pi _1)(f-\gamma \Pi _1u_h)=:R+\theta , \end{aligned}$$
(5.28)

and denote \(R_P:=R|_P\) and \(\theta _P:=\theta |_P\). The definition of \(B_\text {pw}(u-\Pi _1u_h,v)\) and the weak formulation \(B(u,v)=(f,v)\) from (1.8) for any \(v\in V\) imply

$$\begin{aligned} B_{\text {pw}}(u-\Pi _1u_h,v)+(\varvec{\sigma }_h,\nabla v)_{L^2(\Omega )}&=(f-\gamma \Pi _1u_h,v)_{L^2(\Omega )}=(R+\theta ,v)_{L^2(\Omega )}. \end{aligned}$$
(5.29)

Since \(b_PR_P\) belongs to \(H^1_0(\Omega )\) (extended by zero outside P), \(v :=b_PR_P\in V\) is admissible in (5.29). An integration by parts proves that \((\Pi _0\varvec{\sigma }_h,\nabla (b_PR_P))_{L^2(P)}=0\). Therefore, (5.29) shows

$$\begin{aligned} (R_P,b_PR_P)_{L^2(P)}&=B^P(u-\Pi _1u_h,b_PR_P)-(\theta _P,b_PR_P)_{L^2(P)}\\&\quad +((1-\Pi _0)\varvec{\sigma }_h,\nabla (b_PR_P))_{L^2(P)}. \end{aligned}$$

The substitution of \(\chi =R_P=\Pi _1(f-\gamma \Pi _1u_h)|_P\in \mathcal {P}_1(P)\) in (3.20) and the previous identity with the boundedness of B and the Cauchy–Schwarz inequality lead to the first two estimates in

$$\begin{aligned}&C_b^{-1}\Vert R_P\Vert _{L^2(P)}^2\le (R_P,b_PR_P)_{L^2(P)}\\&\quad \le \Big (M_b|u{-}\Pi _1u_h|_{1,P}{+}\Vert (1-\Pi _0)\varvec{\sigma }_h\Vert _{L^2(P)}\Big )|b_PR_P|_{1,P}{+}\Vert \theta _P\Vert _{L^2(P)}\Vert b_PR_P\Vert _{L^2(P)}\\ {}&\quad \le C_b\Big (M_b|u-\Pi _1u_h|_{1,P}+\Lambda _P+h_P\Vert \theta _P\Vert _{L^2(P)}\Big )h_P^{-1}\Vert R_P\Vert _{L^2(P)}. \end{aligned}$$

The last inequality follows from the definition of \(\Lambda _P\), and (3.21) with \(\chi = R_P\). This proves that \(C_b^{-2}h_P\Vert R_P\Vert _{L^2(P)}\le M_b|u-\Pi _1u_h|_{1,P}+\Lambda _P+h_P\Vert \theta _P\Vert _{L^2(P)}.\) Recall \(\eta _P\) from Sect. 5.1 and \(\eta _P = h_P\Vert f-\gamma \Pi _1u_h\Vert _{L^2(P)}\le h_P\Vert R_P\Vert _{L^2(P)}+h_P\Vert \theta _P\Vert _{L^2(P)}\) from the split in (5.28) and the triangle inequality. This and the previous estimate of \(h_P\Vert R_P\Vert _{L^2(P)}\) show the first estimate in

$$\begin{aligned} \eta _P&\le C_b^2(M_b|u-\Pi _1u_h|_{1,P}+\Lambda _P)+ (C_b^2+1)h_P\Vert \theta _P\Vert _{L^2(P)}\\&\le (C_b^2{+}1)\Big (M_b|u{-}\Pi _1u_h|_{1,P}{+}\Lambda _P{+} h_P\Vert (f-\gamma \Pi _1u_h){-}\Pi _1(f-\gamma u)\Vert _{L^2(P)}\Big )\\ {}&\le (C_b^2+1)\Big ((M_b+h_P\Vert \gamma \Vert _\infty )\Vert u-\Pi _1u_h\Vert _{1,P}+\Lambda _P+\mathrm {osc}_1(f-\gamma u,P)\Big ). \end{aligned}$$

The second step results from the definition of \(\theta _P=(1-\Pi _1)(f-\gamma \Pi _1u_h)|_P\) in (5.28) followed by the \(L^2\) orthogonality of \(\Pi _1\), and the last step results from elementary algebra with the triangle inequality and \(\mathrm {osc}_1(f-\gamma u,P)=h_P\Vert (1-\Pi _1)(f-\gamma u)\Vert _{L^2(P)}\) from Sect. 5.1. The triangle inequality for the term \(u-\Pi _1u_h\) and the estimate of \(\Vert u_h-\Pi _1u_h\Vert _{1,P}\) as in (5.27) lead to

$$\begin{aligned} C_{11}^{-1}\eta _P\le \Vert u-u_h\Vert _{1,P}+\zeta _P+\Lambda _P+\mathrm {osc}_1(f-\gamma u,P) \end{aligned}$$

with \(C_{11} := (C_b^{2}+1)(M_b+h_P\Vert \gamma \Vert _\infty )((1+h_PC_\mathrm {PF})a_0^{-1/2}C_s^{1/2}+1)\). The combination of (5.3) and (5.5) in the last displayed estimate concludes the proof of (5.4). \(\square \)

Proof of (5.6)

Recall for \(u\in H^1_0(\Omega )\) and \(u_h\in V_h\) that the edge integrals \(\int _Eu\,ds\) and \(\int _Eu_h\,ds\) are well defined for all edges \(E\in \mathcal {E}\), and so the constant \(\alpha _E:=|E|^{-1}\int _E(u-u_h)\,ds\) is uniquely defined as well. Since the jump of \(u-\alpha _E\) across any edge \(E\in \mathcal {E}\) vanishes, \([\Pi _1u_h]_E = [\Pi _1u_h-u+\alpha _E]_E\). Recall \(\omega _E=\text {int}(P^+\cup P^-)\) for \(E\in \mathcal {E}(\Omega )\) and \(\omega _E=\text {int}(P)\) for \(E\in \mathcal {E}(\partial \Omega )\) from Sect. 5.1. The trace inequality \(\Vert v\Vert ^2_{L^2(E)}\le C_T(|E|^{-1}\Vert v\Vert ^2_{L^2(\omega _E)}+|E|\;\Vert \nabla v\Vert ^2_{L^2(\omega _E)})\) (cf. [13, p. 554]) leads to

$$\begin{aligned}&|E|^{-1/2}\Vert [\Pi _1u_h]_E\Vert _{L^2(E)}\\&\quad \le C_T \left( |E|^{-1}\Vert \Pi _1u_h-u+\alpha _E\Vert _{L^2(\omega _E)}+\Vert \nabla _\text {pw}(\Pi _1u_h-u)\Vert _{L^2(\omega _E)}\right) . \end{aligned}$$

This and the triangle inequality show the first estimate in

$$\begin{aligned}&|E|^{-1/2}\Vert [\Pi _1u_h]_E\Vert _{L^2(E)} \nonumber \\&\le C_T\Big ( |E|^{-1}(\Vert u_h-\Pi _1u_h\Vert _{L^2(\omega _E)}+\Vert u_h-u+\alpha _E\Vert _{L^2(\omega _E)})\nonumber \\&\qquad +\Vert \nabla _\text {pw}(u_h-\Pi _1u_h)\Vert _{L^2(\omega _E)}+\Vert \nabla _\text {pw}(u-u_h)\Vert _{L^2(\omega _E)}\Big ). \end{aligned}$$
(5.30)

The estimates (4.24)–(4.25) control the term \(\Vert u_h-\Pi _1u_h\Vert _{1,P}\) as in (5.27), and the Poincaré–Friedrichs inequality from Lemma 2.1.b for \(u_h-u+\alpha _E\) with \(\int _{E}(u_h-u+\alpha _E)\,ds = 0\) (by the definition of \(\alpha _E\)) implies that \(\Vert u_h-u+\alpha _E\Vert _{L^2(P)}\le C_\mathrm {PF}h_P|u_h-u|_{1,P}\). This with the mesh assumption \(h_P\le \rho ^{-1}|E|\) and (5.30) result in

$$\begin{aligned} |E|^{-1/2}\Vert [\Pi _1u_h]_E\Vert _{L^2(E)}&\le C_T((C_\mathrm {PF}\rho ^{-1}+1)a_0^{-1/2}C_s^{1/2}+C_\mathrm {PF}+1)\\&\quad \sum _{P'\in \omega _E}(\zeta _{P'}+|u-u_h|_{1,P'}). \end{aligned}$$

Since this holds for any edge \(E\in \mathcal {E}(P)\), the sum over all these edges and the bound (5.3) in the above estimate conclude the proof of (5.6). \(\square \)

Remark 9

(convergence rates of \(L^2\) error control for \(0<\sigma \le 1\)) The efficiency estimates (5.4)–(5.6) with a multiplication of \(h_P^{2\sigma }\) show that the local quantity \(h_P^{2\sigma }(\eta _P^2+\Lambda _P^2+\Xi _P^2)\) converges to zero with the expected convergence rate.

Remark 10

(efficiency up to stabilisation and oscillation for \(L^2\) error control when \(\sigma =1\)) For convex domains and \(\sigma =1\), there is even a local efficiency result that is briefly described in the sequel: The arguments in the above proof of (5.4)–(5.5) lead to

$$\begin{aligned} h_P^{2}\eta _P^2&\lesssim \Vert u-u_h\Vert ^2_{L^2(P)}+h_P^{2}(\zeta _P^2+\mathrm {osc}_1^2(f-\gamma u,P)+\Vert (1-\Pi _0)\varvec{\sigma }\Vert _{L^2(P)}^2),\\ h_P^{2}\Lambda _P^2&\lesssim \Vert u-u_h\Vert ^2_{L^2(P)}+h_P^{2}(\zeta _P^2+\Vert \mathbf{A} -\Pi _0\mathbf{A} \Vert _{L^\infty (P)}^2\Vert f\Vert ^2_{L^2(\Omega )}+\Vert (1-\Pi _0)\mathbf{b} u\Vert _{L^2(P)}^2). \end{aligned}$$

The observation \([\Pi _1u_h]_E=[\Pi _1u_h-u]_E\) for the term \(\Xi _P\), the trace inequality, and the triangle inequality show, for any \(E\in \mathcal {E}\), that

$$\begin{aligned} |E|^{1/2}\Vert [\Pi _1u_h]_E\Vert _{L^2(E)}&\le C_T\left( \Vert u_h-\Pi _1u_h\Vert _{L^2(\omega _E)}+\Vert u-u_h\Vert _{L^2(\omega _E)}\right. \\&\quad \left. +|E|(\Vert \nabla \Pi _1(u-u_h)\Vert _{L^2(\omega _E)}+\Vert \nabla (u-\Pi _1u)\Vert _{L^2(\omega _E)})\right) . \end{aligned}$$

The bound (4.25) for the first term and the inverse estimate \(\Vert \nabla \chi \Vert _{L^2(P)}\le C_{\text {inv}}h_P^{-1}\Vert \chi \Vert _{L^2(P)}\) for \(\chi \in \mathcal {P}_k(P)\) for the third term result in

$$\begin{aligned} |E|^{1/2}\Vert [\Pi _1u_h]_E\Vert _{L^2(E)}\lesssim \Vert u{-}u_h\Vert _{L^2(\omega _E)}{+}|E|\sum _{P'\in \omega _E}\Big (\Vert \nabla (1{-}\Pi _1)u\Vert _{L^2(P')}+\zeta _{P'}\Big ). \end{aligned}$$

The mesh assumption (M2) implies that \(h_P^2\Xi _P^2 \le \rho ^{-1}\sum _{E\in \mathcal {E}(P)}|E|\;\Vert [\Pi _1u_h]_E\Vert _{L^2(E)}^2\). This and the above displayed inequality prove the efficiency estimate for \(h_P^2\Xi _P^2\).

6 Numerical experiments

This section demonstrates the performance of the a posteriori error estimator and an associated adaptive mesh-refining algorithm with Dörfler marking [37]. The numerical results investigate three computational benchmarks for the indefinite problem (1.1).

6.1 Adaptive algorithm

Input: initial partition \({\mathcal {T}}_0\) of \(\Omega \).

For \(\ell = 0,1,2,\dots \) do

  1. SOLVE. Compute the discrete solution \(u_h\) to (3.8) with respect to \(\mathcal {T}_\ell \) (cf. [5] for more details on the implementation).

  2. ESTIMATE. Compute all the four terms \(\eta _\ell :=\eta _{\mathcal {T}_\ell }, \zeta _\ell :=\zeta _{\mathcal {T}_\ell }, \Lambda _\ell :=\Lambda _{\mathcal {T}_\ell }\) and \(\Xi _\ell :=\Xi _{\mathcal {T}_\ell }\), which add up to the upper bound (5.1).

  3. MARK. Mark the polygons P in a subset \({\mathcal {M}}_\ell \subset {\mathcal {T}}_\ell \) with minimal cardinality and

    $$\begin{aligned} 0.5\,{H1\mu }_{\ell }^2\le \sum _{P\in {\mathcal {M}}_\ell }(\eta _P^2+\zeta _P^2+\Lambda _P^2+\Xi _P^2)\quad \text {for}\quad {H1\mu }_{\ell }^2:=H1\mu ^2({\mathcal {T}}_\ell ):=\eta _\ell ^2+\zeta _\ell ^2+\Lambda _\ell ^2+\Xi _\ell ^2. \end{aligned}$$
  4. REFINE. Refine the marked polygonal domains by connecting the midpoints of the edges to the centroid of the respective polygonal domain and update \({\mathcal {T}}_\ell \) (cf. Fig. 3 for an illustration of the refinement strategy).

Fig. 3
figure 3

Refinement of a polygon into quadrilaterals

end do

Output: The sequences \(\mathcal {T}_\ell \), and the bounds \(\eta _\ell , \zeta _\ell , \Lambda _\ell , \Xi _\ell \), and \(H1\mu _\ell \) for \(\ell =0,1,2,\dots \).
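The MARK step can be sketched in Python as follows. The sketch reads the marking criterion in the standard Dörfler form with bulk parameter 0.5, that is, the marked contributions sum up to at least half of the total; this reading and the function name `doerfler_mark` are assumptions of the sketch, which is not the MATLAB implementation used for the experiments. Sorting the squared local contributions in descending order and collecting greedily yields a set of minimal cardinality.

```python
def doerfler_mark(indicators, theta=0.5):
    """Return indices of a minimal-cardinality set M with
    sum(indicators[i] for i in M) >= theta * sum(indicators).

    indicators: squared local contributions, one entry per polygon, e.g.
                eta_P^2 + zeta_P^2 + Lambda_P^2 + Xi_P^2
    theta:      bulk parameter in (0, 1]
    """
    total = sum(indicators)
    # Largest contributions first; the sort is stable, so ties keep their order.
    order = sorted(range(len(indicators)), key=lambda i: indicators[i], reverse=True)
    marked, accumulated = [], 0.0
    for i in order:
        if accumulated >= theta * total:
            break
        marked.append(i)
        accumulated += indicators[i]
    return marked


# Illustrative values only, not data from the experiments in this paper.
marked_single = doerfler_mark([0.1, 4.0, 0.5, 2.0, 0.4])
marked_uniform = doerfler_mark([1.0, 1.0, 1.0, 1.0])
```

One dominant contribution leads to a single marked polygon, while equal contributions force half of the polygons to be marked.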

The adaptive algorithm above is stated for mesh adaptation with respect to the error in the piecewise \(H^1\) (energy) norm. For local mesh-refinement with respect to the \(L^2\) error, replace the estimator \(H1\mu _\ell \) in the algorithm by \(L2\mu _\ell \) (the upper bound in (5.2)). Both uniform and adaptive mesh-refinement are run to compare the empirical convergence rates and to provide numerical evidence for the superiority of adaptive mesh-refinement; uniform refinement means that all polygons are refined. In all examples below, \(\overline{\mathbf{A }}_P=1\) in (3.6). The numerical realizations are based on a MATLAB implementation explained in [35] with a Gauss-like cubature formula over polygons. The cubature formula is exact for all bivariate polynomials of degree at most \(2n-1\), so any choice \(n\ge (k+1)/2\) integrates a polynomial of degree k exactly. The quadrature errors in the computation of the examples below appear negligible for the input parameter \(n=5\), for which the formula is exact up to degree 9.

6.2 Square domain (smooth solution)

This subsection discusses the problem (1.1) with the coefficients \(\mathbf{A} =I, \mathbf{b} =(x,y)\) and \(\gamma =x^2+y^3\) on the square domain \(\Omega =(0,1)^2\) with the exact solution

$$\begin{aligned} u=16x(1-x)y(1-y)\arctan (25x-100y+50) \end{aligned}$$

with \(f={\mathcal {L}}u\). Since \(\gamma -\frac{1}{2}\text {div}(\mathbf{b} )=x^2+y^3-1\) is negative in part of \(\Omega \), the condition (1.2) is violated and this is an indefinite problem. Initially, the error and the estimators are large because of an internal layer around the line \(25x-100y+50=0\), where the first derivatives of u are large; the layer is resolved after a few refinements as displayed in Figs. 4 and 5.
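For orientation, the operator in (1.1) can be expanded for the coefficients of this example. The following is a routine calculation from the stated data (product rule for \(\text {div}(\mathbf{b} u)\) with \(\mathbf{A} =I\)) and not a formula quoted from the implementation:

$$\begin{aligned} \text {div}(\mathbf{b} )&=\partial _x x+\partial _y y=2,\\ {\mathcal {L}}u&=-\Delta u-\mathbf{b} \cdot \nabla u-\text {div}(\mathbf{b} )\,u+\gamma u=-\Delta u-x\,\partial _xu-y\,\partial _yu+(x^2+y^3-2)\,u,\\ \gamma -\frac{1}{2}\text {div}(\mathbf{b} )&=x^2+y^3-1. \end{aligned}$$

The last expression is negative wherever \(x^2+y^3<1\), for example near the origin, and positive near the corner (1, 1).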

Fig. 4 Output \(\mathcal {T}_1, \mathcal {T}_8, \mathcal {T}_{15}\) of the adaptive algorithm

Fig. 5 Convergence history plot of the estimator \(\mu \) and the error \(e:=u-\Pi _1u_h\) in (a) the piecewise \(H^1\) norm and (b) the \(L^2\) norm versus the number ndof of degrees of freedom for both uniform and adaptive refinement

6.3 L-shaped domain (non-smooth solution)

This subsection shows an advantage of adaptive mesh-refinement over uniform meshing for the problem (1.1) with the coefficients \(\mathbf{A} =I\), \(\mathbf{b} =(x,y)\), and \(\gamma =-4\) on the L-shaped domain \(\Omega =(-1,1)^2\backslash \big ([0 , 1)\times (-1 , 0]\big )\) and the exact solution

$$\begin{aligned} u=r^{2/3}\sin \left( \frac{2\theta }{3}\right) \end{aligned}$$

with \(f:={\mathcal {L}}u\). Since the exact solution is not zero along the boundary \(\partial \Omega \), the error estimators are modified according to Sect. 5.3.3.4. Since \(\gamma -\frac{1}{2}\text {div}(\mathbf{b} )=-5<0\), the problem is non-coercive. With an increasing number of iterations, the refinement concentrates at the singularity, as highlighted in Fig. 6. Since the exact solution u belongs to \(H^{(5/3)-\epsilon }(\Omega )\) for all \(\epsilon >0\), the a priori error estimates predict for uniform refinement a convergence rate of 1/3 in the \(H^1\) norm and of at least 2/3 in the \(L^2\) norm with respect to the number of degrees of freedom. Figure 7 shows that uniform refinement gives these suboptimal convergence rates, whereas adaptive refinement leads to the optimal convergence rates (1/2 in the \(H^1\) norm and 5/6 in the \(L^2\) norm).
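Two short calculations may help to read this example; both are sketches derived from the stated data rather than formulas taken from the analysis above. First, u is harmonic and homogeneous of degree 2/3 in r, so \(\mathbf{b} \cdot \nabla u=r\,\partial _ru=\frac{2}{3}u\), and the right-hand side reduces to a multiple of u:

$$\begin{aligned} f={\mathcal {L}}u=-\Delta u-\mathbf{b} \cdot \nabla u-\text {div}(\mathbf{b} )\,u+\gamma u=0-\frac{2}{3}u-2u-4u=-\frac{20}{3}\,u. \end{aligned}$$

Second, assuming a quasi-uniform mesh of mesh-size h in 2D, the number of degrees of freedom scales like \(h^{-2}\), so a rate in h translates into half that rate in ndof:

$$\begin{aligned} \text {ndof}\approx h^{-2}\quad \Longrightarrow \quad h^{s}\approx \text {ndof}^{-s/2},\qquad \text {e.g.}\quad h^{2/3}\approx \text {ndof}^{-1/3},\quad h^{4/3}\approx \text {ndof}^{-2/3},\quad h\approx \text {ndof}^{-1/2},\quad h^{5/3}\approx \text {ndof}^{-5/6}. \end{aligned}$$

For adaptively refined meshes the mesh-size is not uniform, so the rates in ndof are the meaningful quantity and the equivalent powers of h are only nominal.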

Fig. 6 Output \(\mathcal {T}_1, \mathcal {T}_{10}, \mathcal {T}_{15}\) of the adaptive refinement

Fig. 7 Convergence history plot of the estimator \(\mu \) and the error \(e:=u-\Pi _1u_h\) in (a) the piecewise \(H^1\) norm and (b) the \(L^2\) norm versus the number ndof of degrees of freedom for both uniform and adaptive refinement

6.4 Helmholtz equation

This subsection considers the exact solution \(u=1+\tanh (-9(x^2+y^2-0.25))\) to the problem

$$\begin{aligned} -\Delta u-9 u=f\quad \quad \text {in}\quad \Omega =(-1,1)^2. \end{aligned}$$

There is an internal layer around the circle centered at (0, 0) of radius 0.5, that is \(x^2+y^2=0.25\), where the second derivatives of u are large because of the steep variation of the solution. This results in a large error at the beginning, which is resolved with refinement as displayed in Figs. 8 and 9.
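The right-hand side f for this benchmark follows from the chain rule. With \(g=-9(x^2+y^2-0.25)\) one has \(\Delta g=-36\) and \(|\nabla g|^2=324(x^2+y^2)\), hence \(\Delta u=\text {sech}^2(g)\,\Delta g-2\,\text {sech}^2(g)\tanh (g)\,|\nabla g|^2\). The following Python sketch evaluates f this way and includes a finite-difference cross-check; the function names are hypothetical and the code is not part of the implementation in [35].

```python
import math


def u_exact(x: float, y: float) -> float:
    """Exact solution u = 1 + tanh(-9 (x^2 + y^2 - 0.25))."""
    return 1.0 + math.tanh(-9.0 * (x * x + y * y - 0.25))


def f_rhs(x: float, y: float) -> float:
    """Right-hand side f = -Laplace(u) - 9 u, evaluated analytically."""
    r2 = x * x + y * y
    t = math.tanh(-9.0 * (r2 - 0.25))
    sech2 = 1.0 - t * t
    # Laplace(u) = sech^2(g) Laplace(g) - 2 sech^2(g) tanh(g) |grad g|^2
    # with Laplace(g) = -36 and |grad g|^2 = 324 (x^2 + y^2).
    lap_u = -36.0 * sech2 - 648.0 * r2 * sech2 * t
    return -lap_u - 9.0 * (1.0 + t)


def f_fd(x: float, y: float, h: float = 1e-4) -> float:
    """Same quantity with a five-point finite-difference Laplacian (check only)."""
    lap = (u_exact(x + h, y) + u_exact(x - h, y) + u_exact(x, y + h)
           + u_exact(x, y - h) - 4.0 * u_exact(x, y)) / (h * h)
    return -lap - 9.0 * u_exact(x, y)


if __name__ == "__main__":
    for point in [(0.0, 0.0), (0.5, 0.0), (0.3, 0.4), (0.9, -0.7)]:
        print(point, f_rhs(*point), f_fd(*point))
```

On the circle \(x^2+y^2=0.25\) the hyperbolic tangent vanishes, so \(\Delta u=-36\) and \(f=36-9=27\) there.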

Fig. 8 Output \(\mathcal {T}_1, \mathcal {T}_5, \mathcal {T}_{11}\) of the adaptive refinement

Fig. 9 Convergence history plot of the estimator \(\mu \) and the error \(e:=u-\Pi _1u_h\) in (a) the piecewise \(H^1\) norm and (b) the \(L^2\) norm versus the number ndof of degrees of freedom for both uniform and adaptive refinement

6.5 Conclusion

The three computational benchmarks provide empirical evidence for the sharpness of the a priori and a posteriori error analysis in this paper and illustrate the superiority of adaptive over uniform mesh-refining. The empirical convergence rates in all examples for the \(H^1\) and \(L^2\) errors coincide with the convergence rates predicted in Theorem 4.3, in particular for the non-convex domain with reduced elliptic regularity. The a posteriori error bounds from Theorem 5.1 confirm these convergence rates as well. The ratio of the error estimator \(\mu _\ell \) to the \(H^1\) error \(e_\ell \), sometimes called the efficiency index, remains bounded by a value of about 6; we regard this as a typical overestimation factor for a residual-based a posteriori error estimate. Recall that the constant \(C_{\text {reg}}\) has not been displayed, so the error estimator \(\mu _\ell \) does not provide a guaranteed error bound.

Figures 10 and 11 display the four contributions to the error estimator \(\mu _\ell \): the volume residual \((\sum _P\eta _P^2)^{1/2}\), the stabilization term \((\sum _P\zeta _P^2)^{1/2}\), the inconsistency term \((\sum _P\Lambda _P^2)^{1/2}\), and the nonconformity term \((\sum _P\Xi _P^2)^{1/2}\). All four terms converge with the overall rate, so none of them is a higher-order term, and it is doubtful that any of them can be neglected. The volume residual clearly dominates the a posteriori error estimates, while the stabilization term remains significantly smaller for the natural stabilization (with undisplayed parameter one).

The proposed adaptive mesh-refining algorithm leads to superior convergence properties and recovers the optimal convergence rates. This holds for the first example, with optimal convergence rates in the large pre-asymptotic computational range, as well as for the second, where uniform mesh-refining yields suboptimal convergence rates because of the typical corner singularity and adaptive mesh-refining yields optimal ones. The third example with the Helmholtz equation and a moderate wave number shows some moderate local mesh-refining in Fig. 8 but no large improvement over the optimal convergence rates for uniform mesh-refining.

The adaptive refinement generates hanging nodes because of the way the refinement strategy is defined, but this is not troublesome in the VEM setting, as a hanging node can be treated as just another vertex in the decomposition of the domain. An increasing number of hanging nodes under further mesh refinement may violate the mesh assumption (M2); numerically, however, the method seems robust without any restriction on the number of hanging nodes. The successful numerical experiments clearly motivate future work on the theoretical investigation of the performance of the adaptive mesh-refining algorithm. The aforementioned empirical observation that the stabilization terms do not dominate the a posteriori error estimates raises the hope for a possible convergence analysis of the adaptive mesh-refining strategy with the axioms of adaptivity [20] towards a proof of optimal convergence rates. The numerical results in this section support this conjecture at least for the lowest-order VEM in 2D for indefinite non-symmetric second-order elliptic PDEs.
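The empirical convergence rates and the efficiency index discussed above are simple post-processing quantities. The following Python sketch shows one common way to compute them from the sequences of ndof, error, and estimator values; the function names are hypothetical and the numbers in the usage part are synthetic placeholders, not results from the experiments in this section.

```python
import math
from typing import List, Sequence


def empirical_rates(ndof: Sequence[float], err: Sequence[float]) -> List[float]:
    """Rates r between consecutive levels under the model err ~ ndof^(-r)."""
    return [
        math.log(err[l] / err[l + 1]) / math.log(ndof[l + 1] / ndof[l])
        for l in range(len(err) - 1)
    ]


def efficiency_indices(estimator: Sequence[float], err: Sequence[float]) -> List[float]:
    """Ratio of the estimator to the error on each level."""
    return [mu / e for mu, e in zip(estimator, err)]


if __name__ == "__main__":
    # Synthetic data that decay exactly like ndof^(-1/2); not measured values.
    ndof = [100.0, 400.0, 1600.0, 6400.0]
    err = [n ** -0.5 for n in ndof]
    est = [6.0 * e for e in err]
    print(empirical_rates(ndof, err))
    print(efficiency_indices(est, err))
```

A rate close to 1/2 for the \(H^1\) error corresponds to the optimal rate quoted for the lowest-order method, and a bounded efficiency index indicates that the estimator and the error decay at the same rate.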

Fig. 10 Estimator components corresponding to the error \(H1e=|u-\Pi _1u_h|_{1,\text {pw}}\) of the adaptive refinement presented in Subsections 6.2-6.4

Fig. 11 Estimator components corresponding to the error \(L2e=\Vert u-\Pi _1u_h\Vert _{L^2(\Omega )}\) of the adaptive refinement presented in Subsections 6.2-6.4