1 Introduction

Consider the singularly perturbed reaction diffusion problem, given in \(\varOmega =(0,1)^2\) by

$$\begin{aligned} -\varepsilon ^2\varDelta u+cu=f, \end{aligned}$$
(1)

where \(0<\varepsilon \ll 1\), \(c\in W^{1,\infty }\), \(c_\infty \ge c\ge c_0>0\) and \(u=0\) on \(\partial \varOmega \). We rewrite the problem, using \(\varvec{u}=-\varepsilon {{\,\mathrm{{grad}}\,}}^\circ u\), into a first order system

$$\begin{aligned} \left[ \begin{pmatrix}c&{}0\\ 0&{}1 \end{pmatrix}+\begin{pmatrix}0&{}\varepsilon {{\,\mathrm{{div}}\,}}\\ \varepsilon {{\,\mathrm{{grad}}\,}}^\circ &{}0 \end{pmatrix}\right] \begin{pmatrix}u\\ \varvec{u} \end{pmatrix}=\begin{pmatrix}f\\ 0 \end{pmatrix}, \end{aligned}$$
(2)

where \({{\,\mathrm{{grad}}\,}}^\circ \) denotes the gradient in \(H^1_0(\varOmega )\) and \({{\,\mathrm{{div}}\,}}\) the divergence, its negative adjoint. This formulation is also called a mixed formulation. For its weak formulation let \(\left\langle \cdot ,\cdot \right\rangle \) denote the \(L^2\)-scalar product over \(\varOmega \). Testing (2) with \(V=(v,\varvec{v})\in L^2(\varOmega )\times H_{{{\,\mathrm{{div}}\,}}}(\varOmega )\) gives

$$\begin{aligned} \left\langle cu,v \right\rangle +\varepsilon \left\langle {{\,\mathrm{{div}}\,}}\varvec{u},v \right\rangle +\left\langle \varvec{u},\varvec{v} \right\rangle +\varepsilon \left\langle {{\,\mathrm{{grad}}\,}}^\circ u,\varvec{v} \right\rangle =\left\langle f,v \right\rangle \end{aligned}$$

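which, after integrating the last term on the left-hand side by parts (the boundary term vanishes because \(u=0\) on \(\partial \varOmega \)), can also be written for \(U=(u,\varvec{u})\in L^2(\varOmega )\times H_{{{\,\mathrm{{div}}\,}}}(\varOmega )\) as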

$$\begin{aligned} {B(U,V):=}\left\langle cu,v \right\rangle +\varepsilon \left\langle {{\,\mathrm{{div}}\,}}\varvec{u},v \right\rangle +\left\langle \varvec{u},\varvec{v} \right\rangle -\varepsilon \left\langle u,{{\,\mathrm{{div}}\,}}\varvec{v} \right\rangle =\left\langle f,v \right\rangle . \end{aligned}$$
(3)

This is the weak form we will discretise and analyse.

Singularly perturbed reaction diffusion problems have been analysed in many papers, see e.g. [2, 18]. The norm associated with (1) is the \(\varepsilon \)-weighted \(H^1\)-norm, also called energy norm. Unfortunately, that norm is not strong enough to see the boundary layers. For example, the corresponding layer function for the boundary \(x=0\) is of the type \(\mathrm {e}^{-x/\varepsilon }\), for which it holds

$$\begin{aligned} \Vert {\mathrm {e}^{-x/\varepsilon }}\Vert _{L^2(\varOmega )}+\varepsilon \Vert {{{\,\mathrm{{grad}}\,}}\mathrm {e}^{-x/\varepsilon }}\Vert _{L^2(\varOmega )} \lesssim \varepsilon ^{1/2} {\mathop {\longrightarrow }\limits ^{\varepsilon \rightarrow 0}}0. \end{aligned}$$
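The decay can be made explicit in closed form. The following short Python sketch (the function name `layer_energy_norm` is ours and purely illustrative) evaluates the left-hand side for \(w(x,y)=\mathrm {e}^{-x/\varepsilon }\) on \((0,1)^2\), using \(\int _0^1\mathrm {e}^{-2x/\varepsilon }\,dx=\frac{\varepsilon }{2}(1-\mathrm {e}^{-2/\varepsilon })\); the ratio to \(\varepsilon ^{1/2}\) approaches \(\sqrt{2}\).

```python
import math


def layer_energy_norm(eps):
    """Return ||w||_{L2} + eps*||grad w||_{L2} for w(x, y) = exp(-x/eps) on (0,1)^2."""
    # int_0^1 exp(-2x/eps) dx = (eps/2) * (1 - exp(-2/eps)); the y-integral equals 1.
    l2 = math.sqrt(0.5 * eps * (-math.expm1(-2.0 / eps)))
    # grad w = (-w/eps, 0), hence eps*||grad w||_{L2} = ||w||_{L2}.
    return l2 + eps * (l2 / eps)


for eps in (1e-2, 1e-4, 1e-6):
    value = layer_energy_norm(eps)
    # Second printed column tends to 0, third column stays close to sqrt(2).
    print(eps, value, value / math.sqrt(eps))
```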

Therefore, convergence in a balanced norm, in which the boundary layers do not vanish for \(\varepsilon \rightarrow 0\), has been studied in recent years, see [1, 9, 11, 17]. For the lowest order Raviart-Thomas elements on a Shishkin mesh the system (3) was also considered in [15, Section 5] and analysed in a balanced \(H^1\)-comparable norm.

In this paper we prove optimal convergence orders in a stronger balanced \(H^2\)-comparable norm

$$\begin{aligned} \left| \!\!\;\left| \!\!\;\left| {U} \right| \!\!\;\right| \!\!\;\right| _{bal}^2\sim \Vert {u}\Vert _{L^2(\varOmega )}^2+\varepsilon ^{-1}\Vert {\varvec{u}}\Vert _{L^2(\varOmega )}^2+\varepsilon \Vert {{{\,\mathrm{{div}}\,}}\varvec{u}}\Vert _{L^2(\varOmega )}^2 \end{aligned}$$

for a variety of \(H_{{{\,\mathrm{{div}}\,}}}\)-conforming elements on general layer-adapted meshes. The paper is organised as follows. In Sect. 2 we define the numerical method and recall results for a solution decomposition and interpolation errors. In Sect. 3 we provide the convergence analysis and in the final Sect. 4 some numerical examples illustrating our theoretical results are given.

1.1 Notation

We denote vector valued functions with a bold font. \(L^p(D)\) with the norm \(\Vert {\cdot }\Vert _{L^p(D)}\) is the classical Lebesgue space of functions integrable to the power p over a domain \(D\subset {\mathbb {R}}^2\) and \(W^{\ell ,p}(D)\) the corresponding Sobolev space for derivatives up to order \(\ell \). Furthermore, we write \(A\lesssim B\) if there exists a generic constant \(C>0\) such that \(A\le C\cdot B\).

2 Numerical Method and Interpolation Errors

In order to define our numerical method, we need discrete spaces defined over an appropriate mesh. A basic tool for defining this mesh is the knowledge of a solution decomposition, especially the structure of layers.

Assumption 1

The solution u of (1) can be written as

$$\begin{aligned} u=s+w_1+w_2+w_3+w_4+w_{12}+w_{23}+w_{34}+w_{41} \end{aligned}$$

where s is the smooth part, \(w_i\) are boundary layers and \(w_{ij}\) are corner layers (both counted counterclockwise). To be more precise, for any given degree k it holds for \(0\le i,j\le k+2\),

$$\begin{aligned} \Vert {\partial _x^i\partial _y^j s}\Vert _{L^\infty (\varOmega )}&\lesssim 1,&|\partial _x^i\partial _y^jw_1(x,y)|&\lesssim \varepsilon ^{-i}\mathrm {e}^{-x/\varepsilon },&|\partial _x^i\partial _y^j w_{12}(x,y)|&\lesssim \varepsilon ^{-(i+j)}\mathrm {e}^{-(x+y)/\varepsilon }, \end{aligned}$$

and analogously for the other boundary layers and corner layers.

Remark 2

Using the above solution decomposition for u we derive a similar decomposition for the solution U of (3), as \(U=(u,\varvec{u})\) and \(\varvec{u}=-\varepsilon {{\,\mathrm{{grad}}\,}}u\).

Such assumptions on a solution decomposition are very common in the analysis of singularly perturbed problems. They hold true under compatibility and regularity conditions on the data, see e.g. [6, 10, 13].

We follow [16] and construct an S-type mesh using the information of the solution decomposition. First, we define a transition point \(\lambda \), such that a typical boundary layer function is small enough:

$$\begin{aligned} \exp (-\lambda /\varepsilon ) = N^{-\sigma } \quad \Rightarrow \quad \lambda = \sigma \varepsilon \ln (N), \end{aligned}$$

for a constant \(\sigma >0\) specified later. We additionally assume

$$\begin{aligned} \lambda = \min \left\{ \sigma \varepsilon \ln (N),\frac{1}{4}\right\} , \end{aligned}$$

as otherwise \(\varepsilon \) is large enough to facilitate a standard numerical analysis. Now \(0=x_0<x_1<\dots <x_N=1\) are given by

$$\begin{aligned} x_i:={\left\{ \begin{array}{ll} \sigma \varepsilon \phi \left( \frac{2i}{N}\right) , &{}i=0,\dots ,N/4,\\ \frac{2i}{N}(1-2\lambda )-\frac{1}{2}+2\lambda , &{}i=N/4,\dots ,3N/4,\\ 1-\sigma \varepsilon \phi \left( 2-\frac{2i}{N}\right) , &{}i=3N/4,\dots ,N,\\ \end{array}\right. } \end{aligned}$$

where \(\phi \) is a mesh-generating function with the properties

  • \(\phi \) is monotonically increasing,

  • \(\phi (0)=0\) and \(\phi (1/2)=\ln N\),

  • \(\phi \) is piecewise differentiable with \(\max \phi '\le C N\) and

  • \(\min \limits _{i=1,\dots ,N/4}\left( \phi \left( \frac{2i}{N}\right) -\phi \left( \frac{2(i-1)}{N}\right) \right) \ge C N^{-1}\).

The first three conditions are given in [16], while the last one allows the mesh-widths inside the boundary layers to be bounded from below, see also [8]. Related to \(\phi \) we define the mesh characterising function \(\psi \) by

$$\begin{aligned} \psi =\mathrm {e}^{-\phi }. \end{aligned}$$

Several S-type meshes fulfilling the above properties are given in [16]. We only provide the definitions of the two most commonly used ones. For the Shishkin mesh we have

$$\begin{aligned} \phi (t) = 2t\ln N,\quad \psi (t) = N^{-2t},\quad \max |\psi '| =2\ln N \end{aligned}$$

and the Bakhvalov-S-mesh

$$\begin{aligned} \phi (t)=-\ln (1-2t(1-N^{-1})),\quad \psi (t)=1-2t(1-N^{-1}),\quad \max |\psi '|= 2. \end{aligned}$$

In addition to the mesh-generating and mesh-characterising functions, we also list \(\max |\psi '|:=\max \limits _{t\in [0,1/2]}|\psi '(t)|\), a quantity that enters all error estimates on S-type meshes.
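The construction of the mesh points can be sketched in a few lines of Python. This is only an illustration under assumptions of ours: the function name `s_type_mesh`, the string labels for the two meshes, and the choice to raise an error when \(\sigma \varepsilon \ln N>1/4\) (the case excluded above) are not part of the cited construction.

```python
import math


def s_type_mesh(N, eps, sigma, kind="shishkin"):
    """Return the mesh points x_0, ..., x_N of the 1D S-type mesh on [0, 1]."""
    if N % 4 != 0:
        raise ValueError("N must be divisible by 4")
    lam = sigma * eps * math.log(N)
    if lam > 0.25:
        # Our choice: refuse the non-singularly-perturbed regime instead of capping.
        raise ValueError("sigma*eps*ln(N) exceeds 1/4")
    if kind == "shishkin":
        phi = lambda t: 2.0 * t * math.log(N)
    elif kind == "bakhvalov-s":
        phi = lambda t: -math.log(1.0 - 2.0 * t * (1.0 - 1.0 / N))
    else:
        raise ValueError("unknown mesh type: " + kind)
    x = []
    for i in range(N + 1):
        if i <= N // 4:                      # fine part near x = 0
            x.append(sigma * eps * phi(2.0 * i / N))
        elif i <= 3 * N // 4:                # equidistant coarse part
            x.append(2.0 * i / N * (1.0 - 2.0 * lam) - 0.5 + 2.0 * lam)
        else:                                # fine part near x = 1
            x.append(1.0 - sigma * eps * phi(2.0 - 2.0 * i / N))
    return x


mesh = s_type_mesh(64, 1e-4, 3.0, "bakhvalov-s")
print(mesh[16], 3.0 * 1e-4 * math.log(64))   # both values equal the transition point
```

Both branches meet at the transition points because \(\phi (1/2)=\ln N\), so \(x_{N/4}=\lambda \) and \(x_{3N/4}=1-\lambda \).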

The two-dimensional mesh \(T_N\) is then defined by all cells \(K_{ij}:=(x_{i-1},x_i)\times (x_{j-1},x_j)\) for \(1\le i,j\le N\). Note that it holds

$$\begin{aligned} h_i:=x_i-x_{i-1} \lesssim {\left\{ \begin{array}{ll} \varepsilon N^{-1}\max |\psi '|e^{x/(\sigma \varepsilon )},&{}i\le N/4\text { or }i>3N/4,\text { and }x\in [x_{i-1},x_i],\\ N^{-1},&{} \text {otherwise}, \end{array}\right. }\nonumber \\ \end{aligned}$$
(4)

where the first estimate can be found, e.g., in [12, Lemma 2.3]. From it we also obtain the simpler bound

$$\begin{aligned} h:=\max _{i=1,\dots ,N/4}h_i \lesssim \varepsilon . \end{aligned}$$
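For the two meshes defined above, h can be written down explicitly, which makes the bound \(h\lesssim \varepsilon \) easy to inspect. The closed forms in the sketch below are our own computation from the definitions of \(\phi \) (assuming \(\lambda =\sigma \varepsilon \ln N\)), and the function name `max_layer_width` is hypothetical.

```python
import math


def max_layer_width(N, eps, sigma, kind="shishkin"):
    """Largest mesh width among the first N/4 cells of the S-type mesh."""
    if kind == "shishkin":
        # phi is linear, so every layer cell has width sigma*eps*(4/N)*ln N.
        return 4.0 * sigma * eps * math.log(N) / N
    if kind == "bakhvalov-s":
        # phi is convex, so the last layer cell is the widest one:
        # h_{N/4} = sigma*eps*ln(psi(t_{N/4-1})/psi(t_{N/4})) = sigma*eps*ln(5 - 4/N).
        return sigma * eps * math.log(5.0 - 4.0 / N)
    raise ValueError("unknown mesh type: " + kind)


for N in (16, 64, 1024):
    # h/eps stays bounded independently of N and eps for both meshes.
    print(N, max_layer_width(N, 1e-3, 2.5, "shishkin") / 1e-3,
          max_layer_width(N, 1e-3, 2.5, "bakhvalov-s") / 1e-3)
```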

Let us denote two subdomains of \(\varOmega \) per layer function, exemplarily given for \(w_1\) by

$$\begin{aligned} \varOmega _1:=[0,\lambda ]\times [0,1]\quad \text{ and } \quad \varOmega _1^*:=[0,x_{N/4-1}]\times [0,1]\subset \varOmega _1 \end{aligned}$$

and for \(w_{12}\) by

$$\begin{aligned} \varOmega _{12}:=[0,\lambda ]^2\quad \text{ and } \quad \varOmega _{12}^*:=[0,x_{N/4-1}]^2\subset \varOmega _{12}. \end{aligned}$$

With (3) only needing \(L^2\)-regularity for the first component and \(H_{{{\,\mathrm{{div}}\,}}}\)-regularity for the second component, our discrete spaces are

$$\begin{aligned} {\mathcal {U}}_N:=\{(u_N,\varvec{u}_N)\in L^2(\varOmega )\times H_{{{\,\mathrm{{div}}\,}}}(\varOmega ): \forall K\subset T_N: u_N|_K\in {\mathcal {Q}}_k(K),\,\varvec{u}_N|_K\in {\mathcal {D}}_k(K)\}, \end{aligned}$$

where \({\mathcal {Q}}_k(K)\) is the space of polynomials with degree up to k in each variable on the cell K of \(T_N\). For the discretisation of \(H_{{{\,\mathrm{{div}}\,}}}\) with \({\mathcal {D}}_k(K)\) we can use

  • the Raviart-Thomas space

    $$\begin{aligned} RT_k(K)={\mathcal {Q}}_{k+1,k}(K)\times {\mathcal {Q}}_{k,k+1}(K), \end{aligned}$$

    introduced by Raviart and Thomas in [14] on triangular meshes, see also [4] for rectangular meshes, where \({\mathcal {Q}}_{p,q}(K)\) is the space of polynomials with degree p in x and degree q in y on the cell K or

  • the Brezzi-Douglas-Marini space

    $$\begin{aligned} BDM_k( K):= ({\mathcal {P}}_k(K))^2\oplus \text {span}\{{{\,\mathrm{{curl}}\,}}(x^{k+1}y) ,{{\,\mathrm{{curl}}\,}}(xy^{k+1})\}, \end{aligned}$$

    see [5], where \({\mathcal {P}}_k(K)\) is the space of polynomials of total degree k on the cell K and \({{\,\mathrm{{curl}}\,}}w=(\partial _y w,-\partial _x w)\).

Then the discrete method reads: Find \(U_N=(u_N,\varvec{u}_N)\in {\mathcal {U}}_N\), s.t. for all \(V\in {\mathcal {U}}_N\) it holds

$$\begin{aligned} B(U_N,V)=\left\langle cu_N,v \right\rangle +\varepsilon \left\langle {{\,\mathrm{{div}}\,}}\varvec{u}_N,v \right\rangle +\left\langle \varvec{u}_N,\varvec{v} \right\rangle -\varepsilon \left\langle u_N,{{\,\mathrm{{div}}\,}}\varvec{v} \right\rangle =\left\langle f,v \right\rangle . \end{aligned}$$
(5)

Note that the solution U of (3) also satisfies (5), and we therefore have Galerkin orthogonality

$$\begin{aligned} B(U_N-U,V)=0,\quad \forall \, V\in {\mathcal {U}}_N. \end{aligned}$$
(6)

3 Numerical Analysis

Let us start with an interpolation operator into \({\mathcal {U}}_N\) given by its two components. The first one \({\mathcal {I}}_1\) will be a weighted local \(L^2\)-projection, defined on any \(K\subset T_N\) by

$$\begin{aligned} \left\langle c({\mathcal {I}}_1 u-u),{w} \right\rangle _K=0\text { for all } {w}\in {\mathcal {Q}}_k(K). \end{aligned}$$

This weighted \(L^2\)-projection is \(L^2\)-stable due to \(0<c_0\le c\le c_\infty \) and

$$\begin{aligned} \Vert {c^{1/2}{\mathcal {I}}_1 v}\Vert _{L^2(K)}^2&= \left\langle c{\mathcal {I}}_1 v,{\mathcal {I}}_1 v \right\rangle _K = \left\langle c v,{\mathcal {I}}_1 v \right\rangle _K \le \Vert {c^{1/2}v}\Vert _{L^2(K)}\Vert {c^{1/2}{\mathcal {I}}_1 v}\Vert _{L^2(K)}\\ \quad \Rightarrow \quad \Vert {{\mathcal {I}}_1 v}\Vert _{L^2(K)}&\le \frac{1}{c_0^{1/2}}\Vert {c^{1/2}{\mathcal {I}}_1 v}\Vert _{L^2(K)} \le \frac{1}{c_0^{1/2}}\Vert {c^{1/2} v}\Vert _{L^2(K)} \le \left( \frac{c_\infty }{c_0} \right) ^{1/2}\Vert {v}\Vert _{L^2(K)} \end{aligned}$$

for all \(v\in L^2(K)\). Moreover, following [17] and denoting by \({\mathcal {L}}_1\) a pointwise Lagrange interpolation operator into \({\mathcal {Q}}_k(K)\) we obtain

$$\begin{aligned} \Vert {{\mathcal {I}}_1 v-v}\Vert _{L^2(K)}&\le \Vert {{\mathcal {I}}_1v - {\mathcal {I}}_1 {\mathcal {L}}_1 v}\Vert _{L^2(K)} + \Vert {{\mathcal {I}}_1 {\mathcal {L}}_1 v-v}\Vert _{L^2(K)}\\&= \Vert {{\mathcal {I}}_1(v - {\mathcal {L}}_1 v)}\Vert _{L^2(K)} + \Vert {{\mathcal {L}}_1 v-v}\Vert _{L^2(K)}\\&\le \left( \left( \frac{c_\infty }{c_0}\right) ^{1/2}+1 \right) \Vert {{\mathcal {L}}_1 v-v}\Vert _{L^2(K)}. \end{aligned}$$
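The stability argument can be observed numerically in a simplified setting. The sketch below is a one-dimensional analogue on the reference interval \([-1,1]\), not the two-dimensional operator \({\mathcal {I}}_1\) itself: it assumes NumPy, replaces integrals by Gauss-Legendre quadrature, and uses a weight \(c(x)=2+\sin (3x)\) (so \(c_0=1\), \(c_\infty =3\)) and a test function chosen by us for illustration.

```python
import numpy as np
from numpy.polynomial import legendre


def weighted_l2_projection(c, v, k, nquad=40):
    """Weighted L2-projection of v onto polynomials of degree <= k on [-1, 1].

    Integrals are approximated by an nquad-point Gauss-Legendre rule.
    Returns the projection evaluated at the nodes, the nodes and the weights.
    """
    xq, wq = legendre.leggauss(nquad)
    P = legendre.legvander(xq, k)      # P[:, j] = Legendre polynomial P_j at the nodes
    cw = c(xq) * wq                    # weight function times quadrature weights
    M = P.T @ (cw[:, None] * P)        # M[i, j] ~ <c P_j, P_i>
    b = P.T @ (cw * v(xq))             # b[i]    ~ <c v, P_i>
    coeffs = np.linalg.solve(M, b)
    return P @ coeffs, xq, wq


c = lambda x: 2.0 + np.sin(3.0 * x)    # 1 <= c <= 3
v = lambda x: np.exp(-5.0 * (x + 1.0)) # layer-like function
Iv, xq, wq = weighted_l2_projection(c, v, k=2)

norm_v = np.sqrt(np.sum(wq * v(xq) ** 2))
norm_Iv = np.sqrt(np.sum(wq * Iv ** 2))
print(norm_Iv, "<=", np.sqrt(3.0) * norm_v)
```

Since the quadrature weights are positive, the discrete inner product admits the same Cauchy-Schwarz argument, so the bound with the factor \((c_\infty /c_0)^{1/2}\) also holds for the quadrature-based quantities.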

By using the interpolation error results of [2] for the remaining error, the anisotropic interpolation error estimates follow, i.e. for any \(0\le \ell \le k+1\) it holds on a cell K with dimension \(h_x\times h_y\)

$$\begin{aligned} \Vert {{\mathcal {I}}_1v-v}\Vert _{L^2(K)}\lesssim h_x^\ell \Vert {\partial _x^\ell v}\Vert _{L^2(K)}+h_y^\ell \Vert {\partial _y^\ell v}\Vert _{L^2(K)} \end{aligned}$$
(7)

if \(v\in H^{\ell }(K)\).

The second operator \(\varvec{{\mathcal {I}}}_2\) utilises the classical interpolation operator \(\varvec{{\mathcal {J}}}\) on \({\mathcal {D}}_k\). It is defined on each cell K for

  • \({\mathcal {D}}_k(K)=RT_k(K)\) by

    $$\begin{aligned} \int _{F}(\varvec{{\mathcal {J}}}\varvec{v}-\varvec{v})\cdot \varvec{n}\cdot q&=0,\quad \forall q\in {\mathcal {P}}_k(F)\text { for all faces } F\subset \partial K, \end{aligned}$$
    (8a)
    $$\begin{aligned} \int _{K}(\varvec{{\mathcal {J}}}\varvec{v}-\varvec{v})\cdot \varvec{q}&=0,\quad \forall \varvec{q}\in {\mathcal {Q}}_{k-1,k}(K)\times {\mathcal {Q}}_{k,k-1}(K), \end{aligned}$$
    (8b)
  • \({\mathcal {D}}_k(K)=BDM_k(K)\) by

    $$\begin{aligned} \int _{F}(\varvec{{\mathcal {J}}}\varvec{v}-\varvec{v})\cdot \varvec{n}\cdot q&=0,\quad \forall q\in {\mathcal {P}}_{k}(F),\forall F\subset \partial K, \end{aligned}$$
    (9a)
    $$\begin{aligned} \int _{K}(\varvec{{\mathcal {J}}}\varvec{v}-\varvec{v})\cdot \varvec{q}&=0,\quad \forall \varvec{q}\in ({\mathcal {P}}_{k-2}(K))^2. \end{aligned}$$
    (9b)
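A quick consistency check on (8) and (9) is that the number of conditions equals the dimension of the local space. The following Python sketch counts both sides on a rectangle with four faces; the helper names are ours, and the count alone does not prove unisolvence.

```python
def dim_Q(p, q):
    """Dimension of Q_{p,q}: degree <= p in x and <= q in y (0 if an index is negative)."""
    return (p + 1) * (q + 1) if p >= 0 and q >= 0 else 0


def dim_P(k):
    """Dimension of P_k in two variables: total degree <= k (0 if k is negative)."""
    return (k + 1) * (k + 2) // 2 if k >= 0 else 0


def rt_counts(k):
    """Return (dim RT_k(K), number of conditions in (8a) and (8b))."""
    dim = dim_Q(k + 1, k) + dim_Q(k, k + 1)
    face = 4 * (k + 1)                       # P_k(F) on each of the 4 faces
    interior = dim_Q(k - 1, k) + dim_Q(k, k - 1)
    return dim, face + interior


def bdm_counts(k):
    """Return (dim BDM_k(K), number of conditions in (9a) and (9b)), k >= 1."""
    dim = 2 * dim_P(k) + 2                   # (P_k)^2 plus the two curl functions
    face = 4 * (k + 1)
    interior = 2 * dim_P(k - 2)
    return dim, face + interior


for k in range(1, 4):
    print(k, rt_counts(k), bdm_counts(k))
```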

The anisotropic interpolation error estimates for \(\varvec{{\mathcal {J}}}\) hold, see [7, 19],

$$\begin{aligned} \Vert {\varvec{{\mathcal {J}}}\varvec{v}-\varvec{v}}\Vert _{L^2(K)}\lesssim \sum _{s=0}^{k+1} h_x^{{k+1}-s}h_y^s\Vert {\partial _x^{k+1-s}\partial _y^s \varvec{v}}\Vert _{L^2(K)} \end{aligned}$$
(10)

if \(\varvec{v}\in H^{k+1}(K)\). Note that for \(RT_k\) an even sharper result involving only pure derivatives of \(\varvec{v}\) holds, see [7]. In addition, we have also anisotropic interpolation error estimates for the \(L^2\)-norm of the divergence, see [7],

$$\begin{aligned} RT_k:\quad \Vert {{{\,\mathrm{{div}}\,}}(\varvec{{\mathcal {J}}}\varvec{v}-\varvec{v})}\Vert _{L^2(K)}&\lesssim h_x^{k+1}\Vert {\partial _x^{k+1}{{\,\mathrm{{div}}\,}}\varvec{v}}\Vert _{L^2(K)}+h_y^{k+1}\Vert {\partial _y^{k+1}{{\,\mathrm{{div}}\,}}\varvec{v}}\Vert _{L^2(K)}, \end{aligned}$$
(11a)
$$\begin{aligned} BDM_k:\quad \Vert {{{\,\mathrm{{div}}\,}}(\varvec{{\mathcal {J}}}\varvec{v}-\varvec{v})}\Vert _{L^2(K)}&\lesssim \sum _{|\varvec{\alpha }|=k}h_x^{\alpha _1}h_y^{\alpha _2}\Vert {\partial _x^{\alpha _1}\partial _y^{\alpha _2}{{\,\mathrm{{div}}\,}}\varvec{v}}\Vert _{L^2(K)}, \end{aligned}$$
(11b)

if \(\varvec{v}\) is such that \({{\,\mathrm{{div}}\,}}\varvec{v}\in H^{k+1}(K)\) for \(RT_k\) and \({{\,\mathrm{{div}}\,}}\varvec{v}\in H^{k}(K)\) for \(BDM_k\).

If we only want to prove convergence in the \(L^2\)-norm of \(U=(u,-\varepsilon {{\,\mathrm{{grad}}\,}}u)\) the interpolation operator \(\varvec{{\mathcal {J}}}\) is enough. For a stronger convergence result we define a more sophisticated operator, following ideas from [11]. Recalling the decomposition of u, we have for \(\varvec{u}=-\varepsilon {{\,\mathrm{{grad}}\,}}u\) the decomposition

$$\begin{aligned} \varvec{u}=\varvec{s}+\varvec{w}^1+\varvec{w}^2+\varvec{w}^3+\varvec{w}^4+\varvec{w}^{12}+\varvec{w}^{23}+\varvec{w}^{34}+\varvec{w}^{41} \end{aligned}$$
(12)

with the obvious definition of the bold font letters. Now the operator \(\varvec{{\mathcal {I}}}_2\) is defined piecewise on each cell \(K\subset T_N\) with \(id\in \{1,2,3,4,12,23,34,41\}\)

$$\begin{aligned} \varvec{{\mathcal {I}}}_2 \varvec{s}|_K&:= \varvec{{\mathcal {J}}}\varvec{s}|_K,&\varvec{{\mathcal {I}}}_2 \varvec{w}^{id}|_K&:={\left\{ \begin{array}{ll} \varvec{{\mathcal {J}}}\varvec{w}^{id}|_K,&{} K\subset \varOmega _{id}^*,\\ {\hat{\varvec{{\mathcal {J}}}}}^{id} \varvec{w}^{id}|_K,&{}K\subset \varOmega _{id}{\setminus }\varOmega _{id}^*,\\ 0, &{} K\subset \varOmega {\setminus }\varOmega _{id}. \end{array}\right. } \end{aligned}$$

Using \(\varGamma _{id}:=\partial \varOmega _{id}{\setminus }\varGamma \) we define the remaining operators on each \(K\subset \varOmega _{id}{\setminus }\varOmega _{id}^*\) using the same definition as for \(\varvec{{\mathcal {J}}}\) with the exception of the first condition in (8) and (9). This one is replaced by the two conditions

$$\begin{aligned} \int _{F}({\hat{\varvec{{\mathcal {J}}}}}^{id} \varvec{w}^{id}-\varvec{w}^{id})\cdot \varvec{n}\cdot q&=0\text { for each face }F\subset \partial K{\setminus }\varGamma _{id},\,\forall q\in {\mathcal {P}}_k(F),\\ \int _{F}{\hat{\varvec{{\mathcal {J}}}}}^{id} \varvec{w}^{id}\cdot \varvec{n}\cdot q&=0\text { for each face }F\subset \partial K\cap \varGamma _{id},\,\forall q\in {\mathcal {P}}_k(F). \end{aligned}$$

For our analysis let us define a norm that is associated with \(B(\cdot ,\cdot )\). Since the two \(\varepsilon \)-weighted terms cancel for identical arguments and \(c\ge c_0\), it holds

$$\begin{aligned} B(U,U)\ge \min \{1,c_0\}\Vert {U}\Vert _{L^2(\varOmega )}^2 \end{aligned}$$
(13)

that is equivalent to coercivity in the energy norm of the weak formulation of (1). But we can actually use the stronger norm

$$\begin{aligned} \left| \!\!\;\left| \!\!\;\left| {U} \right| \!\!\;\right| \!\!\;\right| :=\left( \Vert {U}\Vert _{L^2(\varOmega )}^2+\delta \Vert {\varepsilon {{\,\mathrm{{div}}\,}}\varvec{u}}\Vert _{L^2(\varOmega )}^2 \right) ^{1/2}, \end{aligned}$$

where \(\delta \le \frac{c_0}{c_\infty ^2}\). This norm is equivalent to the weighted \(H^2\)-norm \( \Vert {u}\Vert _{L^2(\varOmega )}+\varepsilon \Vert {{{\,\mathrm{{grad}}\,}}u}\Vert _{L^2(\varOmega )}+\varepsilon ^2\Vert {\varDelta u}\Vert _{L^2(\varOmega )}, \) which is stronger than the energy norm and, unfortunately, also not balanced. To repair this weakness we also introduce a balanced version of this norm

$$\begin{aligned} \left| \!\!\;\left| \!\!\;\left| {U} \right| \!\!\;\right| \!\!\;\right| _{bal}:=\left( \Vert {u}\Vert _{L^2(\varOmega )}^2+\varepsilon ^{-1}\Vert {\varvec{u}}\Vert _{L^2(\varOmega )}^2+\delta \varepsilon ^{-1}\Vert {\varepsilon {{\,\mathrm{{div}}\,}}\varvec{u}}\Vert _{L^2(\varOmega )}^2 \right) ^{1/2}, \end{aligned}$$

that is also considered in [11]. The remainder of this section is devoted to proving optimal uniform convergence orders in the balanced norm. Of course, convergence in the unbalanced norm then follows.
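The difference between the two norms can be seen on a typical layer term. The sketch below evaluates both norms in closed form for \(u=\mathrm {e}^{-x/\varepsilon }\) and \(\varvec{u}=-\varepsilon {{\,\mathrm{{grad}}\,}}u=(\mathrm {e}^{-x/\varepsilon },0)\), for which \(\varepsilon {{\,\mathrm{{div}}\,}}\varvec{u}=-\mathrm {e}^{-x/\varepsilon }\). The function name `layer_norms` and the value of \(\delta \) are our own illustrative choices, and the boundary conditions of (1) are ignored here.

```python
import math


def layer_norms(eps, delta):
    """Return (unbalanced norm, balanced norm) of U = (u, -eps*grad u), u = exp(-x/eps)."""
    # m = ||exp(-x/eps)||_{L2((0,1)^2)}^2; u, the first component of the flux,
    # and eps*div(flux) all have this squared L2-norm.
    m = 0.5 * eps * (-math.expm1(-2.0 / eps))
    unbalanced = math.sqrt(m + m + delta * m)
    balanced = math.sqrt(m + m / eps + delta * m / eps)
    return unbalanced, balanced


for eps in (1e-2, 1e-4, 1e-6):
    # The unbalanced norm decays like eps^{1/2}, the balanced one tends to ((1+delta)/2)^{1/2}.
    print(eps, layer_norms(eps, delta=0.5))
```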

Lemma 3

For each \(\delta \le \frac{c_0}{c_\infty ^2}\) there exists a constant \(\beta >0\), such that for all \(V\in {\mathcal {U}}_N\) it holds

$$\begin{aligned} \sup _{\chi \in {\mathcal {U}}_N}\frac{B(V,\chi )}{\Vert {\chi }\Vert _{L^2(\varOmega )}}\ge \beta \left| \!\!\;\left| \!\!\;\left| {V} \right| \!\!\;\right| \!\!\;\right| . \end{aligned}$$

Proof

By (13) we already have

$$\begin{aligned} B(V,V)\ge \min \{1,c_0\}\Vert {V}\Vert _{L^2(\varOmega )}^2. \end{aligned}$$

Choosing as test function \(\chi (V)=(v+\delta \varepsilon {{\,\mathrm{{div}}\,}}\varvec{v},\varvec{v})\in {\mathcal {U}}_N\), we obtain

$$\begin{aligned} B(V,\chi (V))&\ge \left( c_0-\delta \frac{c_\infty ^2}{2}\right) \Vert {v}\Vert _{L^2(\varOmega )}^2 +\Vert {\varvec{v}}\Vert _{L^2(\varOmega )}^2 +\frac{\delta }{2}\Vert {\varepsilon {{\,\mathrm{{div}}\,}}\varvec{v}}\Vert _{L^2(\varOmega )}^2 \end{aligned}$$

and together with \(\delta \le \frac{c_0}{c_\infty ^2}\) we have

$$\begin{aligned} B(V,\chi (V))\ge \frac{c_0}{2}\Vert {v}\Vert _{L^2(\varOmega )}^2 +\Vert {\varvec{v}}\Vert _{L^2(\varOmega )}^2 +\frac{\delta }{2}\Vert {\varepsilon {{\,\mathrm{{div}}\,}}\varvec{v}}\Vert _{L^2(\varOmega )}^2 \ge \frac{1}{2}\min \{c_0,1\}\left| \!\!\;\left| \!\!\;\left| {V} \right| \!\!\;\right| \!\!\;\right| ^2. \end{aligned}$$

In addition it holds

$$\begin{aligned} \Vert {\chi (V)}\Vert _{L^2(\varOmega )}^2&\le 2\Vert {V}\Vert _{L^2(\varOmega )}^2+2\delta ^2\Vert {\varepsilon {{\,\mathrm{{div}}\,}}\varvec{v}}\Vert _{L^2(\varOmega )}^2 \le 2\max \left\{ 1,\delta \right\} \left| \!\!\;\left| \!\!\;\left| {V} \right| \!\!\;\right| \!\!\;\right| ^2. \end{aligned}$$

Thus it follows

$$\begin{aligned} \sup _{\chi \in {\mathcal {U}}_N}\frac{B(V,\chi )}{\Vert {\chi }\Vert _{L^2(\varOmega )}} \ge \frac{B(V,\chi (V))}{\Vert {\chi (V)}\Vert _{L^2(\varOmega )}} \ge \frac{\sqrt{2}}{4}\frac{\min \{1,c_0\}}{\max \{1,\sqrt{\delta }\}}\left| \!\!\;\left| \!\!\;\left| {V} \right| \!\!\;\right| \!\!\;\right| . \end{aligned}$$

Setting \(\beta =\frac{\sqrt{2}}{4}\frac{\min \{1,c_0\}}{\max \{1,\sqrt{\delta }\}} \ge \frac{\sqrt{2}}{4}\frac{\min \{1,c_0\}}{\max \{1,\frac{\sqrt{c_0}}{c_\infty }\}} >0\) proves the assertion. \(\square \)
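For completeness, the lower bound for \(B(V,\chi (V))\) used at the beginning of the proof can be verified term by term. The following expansion is our own sketch of that step; it uses only the definition of \(B\), the bounds \(c_0\le c\le c_\infty \) and Young's inequality, and it relies on \({{\,\mathrm{{div}}\,}}\varvec{v}|_K\in {\mathcal {Q}}_k(K)\) so that \(\chi (V)\in {\mathcal {U}}_N\):

$$\begin{aligned} B(V,\chi (V))&=\left\langle cv,v \right\rangle +\delta \left\langle cv,\varepsilon {{\,\mathrm{{div}}\,}}\varvec{v} \right\rangle +\varepsilon \left\langle {{\,\mathrm{{div}}\,}}\varvec{v},v \right\rangle +\delta \Vert {\varepsilon {{\,\mathrm{{div}}\,}}\varvec{v}}\Vert _{L^2(\varOmega )}^2 +\Vert {\varvec{v}}\Vert _{L^2(\varOmega )}^2-\varepsilon \left\langle v,{{\,\mathrm{{div}}\,}}\varvec{v} \right\rangle \\&\ge c_0\Vert {v}\Vert _{L^2(\varOmega )}^2-\delta c_\infty \Vert {v}\Vert _{L^2(\varOmega )}\Vert {\varepsilon {{\,\mathrm{{div}}\,}}\varvec{v}}\Vert _{L^2(\varOmega )} +\delta \Vert {\varepsilon {{\,\mathrm{{div}}\,}}\varvec{v}}\Vert _{L^2(\varOmega )}^2+\Vert {\varvec{v}}\Vert _{L^2(\varOmega )}^2\\&\ge \left( c_0-\delta \frac{c_\infty ^2}{2}\right) \Vert {v}\Vert _{L^2(\varOmega )}^2 +\Vert {\varvec{v}}\Vert _{L^2(\varOmega )}^2 +\frac{\delta }{2}\Vert {\varepsilon {{\,\mathrm{{div}}\,}}\varvec{v}}\Vert _{L^2(\varOmega )}^2. \end{aligned}$$

Here the two terms \(\pm \varepsilon \left\langle {{\,\mathrm{{div}}\,}}\varvec{v},v \right\rangle \) cancel, and the mixed term is bounded by \(c_\infty ab\le \frac{c_\infty ^2}{2}a^2+\frac{1}{2}b^2\) with \(a=\Vert {v}\Vert _{L^2(\varOmega )}\) and \(b=\Vert {\varepsilon {{\,\mathrm{{div}}\,}}\varvec{v}}\Vert _{L^2(\varOmega )}\).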

Let us split the error \(U-U_N\) into an interpolation error and a discrete error

$$\begin{aligned} U-U_N = U - {\mathcal {I}}U+{\mathcal {I}}U-U_N=:(\eta ,\varvec{\eta })-(\xi ,\varvec{\xi }),\,(\xi ,\varvec{\xi })\in {\mathcal {U}}_N. \end{aligned}$$

Using above inf-sup inequality and the Galerkin orthogonality (6) we arrive at

$$\begin{aligned} \beta \left| \!\!\;\left| \!\!\;\left| {(\xi ,\varvec{\xi })} \right| \!\!\;\right| \!\!\;\right| \le \sup _{V\in {\mathcal {U}}_N}\frac{B((\xi ,\varvec{\xi }),V)}{\Vert {V}\Vert _{L^2(\varOmega )}} = \sup _{V\in {\mathcal {U}}_N}\frac{B((\eta ,\varvec{\eta }),V)}{\Vert {V}\Vert _{L^2(\varOmega )}}, \end{aligned}$$
(14)

and we are left with estimating \(B((\eta ,\varvec{\eta }),V)\) for any \(V\in {\mathcal {U}}_N\). Here it holds using (5)

$$\begin{aligned} B((\eta ,\varvec{\eta }),V)&= \left\langle c\eta ,v \right\rangle +\varepsilon \left\langle {{\,\mathrm{{div}}\,}}\varvec{\eta },v \right\rangle +\left\langle \varvec{\eta },\varvec{v} \right\rangle -\varepsilon \left\langle \eta ,{{\,\mathrm{{div}}\,}}\varvec{v} \right\rangle \\&= \varepsilon \left\langle {{\,\mathrm{{div}}\,}}\varvec{\eta },v \right\rangle +\left\langle \varvec{\eta },\varvec{v} \right\rangle -\varepsilon \left\langle \eta ,{{\,\mathrm{{div}}\,}}\varvec{v} \right\rangle \end{aligned}$$

due to \({\mathcal {I}}_1\) being the weighted \(L^2\)-projection. Note that in the case of constant c, the last term would also vanish due to \({{\,\mathrm{{div}}\,}}\varvec{v}|_K\in {\mathcal {Q}}_k(K)\).

Lemma 4

It holds for \(\sigma > k+1\)

$$\begin{aligned} \Vert {\varvec{\eta }}\Vert _{L^2(\varOmega )}\lesssim \varepsilon ^{1/2}(h+N^{-1}\max |\psi '|)^{k+1}. \end{aligned}$$

In the case of \({\mathcal {D}}_k(K)=RT_k(K)\) and \(\sigma \ge k+3/2\) we obtain

$$\begin{aligned} \Vert {{{\,\mathrm{{div}}\,}}\varvec{\eta }}\Vert _{L^2(\varOmega )}\lesssim \varepsilon ^{-1/2}(h+N^{-1}\max |\psi '|)^{k+1}, \end{aligned}$$

while for \({\mathcal {D}}_k(K)=BDM_k(K)\) and \(\sigma \ge k+1/2\) we have

$$\begin{aligned} \Vert {{{\,\mathrm{{div}}\,}}\varvec{\eta }}\Vert _{L^2(\varOmega )}\lesssim \varepsilon ^{-1/2}(h+N^{-1}\max |\psi '|)^{k}. \end{aligned}$$

Proof

Using the solution decomposition (12) and the anisotropic interpolation error estimate (10) we obtain

$$\begin{aligned} \Vert {\varvec{{\mathcal {I}}}_2\varvec{s}-\varvec{s}}\Vert _{L^2(\varOmega )} = \Vert {\varvec{{\mathcal {J}}}\varvec{s}-\varvec{s}}\Vert _{L^2(\varOmega )} \lesssim (h+N^{-1})^{k+1}\Vert {\varvec{s}}\Vert _{H^{k+1}(\varOmega )} \lesssim \varepsilon (h+N^{-1})^{k+1}. \end{aligned}$$

For the boundary layer terms we use the special structure of \(\varvec{{\mathcal {I}}}_2\) and estimate differently on the subdomains of \(\varOmega \). We show the procedure for \(\varvec{w}^1\), the estimates of the other terms follow similarly. In \(\varOmega {\setminus }\varOmega _1\) the interpolant is zero and we get

$$\begin{aligned} \Vert {\varvec{{\mathcal {I}}}_2\varvec{w}^1-\varvec{w}^1}\Vert _{L^2(\varOmega {\setminus }\varOmega _1)} = \Vert {\varvec{w}^1}\Vert _{L^2(\varOmega {\setminus }\varOmega _1)} \lesssim \varepsilon ^{1/2} N^{-\sigma }. \end{aligned}$$

In \(\varOmega _1\) we have

$$\begin{aligned} \Vert {\varvec{{\mathcal {I}}}_2\varvec{w}^1-\varvec{w}^1}\Vert _{L^2(\varOmega _1)} \le \Vert {\varvec{{\mathcal {J}}}\varvec{w}^1-\varvec{w}^1}\Vert _{L^2(\varOmega _1)}+\Vert {\varvec{{\mathcal {P}}}^1\varvec{w}^1}\Vert _{L^2(\varOmega _1{\setminus }\varOmega _1^*)}, \end{aligned}$$

where \(\varvec{{\mathcal {P}}}^1:={\hat{\varvec{{\mathcal {J}}}}}^1-\varvec{{\mathcal {J}}}\). For the first term we obtain using (10) and (4)

$$\begin{aligned} \Vert {\varvec{{\mathcal {J}}}\varvec{w}^1-\varvec{w}^1}\Vert _{L^2(\varOmega _1)}^2&{\lesssim }\sum _{K\subset \varOmega _1}\sum _{\ell =0}^{k+1}\varepsilon ^{2(k+1-\ell )}(N^{-1}\max |\psi '|)^{2(k+1-\ell )}N^{-2\ell }\times \\&\Vert {\mathrm {e}^{\frac{(k+1-\ell )x}{\sigma \varepsilon }}\varepsilon ^{-(k+1-\ell )}\mathrm {e}^{-\frac{x}{\varepsilon }}}\Vert _{L^2(K)}^2\\&\lesssim (N^{-1}\max |\psi '|)^{2(k+1)}\Vert {\mathrm {e}^{\frac{(k+1-\sigma )x}{\sigma \varepsilon }}}\Vert _{L^2(\varOmega _1)}^2\\&\lesssim (N^{-1}\max |\psi '|)^{2(k+1)}\varepsilon , \end{aligned}$$

due to \(\sigma >k+1\). In the remaining ply of elements the operator \(\varvec{{\mathcal {P}}}^1\) in the Raviart-Thomas case is given by

$$\begin{aligned} \int _{F}\varvec{{\mathcal {P}}}^1 \varvec{w}^1\cdot \varvec{n}\cdot q&=0\text { for all faces }F\subset \partial K{\setminus }\varGamma _1,\,\forall q\in {\mathcal {P}}_k(F),\\ \int _{F}\varvec{{\mathcal {P}}}^1\varvec{w}^1\cdot \varvec{n}\cdot q&=\int _{F}\varvec{w}^1\cdot \varvec{n}\cdot q\text { for all faces }F\subset \partial K\cap \varGamma _1,\,\forall q\in {\mathcal {P}}_k(F),\\ \int _K(\varvec{{\mathcal {P}}}^1\varvec{w}^1)\cdot \varvec{q}&=0,\,\forall \varvec{q}\in {\mathcal {Q}}_{k-1,k}(K)\times {\mathcal {Q}}_{k,k-1}(K), \end{aligned}$$

and similarly for the other finite elements. Thus \(\varvec{{\mathcal {P}}}^1\varvec{w}^1\) depends only on \(\varvec{w}^1\cdot \varvec{n}|_{\varGamma _1}\). With \(\varvec{{\mathcal {P}}}^1\) on \(\varGamma _1\) being defined by weighted integrals, we have

$$\begin{aligned} \Vert {\varvec{{\mathcal {P}}}^1\varvec{w}^1}\Vert _{L^2(\varOmega _1{\setminus }\varOmega _1^*)} \lesssim {{\,\mathrm{meas}\,}}(\varOmega _1{\setminus }\varOmega _1^*)^{1/2}\Vert {\varvec{w}^1\cdot \varvec{n}}\Vert _{L^\infty (\varGamma _1)} \lesssim h_{N/4}^{1/2}N^{-\sigma } \lesssim \varepsilon ^{1/2}N^{-(k+1)}.\nonumber \\ \end{aligned}$$
(15)

Applying the same techniques to the other boundary and corner layer terms and collecting the results finishes the first part of the proof.

For the divergence we can apply the same techniques, using (11) instead of (10). For \(\sigma > k+1\) and \({\mathcal {D}}_k(K)=RT_k(K)\) we obtain

$$\begin{aligned} \Vert {{{\,\mathrm{{div}}\,}}(\varvec{{\mathcal {I}}}_2\varvec{s}-\varvec{s})}\Vert _{L^2(\varOmega )}&\lesssim (h+N^{-1})^{k+1}\Vert {{{\,\mathrm{{div}}\,}}\varvec{s}}\Vert _{H^{k+1}(\varOmega )} \lesssim \varepsilon (h+N^{-1})^{k+1},\\ \Vert {{{\,\mathrm{{div}}\,}}(\varvec{{\mathcal {I}}}_2\varvec{w}^1-\varvec{w}^1)}\Vert _{L^2(\varOmega {\setminus }\varOmega _1)}&= \Vert {{{\,\mathrm{{div}}\,}}\varvec{w}^1}\Vert _{L^2(\varOmega {\setminus }\varOmega _1)} \lesssim \varepsilon ^{-1/2} N^{-\sigma },\\ \Vert {{{\,\mathrm{{div}}\,}}(\varvec{{\mathcal {J}}}\varvec{w}^1-\varvec{w}^1)}\Vert _{L^2(\varOmega _1)}&\lesssim \varepsilon ^{-1}(N^{-1}\max |\psi '|)^{k+1}\Vert {\mathrm {e}^{\frac{(k+1-\sigma )x}{\sigma \varepsilon }}}\Vert _{L^2(\varOmega _1)}\\&\lesssim \varepsilon ^{-1/2}(N^{-1}\max |\psi '|)^{k+1}. \end{aligned}$$

The last term to estimate is the error on the ply of elements in \(\varOmega _1{\setminus }\varOmega _1^*\). A closer inspection of \(\varvec{{\mathcal {P}}}^1\varvec{w}^1\) reveals \((\varvec{{\mathcal {P}}}^1\varvec{w}^1)_2=0\). Thus, an inverse inequality followed by (15) yields

$$\begin{aligned} \Vert {{{\,\mathrm{{div}}\,}}(\varvec{{\mathcal {P}}}^1\varvec{w}^1)}\Vert _{L^2(\varOmega _1{\setminus }\varOmega _1^*)}&\lesssim h_{N/4}^{-1} \Vert {(\varvec{{\mathcal {P}}}^1\varvec{w}^1)_1}\Vert _{L^2(\varOmega _1{\setminus }\varOmega _1^*)} \lesssim h_{N/4}^{-1/2} \Vert {(\varvec{w}^1)_1}\Vert _{L^\infty (\varGamma _1)}\\&\lesssim \varepsilon ^{-1/2}N^{1/2}N^{-\sigma }, \end{aligned}$$

where \(h_{N/4}\ge h_{min}\ge \varepsilon N^{-1}\) holds due to the assumptions on \(\phi \). The analysis for the other terms of the decomposition follows the same lines.

For \({\mathcal {D}}_k(K)=BDM_k(K)\) the same analysis can be done, only replacing the convergence orders by k for \(\sigma \ge k+1/2\). \(\square \)

Lemma 5

Assuming \(h\varepsilon \lesssim N^{-2}\) and \(\sigma \ge k+1\), it holds for any \(V=(v,\varvec{v})\in {\mathcal {U}}_N\)

$$\begin{aligned} |\left\langle \eta ,{{\,\mathrm{{div}}\,}}\varvec{v} \right\rangle |\lesssim \varepsilon ^{-1/2}(h+N^{-1}\max |\psi '|)^{k+1}(\ln N)^{1/2}\Vert {\varvec{v}}\Vert _{L^2(\varOmega )}. \end{aligned}$$

Proof

Let \(c_K:=\frac{1}{{{\,\mathrm{meas}\,}}(K)}\int _K c\ge c_0\) be a piecewise constant approximation of c. Now

$$\begin{aligned} \left\langle \eta ,{{\,\mathrm{{div}}\,}}\varvec{v} \right\rangle&= \sum _{K\in T_N}\left\langle \eta ,{{\,\mathrm{{div}}\,}}\varvec{v} \right\rangle _K = \sum _{K\in T_N}\frac{1}{c_K}\left\langle (c_K-c)\eta ,{{\,\mathrm{{div}}\,}}\varvec{v} \right\rangle _K, \end{aligned}$$

due to the weighted \(L^2\)-projection and \({{\,\mathrm{{div}}\,}}\varvec{v}|_K\in {\mathcal {Q}}_k(K)\). It holds

$$\begin{aligned} \Vert {c_K-c}\Vert _{L^\infty (K)}\lesssim (h_x+h_y)\Vert {c}\Vert _{W^{1,\infty }(K)}. \end{aligned}$$

Thus we obtain, using \(h_x\) and \(h_y\) as abbreviations for the dimensions of K, an inverse inequality and (7) for any \(v\in H^{k+1}(K)\) and \(\varvec{v}\in {\mathcal {D}}_k(K)\)

$$\begin{aligned}&|\left\langle (c_K-c)(v-{\mathcal {I}}_1 v),\partial _x \varvec{v}_1 \right\rangle _K|\nonumber \\&\quad \lesssim \left( 1+\frac{h_y}{h_x}\right) \Vert {v-{\mathcal {I}}_1 v}\Vert _{L^2(K)}\Vert {\varvec{v}}\Vert _{L^2(K)}\nonumber \\&\quad \lesssim \left( (h_x+h_y)\Vert {h_x^{k}\partial _x^{k+1}v}\Vert _{L^2(K)}+\left( 1+\frac{h_y}{h_x}\right) \Vert {h_y^{k+1}\partial _y^{k+1}v}\Vert _{L^2(K)}\right) \Vert {\varvec{v}}\Vert _{L^2(K)} \end{aligned}$$
(16)

and similarly for the y-derivative of the second component.

Let us start with the smooth part s of the solution decomposition and denote the coarse part of \(\varOmega \) by \(\varOmega _c:=\varOmega {\setminus }\bigcup _{i=1}^4\varOmega _i\) and the union of corners by \(\varOmega _{cor}:= \varOmega _{12}\cup \varOmega _{23}\cup \varOmega _{34}\cup \varOmega _{41}\). Then we obtain by using (16)

$$\begin{aligned} |\left\langle (c_K-c)(s-{\mathcal {I}}_1 s),\partial _x \varvec{v}_1 \right\rangle |&\lesssim \Big ((h+N^{-1})^{k+1}\Vert {\partial _x^{k+1}s}\Vert _{L^2(\varOmega )} +N^{-(k+1)}\Vert {\partial _y^{k+1}s}\Vert _{L^2(\varOmega _c)}\\&\quad +\varepsilon ^{-1}N^{-(k+1)}\Vert {\partial _y^{k+1}s}\Vert _{L^2((\varOmega _1\cup \varOmega _3){\setminus }\varOmega _{cor})}\\&\quad +hNh^{k+1}\Vert {\partial _y^{k+1}s}\Vert _{L^2((\varOmega _2\cup \varOmega _4){\setminus }\varOmega _{cor})}\\&\quad +{h\varepsilon ^{-1}} Nh^{k+1}\Vert {\partial _y^{k+1}s}\Vert _{L^2(\varOmega _{cor})} \Big )\Vert {\varvec{v}}\Vert _{L^2(\varOmega )}\\&\lesssim \varepsilon ^{-1/2}(h+N^{-1})^{k+1}{\left( (\ln N)^{1/2}+h\varepsilon ^{1/2} N\ln N\right) }\Vert {\varvec{v}}\Vert _{L^2(\varOmega )}\\&\lesssim \varepsilon ^{-1/2}(h+N^{-1})^{k+1}(\ln N)^{1/2}\Vert {\varvec{v}}\Vert _{L^2(\varOmega )} \end{aligned}$$

due to \(h_{min}\ge {\varepsilon N^{-1}}\) and the condition on \(h\varepsilon \) implying

$$\begin{aligned} h\varepsilon ^{1/2}\lesssim (h\varepsilon )^{3/4}\lesssim N^{-3/2}. \end{aligned}$$
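To spell out this chain, here is a short sketch under the assumption \(h\lesssim \varepsilon \) (the relation that appears in Remark 9; it is not restated at this point of the proof). Splitting the powers and then applying \(h\varepsilon \lesssim N^{-2}\) gives

$$\begin{aligned} h\varepsilon ^{1/2}=(h\varepsilon )^{3/4}\left( \frac{h}{\varepsilon }\right) ^{1/4}\lesssim (h\varepsilon )^{3/4}\lesssim \left( N^{-2}\right) ^{3/4}=N^{-3/2}, \end{aligned}$$

so that \(h\varepsilon ^{1/2}N\ln N\lesssim N^{-1/2}\ln N\lesssim (\ln N)^{1/2}\), which is how the second term in the bracket is absorbed into the first.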

By symmetry, a similar estimate holds for \(\partial _y \varvec{v}_2\). Next we look at the boundary layer term \(w_1\). In \(\varOmega _1\) we obtain, again by using (16),

$$\begin{aligned}&|\left\langle (c_K-c)(w_1-{\mathcal {I}}_1 w_1),\partial _x \varvec{v}_1 \right\rangle _{\varOmega _1}|\\&\qquad {\lesssim \sum _{K\subset \varOmega _1}\big ((h+N^{-1})\Vert {h_x^{k}\partial _x^{k+1}w_1}\Vert _{L^2(K)}}\\&\qquad + {\varepsilon ^{-1}N^{-(k+1)}\Vert {\partial _y^{k+1}w_1}\Vert _{L^2(K{\setminus }\varOmega _{cor})}+ Nh^{k+1} \Vert {\partial _y^{k+1}w_1}\Vert _{L^2(K\cap \varOmega _{cor})}\big )\Vert {\varvec{v}}\Vert _{L^2(K)}}\\&\quad \lesssim \bigg ((h+N^{-1}\max |\psi '|)^{k+1}\varepsilon ^{-1}\Vert {\mathrm {e}^{\frac{kx}{\sigma \varepsilon }}\mathrm {e}^{-\frac{x}{\varepsilon }}}\Vert _{L^2(\varOmega _1)}\\&\qquad + \varepsilon ^{-1}N^{-(k+1)}\Vert {\mathrm {e}^{-\frac{x}{\varepsilon }}}\Vert _{L^2(\varOmega _1{\setminus }\varOmega _{cor})}+ Nh^{k+1}\Vert {\mathrm {e}^{-\frac{x}{\varepsilon }}}\Vert _{L^2(\varOmega _1\cap \varOmega _{cor})}\bigg )\Vert {\varvec{v}}\Vert _{L^2(\varOmega _1)}\\&\quad \lesssim \varepsilon ^{-1/2}(h+N^{-1}\max |\psi '|)^{k+1}\left( 1+h\varepsilon N^2\right) \Vert {\varvec{v}}\Vert _{L^2(\varOmega _1)}\\&\quad \lesssim \varepsilon ^{-1/2}(h+N^{-1}\max |\psi '|)^{k+1}\Vert {\varvec{v}}\Vert _{L^2(\varOmega _1)}, \end{aligned}$$

using again the condition on \(\varepsilon h\). For the y-derivative we obtain similarly

$$\begin{aligned}&|\left\langle (c_K-c)(w_1-{\mathcal {I}}_1 w_1),\partial _y \varvec{v}_2 \right\rangle _{\varOmega _1}|\\&\qquad {\lesssim \sum _{K\subset \varOmega _1}\big ((h+N^{-1})\Vert {h_y^{k}\partial _y^{k+1}w_1}\Vert _{L^2(K)}+} {(1+hN)\Vert {h_x^{k+1}\partial _x^{k+1}w_1}\Vert _{L^2(K{\setminus }\varOmega _{cor})}}\\&\qquad +{(1+h\varepsilon ^{-1}N)\Vert {h_x^{k+1}\partial _x^{k+1}w_1}\Vert _{L^2(K\cap \varOmega _{cor})}\big )\Vert {\varvec{v}}\Vert _{L^2(K)}}\\&\quad \lesssim \bigg ((h+N^{-1})^{k+1}\Vert {\mathrm {e}^{-\frac{x}{\varepsilon }}}\Vert _{L^2(\varOmega _1)}\\&\qquad + (1+hN)(N^{-1}\max |\psi '|)^{k+1}\Vert {\mathrm {e}^{\frac{(k+1)x}{\sigma \varepsilon }}\mathrm {e}^{-\frac{x}{\varepsilon }}}\Vert _{L^2(\varOmega _1{\setminus }\varOmega _{cor})}\\&\qquad + (1+h\varepsilon ^{-1}N)(N^{-1}\max |\psi '|)^{k+1}\Vert {\mathrm {e}^{\frac{(k+1)x}{\sigma \varepsilon }}\mathrm {e}^{-\frac{x}{\varepsilon }}}\Vert _{L^2(\varOmega _1\cap \varOmega _{cor})}\bigg )\Vert {\varvec{v}}\Vert _{L^2(\varOmega _1)}\\&\quad \lesssim \varepsilon ^{-1/2}(h+N^{-1}\max |\psi '|)^{k+1}(\varepsilon +h\varepsilon N(\ln N)^{1/2}+h\varepsilon ^{3/2}N\ln N)\Vert {\varvec{v}}\Vert _{L^2(\varOmega _1)}\\&\quad \lesssim \varepsilon ^{-1/2}(h+N^{-1}\max |\psi '|)^{k+1}\Vert {\varvec{v}}\Vert _{L^2(\varOmega _1)}, \end{aligned}$$

where the condition on \(\varepsilon h\) was used in the last step.

In the remainder of the domain we apply the \(L^2\)-stability of \({\mathcal {I}}_1\) and obtain, by considering the different cases of \(\frac{h_y}{h_x}\),

$$\begin{aligned}&|\,\left\langle (c_K-c)(w_1-{\mathcal {I}}_1 w_1),\partial _x \varvec{v}_1 \right\rangle _{\varOmega {\setminus }\varOmega _1}|\\&\quad {\lesssim \sum _{K\subset \varOmega {\setminus }\varOmega _1} \Vert {c_K-c}\Vert _{L^\infty (K)}\Vert {w_1-{\mathcal {I}}_1w_1}\Vert _{L^2(K)}\Vert {h_x^{-1}\varvec{v}_1}\Vert _{L^2(K)}}\\&\quad {\lesssim \sum _{K\subset \varOmega {\setminus }\varOmega _1} \left( 1+\frac{h_y}{h_x}\right) \Vert {w_1}\Vert _{L^2(K)}\Vert {\varvec{v}}\Vert _{L^2(K)}}\\&\quad {\lesssim \left( 1+hN+\varepsilon ^{-1}+h\varepsilon ^{-1}N\right) \Vert {w_1}\Vert _{L^2(\varOmega {\setminus }\varOmega _1)}\Vert {\varvec{v}}\Vert _{L^2(\varOmega {\setminus }\varOmega _1)}}\\&\quad {\lesssim \varepsilon ^{-1/2}N^{-(k+1)}\Vert {\varvec{v}}\Vert _{L^2(\varOmega {\setminus }\varOmega _1)}}\\&|\,\left\langle (c_K-c)(w_1-{\mathcal {I}}_1 w_1),\partial _y \varvec{v}_2 \right\rangle _{\varOmega {\setminus }\varOmega _1}|\\&\quad {\lesssim \sum _{K\subset \varOmega {\setminus }\varOmega _1} \left( 1+\frac{h_x}{h_y}\right) \Vert {w_1}\Vert _{L^2(K)}\Vert {\varvec{v}}\Vert _{L^2(K)}}\\&\quad {\lesssim \varepsilon ^{-1/2}N^{-(k+1)}\Vert {\varvec{v}}\Vert _{L^2(\varOmega {\setminus }\varOmega _1)}.} \end{aligned}$$

The estimation of the other boundary layer terms and of the corner layer terms is similar. Combining all the individual results proves the assertion. \(\square \)

Lemma 6

For \(h\varepsilon \lesssim N^{-2}\) it holds for \({\mathcal {D}}(K)=RT_k(K)\) with \(\sigma \ge k+3/2\)

$$\begin{aligned} \left| \!\!\;\left| \!\!\;\left| {(\xi ,\varvec{\xi })} \right| \!\!\;\right| \!\!\;\right| _{bal}\lesssim (h+N^{-1}\max |\psi '|)^{k+1}(\ln N)^{1/2} \end{aligned}$$

and for \({\mathcal {D}}(K)=BDM_k(K)\) with \(\sigma \ge k+1\)

$$\begin{aligned} \left| \!\!\;\left| \!\!\;\left| {(\xi ,\varvec{\xi })} \right| \!\!\;\right| \!\!\;\right| _{bal}\lesssim (h+N^{-1}\max |\psi '|)^{k}. \end{aligned}$$

Proof

Using the inf-sup estimate (14) and the previous lemmas we obtain for \({\mathcal {D}}(K)=RT_k(K)\)

$$\begin{aligned} \beta \left| \!\!\;\left| \!\!\;\left| {(\xi ,\varvec{\xi })} \right| \!\!\;\right| \!\!\;\right| \le \sup _{V\in {\mathcal {U}}_N}\frac{B((\eta ,\varvec{\eta }),V)}{\Vert {V}\Vert _{L^2(\varOmega )}} \lesssim \varepsilon ^{1/2}(h+N^{-1}\max |\psi '|)^{k+1}(\ln N)^{1/2}, \end{aligned}$$

and in particular

$$\begin{aligned}&\Vert {\xi }\Vert _{L^2(\varOmega )}\lesssim \varepsilon ^{1/2}(h+N^{-1}\max |\psi '|)^{k+1}(\ln N)^{1/2},\\&\varepsilon ^{-1/2}\Vert {\varvec{\xi }}\Vert _{L^2(\varOmega )}\lesssim (h+N^{-1}\max |\psi '|)^{k+1}(\ln N)^{1/2},\\&\delta ^{1/2}\varepsilon ^{-1/2}\Vert {\varepsilon {{\,\mathrm{{div}}\,}}\varvec{\xi }}\Vert _{L^2(\varOmega )} \lesssim (h+N^{-1}\max |\psi '|)^{k+1}(\ln N)^{1/2}, \end{aligned}$$

which are the three components of \(\left| \!\!\;\left| \!\!\;\left| {(\xi ,\varvec{\xi })} \right| \!\!\;\right| \!\!\;\right| _{bal}\). Similar results follow for \({\mathcal {D}}(K)=BDM_k(K)\), using additionally

$$\begin{aligned} (h+N^{-1}\max |\psi '|)(\ln N)^{1/2}\lesssim 1. \end{aligned}$$

\(\square \)

Theorem 7

For \(h\varepsilon \lesssim N^{-2}\) it holds for \({\mathcal {D}}(K)=RT_k(K)\) with \(\sigma \ge k+3/2\)

$$\begin{aligned} \left| \!\!\;\left| \!\!\;\left| {U-U_N} \right| \!\!\;\right| \!\!\;\right| _{bal}\lesssim (h+N^{-1}\max |\psi '|)^{k+1}(\ln N)^{1/2} \end{aligned}$$

and for \({\mathcal {D}}(K)=BDM_k(K)\) with \(\sigma \ge k+1\)

$$\begin{aligned} \left| \!\!\;\left| \!\!\;\left| {U-U_N} \right| \!\!\;\right| \!\!\;\right| _{bal}\lesssim (h+N^{-1}\max |\psi '|)^{k}. \end{aligned}$$

Proof

With the triangle inequality

$$\begin{aligned} \left| \!\!\;\left| \!\!\;\left| {U-U_N} \right| \!\!\;\right| \!\!\;\right| _{bal}\le \left| \!\!\;\left| \!\!\;\left| {(\xi ,\varvec{\xi })} \right| \!\!\;\right| \!\!\;\right| _{bal}+\left| \!\!\;\left| \!\!\;\left| {(\eta ,\varvec{\eta })} \right| \!\!\;\right| \!\!\;\right| _{bal} \end{aligned}$$

and the previous lemmas, it only remains to estimate \(\Vert {\eta }\Vert _{L^2(\varOmega )}\). Using the local anisotropic interpolation error estimates (7), standard techniques give

$$\begin{aligned} \Vert {\eta }\Vert _{L^2(\varOmega )} \lesssim (h+N^{-1}\max |\psi '|)^{k+1}. \end{aligned}$$

\(\square \)

Corollary 8

Under the same conditions as the previous theorem we also have for \({\mathcal {D}}(K)=RT_k(K)\) in the unbalanced norm

$$\begin{aligned} \left| \!\!\;\left| \!\!\;\left| {U-U_N} \right| \!\!\;\right| \!\!\;\right| \lesssim (h+N^{-1}\max |\psi '|)^{k+1}. \end{aligned}$$

Remark 9

On a Shishkin mesh we have \({h=h_i\lesssim }\varepsilon N^{-1}\ln N\) inside the boundary domain. Thus the condition on \(\varepsilon h\) becomes

$$\begin{aligned} \varepsilon ^2\lesssim \frac{N^{-1}}{\ln N} \quad \text{ and }\quad h+N^{-1}\max |\psi '| \lesssim N^{-1}\ln N. \end{aligned}$$
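As a sketch of where these two relations come from, assume the bound on \(h\) just stated and, as is standard for Shishkin meshes but not derived in this section, \(\max |\psi '|\lesssim \ln N\). Then \(\varepsilon h\lesssim \varepsilon ^2N^{-1}\ln N\), so the condition \(\varepsilon h\lesssim N^{-2}\) is guaranteed as soon as

$$\begin{aligned} \varepsilon ^2N^{-1}\ln N\lesssim N^{-2} \quad \Longleftrightarrow \quad \varepsilon ^2\lesssim \frac{N^{-1}}{\ln N}, \end{aligned}$$

and, since \(\varepsilon \le 1\),

$$\begin{aligned} h+N^{-1}\max |\psi '|\lesssim \varepsilon N^{-1}\ln N+N^{-1}\ln N\lesssim N^{-1}\ln N. \end{aligned}$$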

On a Bakhvalov S-mesh we have \(h\sim \varepsilon \) and the condition becomes

$$\begin{aligned} \varepsilon \lesssim N^{-1} \quad \text{ and } \text{ therefore }\quad h+N^{-1}\max |\psi '| \lesssim N^{-1}. \end{aligned}$$

Note also that \(\varepsilon h\lesssim N^{-2}\) and \(h\lesssim \varepsilon \) always imply \(h\lesssim N^{-1}\).
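This implication can be checked in one line; the following only combines the two stated assumptions and uses no further ingredient:

$$\begin{aligned} h^2=h\cdot h\lesssim h\varepsilon \lesssim N^{-2} \quad \Longrightarrow \quad h\lesssim N^{-1}. \end{aligned}$$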

Remark 10

The same analysis can also be conducted for the Arnold-Boffi-Falk element

$$\begin{aligned} {\mathcal {D}}(K)=ABF_k(K):={\mathcal {Q}}_{k+2,k}(K)\times {\mathcal {Q}}_{k,k+2}(K), \end{aligned}$$

see [3], using \({{\,\mathrm{{div}}\,}}{\mathcal {D}}(K)={\mathcal {Q}}_{k+1}(K){\setminus }\text {span}\{x^{k+1}y^{k+1}\}\) as discrete space for the first component. Anisotropic interpolation error estimates are given in [7]. Although \(\Vert {\eta }\Vert _{L^2(\varOmega )}\) and \(\Vert {{{\,\mathrm{{div}}\,}}\varvec{\eta }}\Vert _{L^2(\varOmega )}\) can be estimated with order \(k+2\), we obtain only convergence rates of order \(k+1\) due to \(\Vert {\varvec{\eta }}\Vert _{L^2(\varOmega )}\lesssim \varepsilon ^{1/2}(h+N^{-1}\max |\psi '|)^{k+1}\).

4 Numerical Experiments

Let us consider on \(\varOmega =(0,1)^2\)

$$\begin{aligned} -\varepsilon ^2\varDelta u+cu=f, \end{aligned}$$

where

$$\begin{aligned} c=1+x^2y^2\mathrm {e}^{xy/2} \quad \Rightarrow \quad c_0=1,\,c_\infty =1+\mathrm {e}^{1/2},\,\delta =\frac{1}{(1+\mathrm {e}^{1/2})^2} \end{aligned}$$

and an exact solution

$$\begin{aligned} u=\left( \cos \left( \frac{\pi x}{2}\right) -\frac{\mathrm {e}^{-x/\varepsilon }-\mathrm {e}^{-1/\varepsilon }}{1-\mathrm {e}^{-1/\varepsilon }}\right) \cdot \left( 1-y- \frac{\mathrm {e}^{-y/\varepsilon }-\mathrm {e}^{-1/\varepsilon }}{1-\mathrm {e}^{-1/\varepsilon }}\right) \end{aligned}$$

is prescribed, see [1, 17] for \(c=1\). The solution has only boundary layers at \(x=0\) and \(y=0\), and a corner layer at (0, 0). Therefore, we modify our mesh accordingly. For our experiments we will always use Bakhvalov S-meshes.
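The test data can be checked numerically with a few lines of Python. The following is only an illustrative sketch and is not taken from the SOFE code used for the experiments; the names `layer`, `u_exact` and `c_coef` are hypothetical. It evaluates the prescribed solution at sampled boundary points, where it should vanish up to rounding, and samples \(c\) to confirm the stated bounds \(c_0\) and \(c_\infty \).

```python
import math

def layer(t, eps):
    """Normalised layer profile (e^{-t/eps} - e^{-1/eps}) / (1 - e^{-1/eps})."""
    e1 = math.exp(-1.0 / eps)  # underflows harmlessly to 0.0 for tiny eps
    return (math.exp(-t / eps) - e1) / (1.0 - e1)

def u_exact(x, y, eps):
    """Prescribed exact solution of the test problem."""
    return (math.cos(math.pi * x / 2.0) - layer(x, eps)) * (1.0 - y - layer(y, eps))

def c_coef(x, y):
    """Reaction coefficient c = 1 + x^2 y^2 e^{xy/2}."""
    return 1.0 + x**2 * y**2 * math.exp(x * y / 2.0)

eps = 1e-4
pts = [i / 20 for i in range(21)]  # sample points in [0, 1], chosen for illustration

# Largest |u| over the sampled boundary points; expected at rounding level.
boundary_max = max(
    max(abs(u_exact(0.0, t, eps)), abs(u_exact(1.0, t, eps)),
        abs(u_exact(t, 0.0, eps)), abs(u_exact(t, 1.0, eps)))
    for t in pts
)

# Sampled bounds of c and the value delta = 1 / c_inf^2 quoted in the text.
c_vals = [c_coef(x, y) for x in pts for y in pts]
c_min, c_max = min(c_vals), max(c_vals)
delta = 1.0 / (1.0 + math.exp(0.5)) ** 2

print(boundary_max, c_min, c_max, delta)
```

Since \(c\) is increasing in both arguments on the unit square, the sampled minimum and maximum are attained at the grid corners and coincide with \(c_0\) and \(c_\infty \).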

All computations were done in \(\mathbb {SOFE}\), a finite-element framework in Matlab and Octave, see github.com/SOFE-Developers/SOFE.

Let us start the numerical investigation by looking at the dependence on \(\varepsilon \). For that we fix \(N=16\) and use \(RT_1\)-elements, and vary \(\varepsilon \in \{10^{-3},\,10^{-4},\,10^{-5},\,10^{-6}\}\). We obtain the numbers in Table 1.

Table 1 Errors in various norms for varying values of \(\varepsilon \) and fixed N, \(RT_1\)

As expected from Theorem 7, we observe that \(\Vert {u-u_h}\Vert _{L^2(\varOmega )}\) is independent of \(\varepsilon \), while the other two errors depend on \(\varepsilon \) like \(\Vert {\varvec{u}-\varvec{u}_h}\Vert _{L^2(\varOmega )}\lesssim \varepsilon ^{1/2}\) and \(\Vert {{{\,\mathrm{{div}}\,}}(\varvec{u}-\varvec{u}_h)}\Vert _{L^2(\varOmega )}\lesssim \varepsilon ^{-1/2}\). Consequently, \(\left| \!\!\;\left| \!\!\;\left| {U-U_h} \right| \!\!\;\right| \!\!\;\right| \) stays independent of \(\varepsilon \), due to the dominating effect of \(\Vert {u-u_h}\Vert _{L^2(\varOmega )}\), and the larger balanced norm \(\left| \!\!\;\left| \!\!\;\left| {U-U_h} \right| \!\!\;\right| \!\!\;\right| _{bal}\) is independent of \(\varepsilon \) as well, due to the correct weighting of the other two norms.
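The effect of the weighting can be made explicit. As a heuristic sketch, take the weights of the balanced norm as they appear in the proof of Lemma 6 and suppose the observed \(\varepsilon \)-scalings are sharp (an assumption made only for this illustration). Then

$$\begin{aligned}&\varepsilon ^{-1/2}\Vert {\varvec{u}-\varvec{u}_h}\Vert _{L^2(\varOmega )}\sim \varepsilon ^{-1/2}\cdot \varepsilon ^{1/2}=1,\\&\delta ^{1/2}\varepsilon ^{-1/2}\Vert {\varepsilon {{\,\mathrm{{div}}\,}}(\varvec{u}-\varvec{u}_h)}\Vert _{L^2(\varOmega )}=\delta ^{1/2}\varepsilon ^{1/2}\Vert {{{\,\mathrm{{div}}\,}}(\varvec{u}-\varvec{u}_h)}\Vert _{L^2(\varOmega )}\sim \delta ^{1/2}\varepsilon ^{1/2}\cdot \varepsilon ^{-1/2}=\delta ^{1/2}, \end{aligned}$$

so both weighted contributions are of order one in \(\varepsilon \), like \(\Vert {u-u_h}\Vert _{L^2(\varOmega )}\) itself.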

Now let us come to the convergence orders. For that purpose we fix \(\varepsilon =10^{-4}\) and vary, for different values of k, the number N of cells per dimension. We start with Raviart-Thomas elements and obtain the results of Table 2.

Table 2 Errors \(\left| \!\!\;\left| \!\!\;\left| {U-U_h} \right| \!\!\;\right| \!\!\;\right| _{bal}\) for fixed \(\varepsilon =10^{-4}\) in the Raviart–Thomas case

Along with the computed errors, the estimated rates of convergence are given; they are close to the expected rates of \(k+1\) for the balanced norm.
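The text does not state how the estimated rates are computed. A common choice, sketched here as an assumption, compares errors on consecutive meshes via \(\ln (e_{N_i}/e_{N_{i+1}})/\ln (N_{i+1}/N_i)\). The function name `estimated_rates` is hypothetical, and the error values below are synthetic, generated from \(3N^{-2}\) purely for illustration; they are not data from the tables.

```python
import math

def estimated_rates(Ns, errors):
    """Return log(e_i / e_{i+1}) / log(N_{i+1} / N_i) for consecutive meshes."""
    if len(Ns) != len(errors):
        raise ValueError("Ns and errors must have the same length")
    rates = []
    for i in range(len(Ns) - 1):
        rates.append(math.log(errors[i] / errors[i + 1]) / math.log(Ns[i + 1] / Ns[i]))
    return rates

# Synthetic errors decaying exactly like N^{-2}; NOT values from the paper.
Ns = [8, 16, 32, 64]
synthetic = [3.0 * n ** -2 for n in Ns]
rates = estimated_rates(Ns, synthetic)
print(rates)
```

On meshes where the bound carries a logarithmic factor, such as Shishkin meshes, the same quotient would typically be taken with respect to \(N^{-1}\ln N\) instead of \(N^{-1}\).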

In the case of Brezzi-Douglas-Marini elements we get Table 3.

Table 3 Errors \(\left| \!\!\;\left| \!\!\;\left| {U-U_h} \right| \!\!\;\right| \!\!\;\right| _{bal}\) for fixed \(\varepsilon =10^{-4}\) in the Brezzi–Douglas–Marini case

As expected, we only see rates of k in the balanced norm, with slightly better results for the lowest-order case. The reason for this behaviour lies in the components of the balanced norm: for smaller values of N, the faster converging components dominate it. A closer look reveals that \(\Vert {u-u_h}\Vert _{L^2(\varOmega )}\) and \(\Vert {{{\,\mathrm{{div}}\,}}(\varvec{u}-\varvec{u}_h)}\Vert _{L^2(\varOmega )}\) converge only with order 1, see Table 4.

Table 4 Errors in various norms for fixed \(\varepsilon =10^{-4}\) in the \(BDM_1\)-case