1 Introduction

Let \(\Omega \subset \mathbb {R}^d\) be a polygonal domain with boundary \(\partial \Omega \), where \(\partial \Omega =\partial \Omega _D\cup \partial \Omega _N\), \(\partial \Omega _D\cap \partial \Omega _N= \emptyset \) and \(\partial \Omega _D\ne \emptyset \). We consider the following stress-velocity-pressure formulation of the Stokes equations

$$\begin{aligned} \underline{\sigma }&=2\mu \varepsilon (\varvec{u})+p\underline{I}{} & {} \quad \text{ in }\;\Omega \end{aligned}$$
(1.1)
$$\begin{aligned} -\textrm{div}\underline{\sigma }&=\varvec{f}_S{} & {} \quad \text{ in }\;\Omega ,\end{aligned}$$
(1.2)
$$\begin{aligned} \nabla \cdot \varvec{u}&=0{} & {} \quad \text{ in }\;\Omega \end{aligned}$$
(1.3)

and the Navier–Stokes equations

$$\begin{aligned} \underline{\sigma }&=2\mu \varepsilon (\varvec{u})+p\underline{I}{} & {} \quad \text {in }\Omega , \end{aligned}$$
(1.4)
$$\begin{aligned} -\textrm{div}\underline{\sigma }+\textrm{div}(\varvec{u}\otimes \varvec{u})&=\varvec{f}_N{} & {} \quad \text {in }\Omega , \end{aligned}$$
(1.5)
$$\begin{aligned} \nabla \cdot \varvec{u}&=0{} & {} \quad \text{ in }\;\Omega , \end{aligned}$$
(1.6)

which are supplemented with the boundary conditions \( \varvec{u}=\varvec{0}\) on \(\partial \Omega _D\) and \(\underline{\sigma }\varvec{n}=\varvec{0}\) on \(\partial \Omega _N\). Here \(\varepsilon (\varvec{u})=\frac{\nabla \varvec{u}+\nabla \varvec{u}^T}{2}\) and \(\varvec{n}\) is the unit outward normal vector to \(\partial \Omega \). Moreover, \(\varvec{f}_S, \varvec{f}_N\in \varvec{L}^2(\Omega )\) are the external body forces, \(\mu \) is the effective viscosity constant, and \(\underline{I}\) is the identity matrix. If \(\partial \Omega _N\ne \emptyset \), there exists a unique solution \((\varvec{u},p)\) to the above problem. If \(\partial \Omega _N=\emptyset \), then we need to enforce the condition \(\int _{\Omega }p\;dx=0\) to guarantee the uniqueness of the solution (cf. [17]). For ease of presentation, we assume that \(\partial \Omega _N\ne \emptyset \). The analysis presented below can also be adapted to the case \(\partial \Omega _N=\emptyset \) without much difficulty.

The stress(pseudostress)-velocity/stress-velocity-pressure formulation for incompressible flows has drawn great attention over the past decades [3, 4, 7, 8, 9, 13, 18, 20, 25, 36]. These formulations enjoy several salient features: they come from the original physical laws and give a direct description of the stress, which in some applications is the variable of primary interest; formally they resemble the stress-displacement formulation of the elasticity equations, which may lead to a better understanding of coupled solid-fluid problems; and they give a unified formulation for Newtonian and non-Newtonian flows. The development and analysis of methods for incompressible flows based on the stress-velocity formulation is therefore important and interesting. However, enforcing the symmetry of the stress in numerical approximations is challenging: as is well known, the construction of symmetric stress spaces is delicate and generally involves sophisticated procedures. In this paper, we introduce a stress-velocity formulation for the Stokes equations and the Navier–Stokes equations. Specifically, the stress is approximated using piecewise polynomials of order k with strong symmetry, and the velocity is approximated using an \(H(\textrm{div};\Omega )\)-conforming space of order \(k+1\). In contrast, the tangential trace is approximated using piecewise polynomials of order k only. We remark that the formulation hinges on a carefully designed numerical flux, which guarantees optimal convergence of the scheme even though the polynomial degree of the tangential trace is one order lower than that of the velocity.

The discrete formulations for the Stokes equations and the Navier–Stokes equations are designed in a similar manner, except that the non-negative convective term is incorporated in the Navier–Stokes equations. The proposed formulations have several advantages: they yield a divergence-free velocity, which ensures pressure-robustness; they are robust with respect to the value of the viscosity; and they are hybridizable, so that the size of the globally coupled system is greatly reduced, rendering the method computationally efficient. The design of pressure-robust schemes has been actively studied over the past few years; see, for instance, [14, 23, 26, 30]. We also emphasize that the current approach can be straightforwardly applied to time-dependent incompressible flows: at each time step, one solves a global system involving the normal trace and the tangential trace of the velocity and the piecewise constant approximation of the trace of the stress. These features make our method highly efficient for the simulation of large-scale incompressible flows.

To illustrate the convergence of the proposed scheme for smooth solutions, a rigorous error analysis reflecting pressure-independence is developed for the Stokes equations. The discrete formulation only involves the stress, the velocity and the numerical trace of the velocity, which makes it nontrivial to derive pressure-independent error estimates. Unlike for the velocity gradient-velocity-pressure formulation and the velocity-pressure formulation, the standard error analysis of the stress-velocity formulation leads to error estimates for the velocity that depend on the stress, which is pressure-dependent (cf. (1.1)). To overcome this issue, we need to refine the standard error analysis. In fact, our pressure-independent error estimates rely on two key observations: the \(L^2\)-projection operator onto the stress space enjoys a commuting property, which enables us to decouple the pressure variable from the definition of the stress and to employ only the deviatoric part of the stress; and the normal-tangential components of the stress and of its deviatoric part coincide on the element interfaces, namely, \((\mathcal {A} \underline{\sigma }\varvec{n})^t_{|F}= (\underline{\sigma }\varvec{n})^t_{|F}\) for \(F\in \mathcal {F}_h\). Optimal convergence error estimates for all the variables measured in suitable norms are obtained; indeed, the \(L^2\)-errors for the stress and the velocity and the energy error for the velocity are analyzed. In particular, relative to the degrees of freedom of the globally coupled unknowns, superconvergence for the velocity in the discrete \(H^1\)-norm is obtained. Moreover, we are able to show that the velocity error estimates only depend on the deviatoric part of the stress, which ensures the pressure-independence of the error estimates. To the best of our knowledge, pressure-independent error estimates for the stress (pseudostress)-velocity formulation have not been discussed in the existing literature, and the developed approach will shed new light on other pressure-robust discretizations.

In many practical applications, the solutions under consideration are non-smooth. In this situation, it is of interest to study convergence under minimal regularity assumptions. However, the existing works on HDG methods mainly focus on error estimates under strong regularity assumptions, which exclude the non-smooth solutions that are often present in realistic scenarios. To fill this gap, one of our main goals is to establish convergence to the weak solution under minimal regularity assumptions for the Navier–Stokes equations. To this end, we first show the discrete \(H^1\)-stability of the proposed formulation by using a local version of the Korn inequality. Then the convergence to the weak solution is established by using the boundedness of the discrete solution and a compactness argument. Discrete gradient operators defined via lifting operators also play an important role in the convergence proof. Specifically, two discrete gradient operators are employed. We first construct a discrete gradient operator that links the discrete stress and velocity, and its convergence to \(\nabla \varvec{u}\) is proved via the discrete \(H^1\)-stability of the velocity. Then we construct a second discrete gradient operator to facilitate the analysis of the convective trilinear term. The strong convergence of the discrete velocity and stress to the weak solution is analyzed. In particular, strong convergence of the velocity in the discrete \(H^1\)-norm is achieved by a judicious choice of elliptic projection in conjunction with a local Korn inequality. To the best of our knowledge, this is the first proof of convergence to a weak solution under minimal regularity assumptions for a pressure-robust discretization. The methodologies used in this paper can also be extended to analyze other spatial discretizations.

The rest of the paper is organized as follows. In the next section, we introduce the pressure-robust discretizations for the Stokes equations and the Navier–Stokes equations. Moreover, the main results are presented. In Sect. 3 we present the characterization of the proposed HDG scheme. In Sect. 4, the discrete \(H^1\)-stability and the convergence error estimates for the Stokes equations are proved. Then the convergence to the weak solution for the Navier–Stokes equations under minimal regularity assumptions is rigorously analyzed in Sect. 5. Several numerical experiments are carried out in Sect. 6 to confirm the proposed theories and demonstrate the capabilities of the proposed scheme.

2 The pressure-robust discretization and main results

2.1 The pressure-robust discretization in stress-velocity formulation

In this subsection, we first describe the model problem, then the corresponding discrete formulation will be given. To begin, we introduce some notation that will be used throughout the paper. We will use the most common Sobolev spaces \(H^r(\mathcal {O})\) for non-negative integer r, where \(\mathcal {O}\subset \Omega ,\Omega \subset \mathbb {R}^d,d=2,3\). The spaces of vector- and matrix-valued functions with all the components in \(H^r(\mathcal {O})\) will be respectively denoted as \(\varvec{H}^r(\mathcal {O})\) and \(\underline{H}^r(\mathcal {O})\). Also, we use S to denote the set of symmetric \(d\times d\) matrices and define \(\underline{H}^r(S,\Omega ):=\{\underline{w}\in \underline{H}^r(\Omega ); \underline{w}=\underline{w}^T\}\). We use \((\cdot ,\cdot )_D\) to represent the standard \(L^2\)-inner product over \(D\subset \mathbb {R}^d\) and the corresponding norm is denoted as \(\Vert \cdot \Vert _{L^2(D)}\), and we use \(\langle \cdot ,\cdot \rangle _D\) to represent the \(L^2\)-inner product on \(D\subset \mathbb {R}^{d-1}\). In the sequel, C represents a generic positive constant independent of the meshsize and \(\mu \), which may have different values at different occurrences.

In our discretization given below, we aim to eliminate the pressure variable p. To this end, it holds in view of (1.3) that

$$\begin{aligned} \textrm{tr}(\underline{\sigma })= d p, \end{aligned}$$

thus \(p=\frac{\textrm{tr}(\underline{\sigma })}{d}\). For any tensor field \(\underline{H}\), we define \(\mathcal {A}\underline{H}{:=}\underline{H}- \frac{\textrm{tr}(\underline{H})}{d}{\underline{I}}\), where \(\mathcal {A}\underline{H}\) is a trace-free tensor and is called the deviatoric part. Then we can infer from (1.1) that \( \mathcal {A}\underline{\sigma } = 2\mu \varepsilon (\varvec{u})\). Thus, the model problem (1.1)–(1.3) can be recast into the following equivalent form:

$$\begin{aligned} (2\mu )^{-1}\mathcal {A}\underline{\sigma }&= \varepsilon (\varvec{u}){} & {} \quad \text{ in }\;\Omega , \end{aligned}$$
(2.1)
$$\begin{aligned} -\textrm{div}\underline{\sigma }&= \varvec{f}_S{} & {} \quad \text{ in }\;\Omega . \end{aligned}$$
(2.2)
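
The elimination of the pressure can be checked numerically at a single point. The following is a minimal sketch, not part of the scheme itself: the helper names `dev` and `stress` and the sample values of the velocity gradient, \(\mu \) and p are hypothetical, and the velocity gradient is chosen trace-free so that (1.3) holds.

```python
import numpy as np

def dev(H):
    """Deviatoric part A(H) = H - tr(H)/d * I of a square matrix H."""
    d = H.shape[0]
    return H - np.trace(H) / d * np.eye(d)

def stress(grad_u, p, mu):
    """Stress as in (1.1): sigma = 2*mu*eps(u) + p*I, with eps(u) the symmetric gradient."""
    eps = 0.5 * (grad_u + grad_u.T)
    return 2.0 * mu * eps + p * np.eye(grad_u.shape[0]), eps

# Hypothetical sample data: a trace-free velocity gradient (div u = 0) in d = 3.
grad_u = np.array([[1.0, 2.0, 0.0],
                   [-1.0, 0.5, 3.0],
                   [4.0, 0.0, -1.5]])
mu, p = 0.7, 2.3

sigma, eps = stress(grad_u, p, mu)
d = sigma.shape[0]

p_recovered = np.trace(sigma) / d        # p = tr(sigma) / d
residual = dev(sigma) - 2.0 * mu * eps   # A(sigma) - 2*mu*eps(u), expected to vanish
```

Here `p_recovered` agrees with `p` and `residual` vanishes up to rounding, which mirrors the step from (1.1) and (1.3) to (2.1).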

We let \(\varvec{H}^1_0(\Gamma _D):=\{\varvec{v}\in \varvec{H}^1(\Omega );\varvec{v}=\varvec{0}\;\text{ on }\;\partial \Omega _D\}\) and \(X:=\underline{L}^2(S,\Omega )\times \varvec{H}_0^1(\Gamma _D)\). We define the norm \(\Vert (\underline{w},\varvec{v})\Vert _X^2{:=}\Vert \mu ^{-\frac{1}{2}}\underline{w}\Vert _{L^2(\Omega )}^2+\Vert \mu ^{\frac{1}{2}}\varvec{v}\Vert _{H^1(\Omega )}^2\) for any \( (\underline{w},\varvec{v})\in X\). Then, the weak formulation for (1.1)–(1.3) reads as follows: Find \((\underline{\sigma },\varvec{u})\in X\) such that

$$\begin{aligned} ((2\mu )^{-1}\mathcal {A}\underline{\sigma } ,\underline{w})&= ( \varepsilon (\varvec{u}) ,\underline{w}), \end{aligned}$$
(2.3)
$$\begin{aligned} ( \underline{\sigma } ,\varepsilon (\varvec{v}))&= (\varvec{f}_S,\varvec{v}) \end{aligned}$$
(2.4)

for all \((\underline{w},\varvec{v})\in X\).

The weak solution to (1.4)–(1.6) is defined by: Find \((\underline{\sigma },\varvec{u})\in \underline{L}^2(S,\Omega )\times \varvec{H}^1_0(\Gamma _D)\) such that

$$\begin{aligned} ((2\mu )^{-1}\mathcal {A}\underline{\sigma }-\varepsilon (\varvec{u}),\underline{H})&=0{} & {} \forall \underline{H}\in \underline{L}^2(S,\Omega ), \end{aligned}$$
(2.5)
$$\begin{aligned} (\underline{\sigma },\varepsilon (\varvec{v}))+(\varvec{u}\cdot \nabla \varvec{u},\varvec{v})&=(\varvec{f}_N,\varvec{v}){} & {} \quad \forall \varvec{v}\in \varvec{H}^1_0(\Gamma _D). \end{aligned}$$
(2.6)

To simplify the notation, we define \(A((\underline{\sigma },\varvec{u}),(\underline{w},\varvec{v}))=((2\mu )^{-1}\mathcal {A}\underline{\sigma },\underline{w})-( \varepsilon (\varvec{u}) ,\underline{w})+ ( \underline{\sigma } ,\varepsilon (\varvec{v}))\). For later use, we introduce the Helmholtz projection operator \(\mathbb {P}\). We let \(H(\textrm{div};\Omega ):=\{\varvec{v}\in \varvec{L}^2(\Omega ), \textrm{div}\varvec{v}\in L^2(\Omega )\}\) and \(H_0(\textrm{div};\Omega ):=\{\varvec{v}\in H(\textrm{div};\Omega );\varvec{v}\cdot \varvec{n}=0\;\text{ on }\;\partial \Omega \}\). For every vector field \(\varvec{f}\in \varvec{L}^2(\Omega )\), we have (cf. [33])

$$\begin{aligned} \varvec{f}=\nabla \alpha +\varvec{\beta }, \end{aligned}$$
(2.7)

where \(\alpha \in H^1(\Omega )\) and \(\varvec{\beta }\in H_0(\textrm{div};\Omega )\) is the divergence-free remainder. The Helmholtz projector is then defined by \(\mathbb {P}(\varvec{f}):=\varvec{\beta }\).

Theorem 2.1

There exists a unique solution to (2.3)–(2.4).

Proof

Let \((\underline{w},\varvec{v})\in X\) and set \(\mathbb {M}=\sup _{(\underline{H},\varvec{\theta })\in X\backslash \{0\}}\frac{A((\underline{w},\varvec{v}),(\underline{H},\varvec{\theta }))}{\Vert (\underline{H},\varvec{\theta })\Vert _X}\). We have

$$\begin{aligned} \left\| (2\mu )^{-\frac{1}{2}}\mathcal {A} \underline{w}\right\| _{L^2(\Omega )}^2=A((\underline{w},\varvec{v}),(\underline{w},\varvec{v}))\le \mathbb {M} \Vert (\underline{w},\varvec{v})\Vert _X. \end{aligned}$$

The Poincaré inequality and the Korn inequality respectively yield for \(\varvec{v}\in \varvec{H}^1_0(\Gamma _D)\)

$$\begin{aligned} \left\| \mu ^{\frac{1}{2}}\varvec{v}\right\| _{H^1(\Omega )}\le C_p \left\| \mu ^{\frac{1}{2}}\nabla \varvec{v}\right\| _{L^2(\Omega )} \end{aligned}$$

and

$$\begin{aligned} \left\| \mu ^{\frac{1}{2}}\nabla \varvec{v}\right\| _{L^2(\Omega )}\le C_{k} \left\| \mu ^{\frac{1}{2}}\varepsilon (\varvec{v})\right\| _{L^2(\Omega )}, \end{aligned}$$

where \(C_p\) and \(C_k\) are positive constants. Thus, we can infer that

$$\begin{aligned} \begin{aligned} {\frac{1}{C_pC_k}} \left\| \mu ^{\frac{1}{2}}\varvec{v}\right\| _{H^1(\Omega )}&\le \sup _{\underline{H}\in \underline{L}^2(S,\Omega )\backslash \{\underline{0}\}}\frac{-\left( \mu ^{\frac{1}{2}}\varepsilon (\varvec{v}), \mu ^{-\frac{1}{2}}\underline{H}\right) }{\left\| \mu ^{-\frac{1}{2}}\underline{H}\right\| _{L^2(\Omega )}}\\&= \sup _{\underline{H}\in \underline{L}^2(S,\Omega ) \backslash \{\underline{0}\}}\frac{A((\underline{w},\varvec{v}),(\underline{H},\varvec{0}))-((2\mu )^{-1}\mathcal {A}\underline{w},\underline{H})}{\left\| \mu ^{-\frac{1}{2}}\underline{H}\right\| _{L^2(\Omega )}}\\&\le \sup _{\underline{H}\in \underline{L}^2(S,\Omega )\backslash \{\underline{0}\} }\frac{A((\underline{w},\varvec{v}),(\underline{H},\varvec{0}))}{\Vert (\underline{H},\varvec{0})\Vert _X}\\&\quad +\sup _{\underline{H}\in \underline{L}^2(S,\Omega )\backslash \{\underline{0}\}}\frac{((2\mu )^{-1}\mathcal {A}\underline{w},\underline{H})}{\left\| \mu ^{-\frac{1}{2}}\underline{H}\right\| _{L^2(\Omega )}}\\&\le \mathbb {M}+ \left\| (2\mu )^{-\frac{1}{2}}\mathcal {A}\underline{w}\right\| _{L^2(\Omega )}. \end{aligned} \end{aligned}$$

Therefore, it follows that

$$\begin{aligned} \Vert (\underline{w},\varvec{v})\Vert _X^2=\left\| \mu ^{-\frac{1}{2}}\underline{w}\right\| _{L^2(\Omega )}^2+\left\| \mu ^{\frac{1}{2}}\varvec{v}\right\| _{H^1(\Omega )}^2\le C\Big (\mathbb {M}^2+\mathbb {M}\Vert (\underline{w},\varvec{v})\Vert _X\Big ). \end{aligned}$$

Thus, Young’s inequality yields \( \Vert (\underline{w},\varvec{v})\Vert _X\le C \mathbb {M}. \) Now let \((\underline{H},\varvec{\theta })\in X\) be such that \(A((\underline{w},\varvec{v}),(\underline{H},\varvec{\theta }))=0\) for all \((\underline{w},\varvec{v})\in X\). Taking \((\underline{w},\varvec{v})=(\underline{H},\varvec{\theta })\), we have \( \Vert (2\mu )^{-\frac{1}{2}}\mathcal {A}\underline{H}\Vert _{L^2(\Omega )}=0, \) which yields \(\mathcal {A}\underline{H}=\underline{0}\).

We can infer from the Korn inequality and the Poincaré inequality

$$\begin{aligned} \Vert \varvec{\theta }\Vert _{H^1(\Omega )}\le C \sup _{\underline{w}\in \underline{L}^2(S,\Omega )\backslash \{\underline{0}\} }\frac{(\varepsilon (\varvec{\theta }),\underline{w})}{\Vert \underline{w}\Vert _{L^2(\Omega )}}=C\sup _{\underline{w}\in \underline{L}^2(S,\Omega )\backslash \{\underline{0}\}}\frac{A((\underline{w},\varvec{0}),(\underline{H},\varvec{\theta }))}{\Vert \underline{w}\Vert _{L^2(\Omega )}}=0. \end{aligned}$$

Thus, \(\varvec{\theta }=\varvec{0}\). Since \(\textrm{tr}(\underline{H})\in L^2(\Omega )\), there exists \(\varvec{z}\in \varvec{H}^1(\Omega )\) such that \(\nabla \cdot \varvec{z}=\textrm{tr}(\underline{H})\) and \(\Vert \varvec{z}\Vert _{H^1(\Omega )}\le C \Vert \textrm{tr}(\underline{H})\Vert _{L^2(\Omega )} \). Then a simple manipulation shows that

$$\begin{aligned} \Vert \textrm{tr}(\underline{H})\Vert _{L^2(\Omega )}^2=(\textrm{tr}(\underline{H}), \nabla \cdot \varvec{z})=d\Big ((\underline{H},\varepsilon (\varvec{z}))-(\mathcal {A}\underline{H},\varepsilon (\varvec{z}))\Big ). \end{aligned}$$

We let \((\underline{w},\varvec{v})=(\underline{0},\varvec{z})\). Then we have \( (\underline{H},\varepsilon (\varvec{z}))=-A((\underline{0},\varvec{z}),(\underline{H},\varvec{\theta }))=0. \) Combined with \(\mathcal {A}\underline{H}=\underline{0}\), the identity above gives \(\textrm{tr}(\underline{H})=0\), and thus \(\underline{H}=\mathcal {A}\underline{H}+\frac{\textrm{tr}(\underline{H})}{d}\underline{I}=\underline{0}\). Then the well-posedness can be proved by using the Banach–Nečas–Babuška theorem (cf. [2]). \(\square \)

We remark that the unique solvability of (2.5)–(2.6) can be proved similarly, with additional treatment of the nonlinear term under a smallness assumption on the Helmholtz projection of \(\varvec{f}_N\) (cf. [15, Theorem 6.36] and [28, Equation (3.7)]). We omit the details for simplicity. In the following, we derive the pressure-robust discretization for the Stokes equations and the Navier–Stokes equations. Let \(\mathcal {T}_h\) be a shape-regular triangulation of the domain \(\Omega \). For each element \(K\in \mathcal {T}_h\), we let \(h_K\) be the diameter of K. In addition, we use \(\mathcal {F}_h\) to represent the union of all the faces and \(\mathcal {F}_h^0\) to represent the union of all the interior faces. For each face F, we use \(h_F\) to denote the diameter of F. We use \(\varvec{n}_F\) to represent the unit normal vector of F pointing from \(K_1\) to \(K_2\), where \(K_1\) and \(K_2\) are the elements sharing the common face F. When there is no confusion, we write \(\varvec{n}\) to simplify the notation. For each interior face F, we define the jump and average of a scalar function q over F as

$$\begin{aligned} \llbracket q\rrbracket _{|F}:=q_{|K_1}-q_{|K_2}\quad \text{ and }\quad \{q\}_{|F} :=\frac{q_{|K_1}+q_{|K_2}}{2}, \end{aligned}$$

where \(K_1\) and \(K_2\) are the two elements of \(\mathcal {T}_h\) sharing the common face F. For the boundary faces, we simply define \(\llbracket q\rrbracket _{|F}:=q_{|K_1}\) and \(\{q\}_{|F}:=q_{|K_1}\). We also use the same notation to indicate the jump and average of vector- and tensor-valued functions. Let \(k\ge 1\) denote the polynomial order. We use \(P_k(K)\) and \(P_k(F)\) to represent the polynomials of degree less than or equal to k defined on K and F, respectively. Similarly, \(\varvec{P}_k(K)\) represents the vector-valued functions and \(\underline{P}_k(S,K)\) represents the symmetric tensor-valued functions on K. For any scalar functions \(q,\theta \), we let \((q,\theta )_{\mathcal {T}_h}:=\sum _{K\in \mathcal {T}_h} (q,\theta )_K\). For the vector functions \(\varvec{q},\varvec{\theta }\), we let \((\varvec{q},\varvec{\theta })_{\mathcal {T}_h}:=\sum _{i=1}^d(q_i,\theta _i)_{\mathcal {T}_h}\). Similarly, for tensor functions \(\underline{q},\underline{\theta }\), we let \((\underline{q},\underline{\theta })_{\mathcal {T}_h}:=\sum _{i,j=1}^d(\underline{q}_{ij},\underline{\theta }_{ij})_{\mathcal {T}_h}\). Moreover, \(\langle q,\theta \rangle _{\partial \mathcal {T}_h}:=\sum _{K\in \mathcal {T}_h} \langle q,\theta \rangle _{\partial K}\).
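
The jump and average conventions can be illustrated with a few lines of code. This is a minimal sketch with hypothetical helper names `jump` and `average` acting on traces at a single point of a face; the product rule checked at the end is a standard algebraic identity that is not stated in the text and is included only as a sanity check.

```python
import numpy as np

def jump(q1, q2=None):
    """[[q]] = q|K1 - q|K2 on an interior face; q|K1 on a boundary face (q2 is None)."""
    q1 = np.asarray(q1, dtype=float)
    return q1 if q2 is None else q1 - np.asarray(q2, dtype=float)

def average(q1, q2=None):
    """{q} = (q|K1 + q|K2) / 2 on an interior face; q|K1 on a boundary face."""
    q1 = np.asarray(q1, dtype=float)
    return q1 if q2 is None else 0.5 * (q1 + np.asarray(q2, dtype=float))

# Hypothetical traces of two scalar functions a and b at one point of an interior face.
a1, a2 = 3.0, 1.0
b1, b2 = -2.0, 4.0

# Product rule for jumps: [[a b]] = {a} [[b]] + [[a]] {b}.
lhs = jump(a1 * b1, a2 * b2)
rhs = average(a1, a2) * jump(b1, b2) + jump(a1, a2) * average(b1, b2)
```

The same helpers apply componentwise to vector and tensor traces passed as arrays, in line with the convention stated above.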

For a vector function, we use \(\varvec{v}^n\) and \(\varvec{v}^t\) to represent the normal component and the tangential component, respectively, that is,

$$\begin{aligned} \varvec{v}^n:=(\varvec{v}\cdot \varvec{n})\varvec{n},\quad \varvec{v}^t:=\varvec{v}-\varvec{v}^n. \end{aligned}$$
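
As a small numerical illustration of this splitting, the sketch below decomposes \(\underline{\sigma }\varvec{n}\) into normal and tangential parts and checks the identity \((\mathcal {A}\underline{\sigma }\varvec{n})^t=(\underline{\sigma }\varvec{n})^t\) mentioned in the introduction. The function names and the sample matrix and normal are hypothetical and serve only as an example.

```python
import numpy as np

def normal_part(v, n):
    """v^n = (v . n) n for a unit vector n."""
    return np.dot(v, n) * n

def tangential_part(v, n):
    """v^t = v - v^n."""
    return v - normal_part(v, n)

def dev(H):
    """Deviatoric part A(H) = H - tr(H)/d * I."""
    d = H.shape[0]
    return H - np.trace(H) / d * np.eye(d)

# Hypothetical data: a symmetric stress sample and a unit normal in d = 3.
sigma = np.array([[2.0, -1.0, 0.5],
                  [-1.0, 3.0, 1.5],
                  [0.5, 1.5, -4.0]])
n = np.array([1.0, 2.0, 2.0]) / 3.0   # Euclidean norm equal to 1

t_full = tangential_part(sigma @ n, n)       # (sigma n)^t
t_dev = tangential_part(dev(sigma) @ n, n)   # (A(sigma) n)^t
```

The two tangential traces agree because \(\mathcal {A}\underline{\sigma }\varvec{n}\) and \(\underline{\sigma }\varvec{n}\) differ only by the multiple \(\frac{\textrm{tr}(\underline{\sigma })}{d}\varvec{n}\) of the normal, which has no tangential part.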

We introduce the following finite-dimensional spaces

$$\begin{aligned} \varvec{U}_h&:=\{\varvec{v}\in \varvec{P}_{k+1}(K),\forall K\in \mathcal {T}_h, \llbracket \varvec{v}\cdot \varvec{n}\rrbracket _{|F}=0,\forall F\in \mathcal {F}_h^0; \varvec{v}\cdot \varvec{n}=0 \;\text{ on }\;\partial \Omega _D\},\\ \underline{\Sigma }_h&:=\{\underline{w}\in \underline{P}_{k}(S,K),\forall K\in \mathcal {T}_h\},\\ \varvec{\widehat{U}}_h&:=\{\varvec{\mu }\in \varvec{P}_k(F), \varvec{\mu }\cdot \varvec{n}_{|F}=0,\forall F\in \mathcal {F}_h; \varvec{\mu }^t=\varvec{0} \;\text{ on }\;\partial \Omega _D \}. \end{aligned}$$

We remark that if \(\partial \Omega _N=\emptyset \), then we need to enforce the restriction \(\int _\Omega \textrm{tr}(\underline{w})=0\) for \(\underline{\Sigma }_h\) in order to ensure the unique solvability.

We define the discrete \(H^1\)-norm for \((\varvec{v},\widehat{\varvec{v}})\in \varvec{U}_h\times \widehat{\varvec{U}}_h\)

$$\begin{aligned} \Vert (\varvec{v},\widehat{\varvec{v}})\Vert _{1,h}^2:=\Vert \nabla \varvec{v}\Vert _{L^2(\mathcal {T}_h)}^2+\left\| h^{-\frac{1}{2}}(\varvec{v}^t-\varvec{\widehat{v}})\right\| _{L^2(\partial \mathcal {T}_h)}^2, \end{aligned}$$

where \(h_{|F}:=h_F\).

Note that for \((\varvec{v},\widehat{\varvec{v}})\in \varvec{U}_h\times \widehat{\varvec{U}}_h\), it holds

$$\begin{aligned} \begin{aligned} \Vert \varvec{v}\Vert _{h}^2&:= \Vert \nabla \varvec{v}\Vert _{L^2(\mathcal {T}_h)}^2+\left\| h^{-\frac{1}{2}}\llbracket \varvec{v}^t\rrbracket \right\| _{L^2(\partial \mathcal {T}_h\backslash \partial \Omega _N)}^2\\&=\Vert \nabla \varvec{v}\Vert _{L^2(\mathcal {T}_h)}^2+\left\| h^{-\frac{1}{2}}\llbracket \varvec{v}^t-\widehat{\varvec{v}}\rrbracket \right\| _{L^2(\partial \mathcal {T}_h\backslash \partial \Omega _N)}^2\le C \Vert (\varvec{v},\widehat{\varvec{v}})\Vert _{1,h}^2, \end{aligned} \end{aligned}$$
(2.8)

which coupled with the discrete Poincaré inequality (cf. [6]) yields

$$\begin{aligned} \Vert \varvec{v}\Vert _{L^2(\mathcal {T}_h)}\le C\Vert (\varvec{v},\widehat{\varvec{v}})\Vert _{1,h}\quad \forall (\varvec{v},\widehat{\varvec{v}})\in \varvec{U}_h\times \widehat{\varvec{U}}_h. \end{aligned}$$
(2.9)

The discrete formulation for the Stokes equations reads as follows: Find \((\underline{\sigma }_h,\varvec{u}_h,\varvec{\widehat{u}}_h)\in \underline{\Sigma }_h\times \varvec{U}_h\times \varvec{\widehat{U}}_h\) such that

$$\begin{aligned}&((2\mu )^{-1}\mathcal {A}\underline{\sigma }_h,\underline{w})_{\mathcal {T}_h}+(\varvec{u}_h,\textrm{div}\underline{w})_{\mathcal {T}_h}\nonumber \\&\quad -\sum _{F\in \mathcal {F}_h^0\cup \partial \Omega _N}\langle \varvec{u}_h^n,\llbracket (\underline{w}\varvec{n})^n\rrbracket \rangle _{F}-\sum _{F\in \mathcal {F}_h^0\cup \partial \Omega _N}\langle \varvec{\widehat{u}}_h,\llbracket (\underline{w}\varvec{n})^t\rrbracket \rangle _{F}=0, \end{aligned}$$
(2.10)
$$\begin{aligned}&- (\underline{\sigma }_h, \varepsilon (\varvec{v}))_{\mathcal {T}_h}+\langle (\underline{\sigma }_h\varvec{n})^t,\varvec{v}^t\rangle _{\partial \mathcal {T}_h}-\langle \tau (\varvec{P_M} \varvec{u}_h^t -\varvec{\widehat{u}}_h),\varvec{v}^t\rangle _{\partial \mathcal {T}_h}=-(\varvec{f}_S,\varvec{v})_{\mathcal {T}_h},\end{aligned}$$
(2.11)
$$\begin{aligned}&\sum _{F\in \mathcal {F}_h^0\cup \partial \Omega _N}\langle (\underline{\sigma }_h\varvec{n})^t,\varvec{\widehat{v}}\rangle _{F}-\langle \tau (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h), \varvec{\widehat{v}}\rangle _{\partial \mathcal {T}_h}=0 \end{aligned}$$
(2.12)

for \((\underline{w},\varvec{v},\varvec{\widehat{v}})\in \underline{\Sigma }_h\times \varvec{U}_h\times \varvec{\widehat{U}}_h\). Here we assume \(\tau _{|K} = C h_K^{-1} \mu \) over each element. In addition, \(\varvec{P_M}\) restricted to each face F represents the standard \(L^2\)-orthogonal projection from \(\varvec{L}^2(F)\) onto \(\varvec{P}_k(F), F\subset \partial K\), \(K\in \mathcal {T}_h\).
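
To make the role of \(\varvec{P_M}\) concrete, the following sketch computes the \(L^2\)-orthogonal projection of a scalar function onto polynomials of degree at most k on a single face. It rests on assumptions that are not part of the text: the face is an edge (d = 2) mapped to the reference interval \([-1,1]\), the projection is applied to one scalar component, and the helper name `l2_project` is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as L

def l2_project(f, k, n_quad=None):
    """Legendre coefficients of the L2-orthogonal projection of f onto P_k on [-1, 1]."""
    n_quad = n_quad or (k + 8)
    x, w = L.leggauss(n_quad)            # Gauss-Legendre nodes and weights
    fx = f(x)
    coeffs = np.empty(k + 1)
    for j in range(k + 1):
        Pj = L.legval(x, np.eye(k + 1)[j])                   # values of the j-th Legendre polynomial
        coeffs[j] = (2 * j + 1) / 2.0 * np.sum(w * fx * Pj)  # (f, P_j) / (P_j, P_j)
    return coeffs

# Example: project x^3 (degree k + 2) onto P_1 (k = 1).
coeffs = l2_project(lambda x: x**3, k=1)

# The projection error is L2-orthogonal to every polynomial of degree <= k.
x, w = L.leggauss(9)
residual = x**3 - L.legval(x, coeffs)
moments = [np.sum(w * residual * L.legval(x, np.eye(2)[j])) for j in range(2)]
```

In this example the projection of \(x^3\) onto \(P_1\) is \(\frac{3}{5}x\), and the entries of `moments` vanish up to rounding, which is the orthogonality property that defines the projection.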

To ease the later presentation, we recast the discrete formulation into compact form. To this end, we let

$$\begin{aligned}&B_h(\varvec{v},\underline{w}):=(\varvec{v},\textrm{div}\underline{w})_{\mathcal {T}_h}-\sum _{F\in \mathcal {F}_h^0\cup \partial \Omega _N}\langle {\varvec{v}^n},\llbracket (\underline{w}\varvec{n})^n\rrbracket \rangle _{F},\quad \forall (\varvec{v},\underline{w})\in \varvec{U}_h\times \underline{\Sigma }_h,\\&S_h((\varvec{w},\widehat{\varvec{w}}),(\varvec{v},\widehat{\varvec{v}})):=\langle \tau (\varvec{P_M} \varvec{w}^t-\varvec{\widehat{w}}),\varvec{v}^t-\widehat{\varvec{v}}\rangle _{\partial \mathcal {T}_h},\quad \\ {}&\quad \forall (\varvec{w},\widehat{\varvec{w}})\in \varvec{U}_h\times \widehat{\varvec{U}}_h, (\varvec{v},\widehat{\varvec{v}})\in \varvec{U}_h\times \widehat{\varvec{U}}_h,\\&T_h(\widehat{\varvec{v}},\underline{w}):=\sum _{F\in \mathcal {F}_h^0\cup \partial \Omega _N}\langle (\underline{w}\varvec{n})^t,\varvec{\widehat{v}}\rangle _{F},\quad \forall (\underline{w},\varvec{\widehat{v}})\in \underline{\Sigma }_h\times \widehat{\varvec{U}}_h. \end{aligned}$$

Integration by parts implies that

$$\begin{aligned} B_h(\varvec{v},\underline{w})= - (\underline{w}, \varepsilon (\varvec{v}))_{\mathcal {T}_h}+\langle (\underline{w}\varvec{n})^t,\varvec{v}^t\rangle _{\partial \mathcal {T}_h} \quad \forall (\varvec{v},\underline{w})\in \varvec{U}_h\times \underline{\Sigma }_h. \end{aligned}$$
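
For the reader's convenience, a sketch of the computation behind this identity is as follows. It assumes the notation \(\varvec{n}_K\) for the unit outward normal of K (not introduced in the text) and uses the symmetry of \(\underline{w}\), the orthogonal splitting of \(\underline{w}\varvec{n}\) and \(\varvec{v}\) into normal and tangential parts, and the fact that \(\varvec{v}\cdot \varvec{n}\) is single-valued on interior faces and vanishes on \(\partial \Omega _D\) for \(\varvec{v}\in \varvec{U}_h\):

$$\begin{aligned} (\varvec{v},\textrm{div}\underline{w})_{\mathcal {T}_h}&=-(\underline{w},\nabla \varvec{v})_{\mathcal {T}_h}+\sum _{K\in \mathcal {T}_h}\langle \underline{w}\varvec{n}_K,\varvec{v}\rangle _{\partial K}\\&=-(\underline{w},\varepsilon (\varvec{v}))_{\mathcal {T}_h}+\langle (\underline{w}\varvec{n})^t,\varvec{v}^t\rangle _{\partial \mathcal {T}_h}+\langle (\underline{w}\varvec{n})^n,\varvec{v}^n\rangle _{\partial \mathcal {T}_h},\\ \langle (\underline{w}\varvec{n})^n,\varvec{v}^n\rangle _{\partial \mathcal {T}_h}&=\sum _{F\in \mathcal {F}_h^0\cup \partial \Omega _N}\langle \varvec{v}^n,\llbracket (\underline{w}\varvec{n})^n\rrbracket \rangle _{F}. \end{aligned}$$

Subtracting the last sum, as in the definition of \(B_h\), leaves exactly the two terms on the right-hand side of the identity.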

We define \( \mathbb {A}_h((\cdot ,\cdot ,\cdot ),(\cdot ,\cdot ,\cdot ))\) by

$$\begin{aligned}&\mathbb {A}_h((\underline{S},\varvec{w},\widehat{\varvec{w}}),(\underline{H},\varvec{v},\widehat{\varvec{v}}))\\&\quad :=((2\mu )^{-1}\mathcal {A}\underline{S},\underline{H})+B_h(\varvec{w},\underline{H})-B_h(\varvec{v},\underline{S})+S_h((\varvec{w},\widehat{\varvec{w}}),(\varvec{v},\widehat{\varvec{v}}))\\&\qquad \quad -T_h(\widehat{\varvec{w}},\underline{H})+T_h(\widehat{\varvec{v}},\underline{S}). \end{aligned}$$

Then (2.10)–(2.12) can be rewritten in compact form as follows: Find \((\underline{\sigma }_h,\varvec{u}_h,\varvec{\widehat{u}}_h)\in \underline{\Sigma }_h\times \varvec{U}_h\times \varvec{\widehat{U}}_h\) such that

$$\begin{aligned} \mathbb {A}_h((\underline{\sigma }_h,\varvec{u}_h,\widehat{\varvec{u}}_h),(\underline{H},\varvec{v},\widehat{\varvec{v}}))=(\varvec{f}_S,\varvec{v})\quad \forall (\varvec{v},\underline{H},\widehat{\varvec{v}})\in \varvec{U}_h\times \underline{\Sigma }_h\times \widehat{\varvec{U}}_h. \end{aligned}$$

We define the convective trilinear form for \(\varvec{w}\in \varvec{U}_h,(\varvec{\psi },\widehat{\varvec{\psi }})\in \varvec{U}_h\times \widehat{\varvec{U}}_h\) and \((\varvec{v},\widehat{\varvec{v}})\in \varvec{U}_h\times \widehat{\varvec{U}}_h\) as follows

$$\begin{aligned} N_h(\varvec{w};(\varvec{\psi },\widehat{\varvec{\psi }}),(\varvec{v},\widehat{\varvec{v}}))&:=-\sum _{K\in \mathcal {T}_h}(\varvec{\psi }\otimes \varvec{w}, \nabla \varvec{v})_K +\frac{1}{2}\left\langle \varvec{w}\cdot \textbf{n},(\varvec{\psi }^t+\widehat{\varvec{\psi }})\cdot (\varvec{v}^t-\widehat{\varvec{v}})\right\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}\\&\quad \;+\frac{1}{2}\left\langle |\varvec{w}\cdot \textbf{n}|,(\varvec{\psi }^t-\widehat{\varvec{\psi }})\cdot (\varvec{v}^t-\widehat{\varvec{v}})\right\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}. \end{aligned}$$

The discrete formulation for the Navier–Stokes equations reads as follows: Find \((\underline{\sigma }_h,\varvec{u}_h,\widehat{\varvec{u}}_h)\in \underline{\Sigma }_h\times \varvec{U}_h\times \widehat{\varvec{U}}_h\) such that

$$\begin{aligned}&\mathbb {A}_h((\underline{\sigma }_h,\varvec{u}_h,\widehat{\varvec{u}}_h),(\underline{H},\varvec{v},\widehat{\varvec{v}}))+N_h(\varvec{u}_h;(\varvec{u}_h,\widehat{\varvec{u}}_h),(\varvec{v},\widehat{\varvec{v}}))\nonumber \\&\quad =(\varvec{f}_N,\varvec{v})\quad \forall (\varvec{v},\underline{H},\widehat{\varvec{v}})\in \varvec{U}_h\times \underline{\Sigma }_h\times \widehat{\varvec{U}}_h. \end{aligned}$$
(2.13)

Remark 2.1

(Divergence-free velocity). Although our discrete formulation does not involve the divergence-free restriction for velocity explicitly, we can show that the numerical velocity is actually divergence-free; indeed, this is encapsulated in (2.10). Let \(P_h:=\{q_{|K}\in P_k(K),\forall K\in \mathcal {T}_h\}\). For an arbitrary function \(q\in P_h\), setting \(\underline{w}=q\underline{I}\) in (2.10) and integrating by parts, we obtain

$$\begin{aligned} \sum _{K\in \mathcal {T}_h}(\nabla \cdot \varvec{u}_h, q)_K=0. \end{aligned}$$
(2.14)
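
A sketch of the computation leading to (2.14), written with the notation \(\varvec{n}_K\) for the unit outward normal of K (an assumption made here for illustration), reads as follows. For \(\underline{w}=q\underline{I}\) we have

$$\begin{aligned} \mathcal {A}(q\underline{I})=\underline{0},\quad \textrm{div}(q\underline{I})=\nabla q,\quad (q\underline{I}\varvec{n})^n=q\varvec{n},\quad (q\underline{I}\varvec{n})^t=\varvec{0}, \end{aligned}$$

so that (2.10) reduces to

$$\begin{aligned} (\varvec{u}_h,\nabla q)_{\mathcal {T}_h}-\sum _{F\in \mathcal {F}_h^0\cup \partial \Omega _N}\langle \varvec{u}_h\cdot \varvec{n},\llbracket q\rrbracket \rangle _{F}=0. \end{aligned}$$

Element-wise integration by parts, together with the continuity of \(\varvec{u}_h\cdot \varvec{n}\) across interior faces and \(\varvec{u}_h\cdot \varvec{n}=0\) on \(\partial \Omega _D\), gives

$$\begin{aligned} (\varvec{u}_h,\nabla q)_{\mathcal {T}_h}=-(\nabla \cdot \varvec{u}_h,q)_{\mathcal {T}_h}+\sum _{K\in \mathcal {T}_h}\langle \varvec{u}_h\cdot \varvec{n}_K,q\rangle _{\partial K}=-(\nabla \cdot \varvec{u}_h,q)_{\mathcal {T}_h}+\sum _{F\in \mathcal {F}_h^0\cup \partial \Omega _N}\langle \varvec{u}_h\cdot \varvec{n},\llbracket q\rrbracket \rangle _{F}, \end{aligned}$$

and the face terms cancel. Since \(\nabla \cdot \varvec{u}_h{}_{|K}\in P_k(K)\) for \(\varvec{u}_h\in \varvec{U}_h\), the choice \(q=\nabla \cdot \varvec{u}_h\) in (2.14) then yields \(\nabla \cdot \varvec{u}_h=0\) on every \(K\in \mathcal {T}_h\).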

Remark 2.2

We use a discontinuous polynomial space of order k and an \(H(\textrm{div};\Omega )\)-conforming space of order \(k+1\) to approximate the stress and velocity, respectively. This choice in conjunction with the use of \(\varvec{P_M}\) guarantees optimal convergence error estimates for all the variables as well as the robustness of the error estimates with respect to \(\mu \). In contrast, the scheme proposed in [21, 22] uses the equal polynomial order k for the approximation of stress and velocity, and the stress converges at the rate \(\mathcal {O}(h^k)\) in the \(L^2\)-norm. Compared to the scheme presented in [11, 35], our scheme computes the physical variables of interest directly, which is important in some applications.

Remark 2.3

In contrast to the divergence-conforming schemes in velocity gradient-velocity-pressure formulation [12] and velocity-pressure formulation [27], we use the stress-velocity formulation with strongly symmetric stress and eliminate the pressure via the incompressibility condition, which enables us to compute the physical quantities of interest directly without resorting to postprocessing. The proposed scheme also provides a unified framework for solving the Stokes equations and the elasticity problem. As such, our scheme can be easily extended to solve multiphysics problems such as the fluid-structure interaction problem with a natural incorporation of the interface conditions. Moreover, owing to the use of the stress-velocity formulation, the proof of the pressure-independent error estimates is not trivial, and the developed methodologies can provide new perspectives for other pressure-robust discretizations.

2.2 Main results

In this subsection we state the main results; the proofs are given in Sects. 4 and 5.

Theorem 2.2

(Discrete \(H^1\) stability of the Stokes equations) There exists a unique solution to (2.10)–(2.12). In addition, the following estimates hold

$$\begin{aligned} \Vert (\varvec{u}_h,\widehat{\varvec{u}}_h)\Vert _{1,h}\le C \Vert \mu ^{-1}\mathcal {A}\underline{\sigma }_h\Vert _{L^2(\mathcal {T}_h)}+\mu ^{-\frac{1}{2}}\Vert \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)} \end{aligned}$$
(2.15)

and

$$\begin{aligned} \mu \Vert (\varvec{u}_h,\widehat{\varvec{u}}_h)\Vert _{1,h}&\le C \Vert {\mathbb {P}(\varvec{f}_S)}\Vert _{L^2(\mathcal {T}_h)},\\ \Vert \underline{\sigma }_h\Vert _{L^2(\mathcal {T}_h)}&\le C \Vert \varvec{f}_S\Vert _{L^2(\mathcal {T}_h)}. \end{aligned}$$

Let \( \underline{e_\sigma } =\underline{\Pi _\Sigma } \underline{\sigma }-\underline{\sigma }_h,\; \varvec{e_u} = \varvec{\Pi _U}\varvec{u}-\varvec{u}_h,\; \varvec{e_{\widehat{u}}}=\varvec{P_M}\varvec{u}^t-\varvec{\widehat{u}}_h \), where \(\underline{\Pi _\Sigma }\) and \( \varvec{\Pi _U}\) are projection operators defined in Sect. 4. Then the following convergence error estimates hold.

Theorem 2.3

(Error estimates of the Stokes equations) Let \((\underline{\sigma },\varvec{u})\) be the solution of (2.1)–(2.2) and let \((\underline{\sigma }_h,\varvec{u}_h,\varvec{\widehat{u}}_h)\in \underline{\Sigma }_h\times \varvec{U}_h\times \varvec{\widehat{U}}_h\) be the discrete solution of (2.10)–(2.12). In addition, assume that \((\underline{\sigma },\varvec{u})\in \underline{H}^{t}(\Omega )\times \varvec{H}^s(\Omega )\) with \(\frac{1}{2}< t\le k+1\) and \(\frac{3}{2}< s\le k+2\). Then the following error estimate holds

$$\begin{aligned} \begin{aligned}&\left\| (2\mu )^{-\frac{1}{2}}\mathcal {A}\underline{e_\sigma }\right\| _{L^2(\mathcal {T}_h)}+\left\| \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}})\right\| _{L^2(\partial \mathcal {T}_h)}\\&\quad \le C \Big (h^t \mu ^{-\frac{1}{2}} |\mathcal {A}\underline{\sigma }|_{H^t(\Omega )}+\mu ^{\frac{1}{2}}h^{s-1}|\varvec{u}|_{H^s(\Omega )}\Big )\\&\quad \le C \mu ^{\frac{1}{2}}h^{s-1}|\varvec{u}|_{H^s(\Omega )}{.} \end{aligned} \end{aligned}$$
(2.16)

In addition, it also holds

$$\begin{aligned} \Vert \varepsilon (\varvec{e_u})\Vert _{L^2(\mathcal {T}_h)}&\le C h^{s-1}|\varvec{u}|_{H^s(\Omega )},\quad \Vert (\varvec{e_u},\varvec{e_{\widehat{u}}})\Vert _{1,h}\le C h^{s-1}|\varvec{u}|_{H^s(\Omega )}. \end{aligned}$$
(2.17)

We remark that the polynomial degree of the tangential trace of velocity is one order lower than that of the approximated velocity, and that the globally coupled system only involves the normal trace and tangential trace of velocity together with the piecewise constant approximation for the trace of the stress, as shown in the next section. Therefore, with regard to the degrees of freedom of the globally coupled unknowns, (2.17) indicates that superconvergence is obtained for the discrete \(H^1\)-norm of velocity.
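
The second inequality in (2.16) can be traced back to the constitutive law. As a sketch, assume that \(\mathcal {A}\underline{w}=\underline{w}-\frac{1}{d}\textrm{tr}(\underline{w})\underline{I}\) denotes the deviatoric part (as suggested by (4.14)) and take \(t=s-1\), an illustrative choice. Then (1.1) and (1.3) give

$$\begin{aligned} \mathcal {A}\underline{\sigma }=2\mu \Big (\varepsilon (\varvec{u})-\frac{1}{d}(\nabla \cdot \varvec{u})\underline{I}\Big )+p\,\mathcal {A}\underline{I}=2\mu \varepsilon (\varvec{u}), \end{aligned}$$

so that

$$\begin{aligned} h^t\mu ^{-\frac{1}{2}}|\mathcal {A}\underline{\sigma }|_{H^t(\Omega )}=2\mu ^{\frac{1}{2}}h^{s-1}|\varepsilon (\varvec{u})|_{H^{s-1}(\Omega )}\le C\mu ^{\frac{1}{2}}h^{s-1}|\varvec{u}|_{H^s(\Omega )}. \end{aligned}$$

Under these assumptions the pressure does not enter the bound.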

Theorem 2.4

(\(L^2\)-error for stress) Under the assumptions of Theorem 2.3, the following error estimate holds for the Stokes equations

$$\begin{aligned} \Vert \underline{e_\sigma }\Vert _{L^2(\mathcal {T}_h)}\le C \Big (h^t|\mathcal {A}\underline{\sigma }|_{H^t(\Omega )}+\mu h^{s-1}|\varvec{u}|_{H^s(\Omega )}\Big ). \end{aligned}$$

Theorem 2.5

(\(L^2\)-error for velocity) Let \((\underline{\sigma },\varvec{u})\) be the solution of (2.1)–(2.2) and let \((\underline{\sigma }_h,\varvec{u}_h,\varvec{\widehat{u}}_h)\in \underline{\Sigma }_h\times \varvec{U}_h\times \varvec{\widehat{U}}_h\) be the discrete solution of (2.10)–(2.12). In addition, we assume that \((\underline{\sigma },\varvec{u})\in \underline{H}^{s-1}(\Omega )\times \varvec{H}^s(\Omega ), \frac{3}{2}< s\le k+2\). Then the following error estimate holds

$$\begin{aligned} \Vert \varvec{u}-\varvec{u}_h\Vert _{L^2(\Omega )}\le C h^s|\varvec{u}|_{H^s(\Omega )}. \end{aligned}$$

Remark 2.4

We can observe from Theorems 2.3 and 2.5 that the error estimates for velocity are independent of \(\mu \), which illustrates the robustness of the scheme with respect to the viscosity. Robust error estimates for the velocity-pressure formulation are generally easier to obtain. However, the standard error analysis for the stress-velocity formulation leads to error estimates that depend on \(\underline{\sigma }\), which is linked to the pressure variable (cf. (1.1)). To establish the independence of the error estimates from \(\mu \) and p, the key tools are to use (4.1) and to observe that \((\mathcal {A} \underline{\sigma }\varvec{n})^t_{|F}= (\underline{\sigma }\varvec{n})^t_{|F}\) for \(F\in \mathcal {F}_h\) owing to the fact that \((\textrm{tr}(\underline{\sigma })\underline{I}\varvec{n})^t_{|F}=\varvec{0}\).
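
The tangential-trace identity used in this remark follows from a one-line computation. The sketch below assumes that \(\mathcal {A}\underline{\sigma }=\underline{\sigma }-\frac{1}{d}\textrm{tr}(\underline{\sigma })\underline{I}\) is the deviatoric part, a definition inferred from (4.14) rather than quoted:

$$\begin{aligned} (\mathcal {A}\underline{\sigma }\varvec{n})^t=(\underline{\sigma }\varvec{n})^t-\frac{1}{d}\textrm{tr}(\underline{\sigma })(\underline{I}\varvec{n})^t=(\underline{\sigma }\varvec{n})^t-\frac{1}{d}\textrm{tr}(\underline{\sigma })\varvec{n}^t=(\underline{\sigma }\varvec{n})^t, \end{aligned}$$

since the normal vector has no tangential component, \(\varvec{n}^t=\varvec{0}\). By (1.1) and (1.3), \(\textrm{tr}(\underline{\sigma })=2\mu \nabla \cdot \varvec{u}+dp=dp\), so the term that drops out is exactly the pressure contribution \(p\varvec{n}\).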

Theorem 2.6

(convergence to weak solution for the Navier–Stokes equations) Let \(\{(\underline{\sigma }_h,\varvec{u}_h)\}_{h>0}\) be the sequence of the approximated solutions generated by solving the discrete formulation (2.13). Then, as \(h\rightarrow 0\), it holds

$$\begin{aligned} \varvec{u}_h&\rightarrow \varvec{u} \quad \text {in}\;\varvec{L}^2(\Omega ),\\ \underline{\sigma }_h&\rightarrow \underline{\sigma } \quad \text {in}\;\underline{L}^2(S,\Omega ),\\ \Vert \varvec{u}-\varvec{u}_h\Vert _h&\rightarrow 0, \end{aligned}$$

where \(\Vert \cdot \Vert _{h}\) is defined in (2.8) and \((\underline{\sigma },\varvec{u} )\in X\) is the unique solution to the weak formulation (2.5)–(2.6).

3 A characterization of the proposed HDG scheme

In this section, we first describe the local solvers and the corresponding global solvers for the proposed HDG method, where the global solvers involve the normal trace and the tangential trace of velocity, and the piecewise constant approximation for the trace of the stress.

We can follow [12, 19, 29] to relax the \(H(\textrm{div})\)-conformity of the velocity field via Lagrange multipliers. To begin, we derive the following formulation with relaxed \(H(\textrm{div})\)-conformity: Find \((\underline{\sigma }_h,\varvec{u}_h,{\delta _h},\varvec{\widehat{u}}_h,\lambda _h)\in \underline{\Sigma }_h\times \varvec{U}_h^*\times U_h^n\times \varvec{\widehat{U}}_h\times M_h^\partial \) such that

$$\begin{aligned}&((2\mu )^{-1}\mathcal {A}\underline{\sigma }_h,\underline{w})_{\mathcal {T}_h}+(\varvec{u}_h,\textrm{div}\underline{w})_{\mathcal {T}_h}-\sum _{F\in \mathcal {F}_h}\langle {\delta _h}\varvec{n},\llbracket (\underline{w}\varvec{n})^n\rrbracket \rangle _{F}\\&\quad -\sum _{F\in \mathcal {F}_h\backslash \partial \Omega _D}\langle \varvec{\widehat{u}}_h,\llbracket (\underline{w}\varvec{n})^t\rrbracket \rangle _{F}=0,\\&\quad - (\underline{\sigma }_h, \varepsilon (\varvec{v}))_{\mathcal {T}_h}+\langle \underline{\sigma }_h\varvec{n}+\lambda _h\varvec{n},\varvec{v}\rangle _{\partial \mathcal {T}_h}-\langle \tau (\varvec{P_M} \varvec{u}_h^t -\varvec{\widehat{u}}_h),\varvec{v}^t\rangle _{\partial \mathcal {T}_h}=-(\varvec{f}_S,\varvec{v})_{\mathcal {T}_h},\\&\langle (\underline{\sigma }_h\varvec{n})^t,\varvec{\widehat{v}}\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}-\langle \tau (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h), \varvec{\widehat{v}}\rangle _{\partial \mathcal {T}_h}=0,\\&\langle (\underline{\sigma }_h\varvec{n})^n+\lambda _h\varvec{n}, \chi \varvec{n}\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D} =0,\\&\langle \varvec{u}_h^n-{\delta _h}\varvec{n},\mu \varvec{n}\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _N} =0 \end{aligned}$$

for \((\underline{w},\varvec{v},\chi ,\varvec{\widehat{v}},\mu )\in \underline{\Sigma }_h\times \varvec{U}_h^*\times U_h^n\times \varvec{\widehat{U}}_h\times M_h^\partial \). Here the spaces are defined as

$$\begin{aligned} \varvec{U}_h^*&:=\{\varvec{v}\in \varvec{P}_{k+1}(K),\forall K\in \mathcal {T}_h\},\\ M_h^{\partial }&:=\{\mu \in L^2(\partial \mathcal {T}_h): \mu _{|F}\in {P_{k+1}(F)}, F\subset \partial K,\forall K\in \mathcal {T}_h;\mu _{|F}=0,\forall F\in \partial \Omega _N\},\\ U_h^n&:= \{\chi \in P_{k+1}(F), \forall F\in \mathcal {F}_h; \chi =0 \;\text{ on }\;\partial \Omega _D \}. \end{aligned}$$

Then we can follow [19] to define the local solvers, and the resulting global system only involves the interface variables.

For \(\underline{\sigma }_h\in \underline{\Sigma }_h\), let us define \((\bar{\sigma }_0)_{|K}:=\frac{1}{d{ |K|}}\int _K \textrm{tr}(\underline{\sigma }_h)\). Then we let

$$\begin{aligned} \bar{\underline{\sigma }}_h:=\underline{\sigma }_h-\bar{\sigma }_0\underline{I}. \end{aligned}$$

It follows that \(\int _K \textrm{tr}( \bar{\underline{\sigma }}_h)=0\). We define the following local spaces for \(K\in \mathcal {T}_h\):

$$\begin{aligned} \underline{\Sigma }(K)&:=\{\underline{w}\in \underline{P}_{k}(S,K); {\int _K \textrm{tr}(\underline{w})=0}\}, \\ \varvec{U}(K)&:=\{\varvec{v}\in \varvec{P}_{k+1}(K)\},\; M_h^{\partial }(K):=\{ \mu _{|\partial K}\in {P_{k+1}}(\partial K)\}. \end{aligned}$$
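
The zero-mean-trace property stated before the local spaces can be checked directly. Reading the trace in the definition of \(\bar{\sigma }_0\) as that of \(\underline{\sigma }_h\) (an interpretation, since \(\underline{\sigma }_h\) is the function being split), one finds on each \(K\in \mathcal {T}_h\)

$$\begin{aligned} \int _K \textrm{tr}(\bar{\underline{\sigma }}_h)=\int _K \textrm{tr}(\underline{\sigma }_h)-(\bar{\sigma }_0)_{|K}\int _K \textrm{tr}(\underline{I})=\int _K \textrm{tr}(\underline{\sigma }_h)-d|K|(\bar{\sigma }_0)_{|K}=0. \end{aligned}$$

Thus \((\underline{\sigma }_h)_{|K}\) splits into a part in \(\underline{\Sigma }(K)\) plus a constant multiple of the identity, and the constant \(\bar{\sigma }_0\) is carried as a separate unknown in the global problem.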

Given \(\delta _h\in U_h^n, \widehat{\varvec{u}}_h\in \widehat{\varvec{U}}_h\) and \(\varvec{f}_S\in L^2(\Omega )\), the local solvers are defined as: Find \((\bar{\underline{\sigma }}_h,\varvec{u}_h,\lambda _h)\in \underline{\Sigma }(K)\times \varvec{U}(K)\times M_h^{\partial }(K)\) such that

$$\begin{aligned}&((2\mu )^{-1}\mathcal {A}{\bar{\underline{\sigma }}_h},\underline{w})_{K}+(\varvec{u}_h,\textrm{div}\underline{w})_{K}=\langle {\delta _h}\varvec{n},(\underline{w}\varvec{n})^n\rangle _{\partial K}+\langle \varvec{\widehat{u}}_h,(\underline{w}\varvec{n})^t\rangle _{\partial K\backslash \partial \Omega _D}, \end{aligned}$$
(3.1)
$$\begin{aligned}&(\textrm{div}{\bar{\underline{\sigma }}_h}, \varvec{v})_{K}+\langle \lambda _h,\varvec{v}\cdot \varvec{n}\rangle _{\partial K\backslash \partial \Omega _N}-\langle \tau \varvec{P_M} \varvec{u}_h^t,\varvec{v}^t\rangle _{\partial K}=-(\varvec{f}_S,\varvec{v})_{K}-\langle \tau \varvec{\widehat{u}}_h,\varvec{v}^t\rangle _{\partial K},\end{aligned}$$
(3.2)
$$\begin{aligned}&\langle \varvec{u}_h^n-{\delta _h}\varvec{n},\mu \varvec{n}\rangle _{\partial K\backslash \partial \Omega _N} =0 \end{aligned}$$
(3.3)

for \((\underline{w},\varvec{v},\mu )\in \underline{\Sigma }(K)\times \varvec{U}(K)\times M_h^{\partial }(K)\).

When we set \(\varvec{f}_S=\varvec{0}\) in (3.1)–(3.3), the solution to the local problem can be denoted by \((\underline{\sigma }_h^{(\delta _h,\varvec{\widehat{u}}_h)}, \varvec{u}_h^{(\delta _h,\varvec{\widehat{u}}_h)},\lambda _h^{(\delta _h,\varvec{\widehat{u}}_h)})\), where the superscript indicates the dependence of the solution on \((\delta _h,\varvec{\widehat{u}}_h)\). Similarly, when we set \(\delta _h=0,\varvec{\widehat{u}}_h=\varvec{0}\), the solution to the local problem is denoted as \((\underline{\sigma }_h^{\varvec{f}},\varvec{u}_h^{\varvec{f}},\lambda _h^{\varvec{f}})\), where the superscript indicates the dependence of the solution on \(\varvec{f}_S\). By linearity, the solution to (3.1)–(3.3) can then be written as \((\underline{\sigma }_h^{(\delta _h,\varvec{\widehat{u}}_h)}, \varvec{u}_h^{(\delta _h,\varvec{\widehat{u}}_h)},\lambda _h^{(\delta _h,\varvec{\widehat{u}}_h)})+(\underline{\sigma }_h^{\varvec{f}},\varvec{u}_h^{\varvec{f}},\lambda _h^{\varvec{f}})\). We use \((\widetilde{\underline{\sigma }}_h,\widetilde{\varvec{u}}_h,\lambda _h)_K\) to represent the restriction of \((\widetilde{\underline{\sigma }}_h,\widetilde{\varvec{u}}_h,\lambda _h)\) to each element \(K\in \mathcal {T}_h\), that is,

$$\begin{aligned} (\widetilde{\underline{\sigma }}_h,\widetilde{\varvec{u}}_h,\lambda _h)_K:=(\underline{\sigma }_h^{(\delta _h,\varvec{\widehat{u}}_h)}, \varvec{u}_h^{(\delta _h,\varvec{\widehat{u}}_h)},\lambda _h^{(\delta _h,\varvec{\widehat{u}}_h)})+(\underline{\sigma }_h^{\varvec{f}},\varvec{u}_h^{\varvec{f}},\lambda _h^{\varvec{f}}). \end{aligned}$$

Let us define \(\bar{P}_h:=\{q_0\in L^2(\Omega ): (q_0)_{|K}\in P_0(K), \forall K\in \mathcal {T}_h;\int _\Omega q_0=0\}\). The global problem is to find \(({\delta _h},\varvec{\widehat{u}}_h,\bar{\sigma }_0)\in U_h^n\times \varvec{\widehat{U}}_h\times \bar{P}_h\) such that

$$\begin{aligned} \langle&((\underline{\sigma }_{h}^{(\delta _h,\varvec{\widehat{u}}_h)}+{\bar{\sigma }_0 \underline{I}})\varvec{n})^n+\lambda _h^{(\delta _h,\varvec{\widehat{u}}_h)}\varvec{n}, \chi \varvec{n}\rangle _{\partial \mathcal {T}_h}= -\langle (\underline{\sigma }_{h}^{\varvec{f}}\varvec{n})^n+\lambda _h^{\varvec{f}}\varvec{n}, \chi \varvec{n}\rangle _{\partial \mathcal {T}_h}, \end{aligned}$$
(3.4)
$$\begin{aligned}&\langle (\underline{\sigma }_{h}^{(\delta _h,\varvec{\widehat{u}}_h)}\varvec{n})^t,\varvec{\widehat{v}}\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}-\langle \tau (\varvec{P_M} (\varvec{u}_{h}^{{(\delta _h,\varvec{\widehat{u}}_h)}})^t-\varvec{\widehat{u}}_h), \varvec{\widehat{v}}\rangle _{\partial \mathcal {T}_h}=-\langle (\underline{\sigma }_{h}^{\varvec{f}}\varvec{n})^t,\varvec{\widehat{v}}\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}, \end{aligned}$$
(3.5)
$$\begin{aligned}&{\langle \delta _h\varvec{n}, (\bar{q}_0\underline{I}\varvec{n})^n\rangle _{\partial \mathcal {T}_h}}=0. \end{aligned}$$
(3.6)

for \((\chi ,\varvec{\widehat{v}},{\bar{q}_0})\in U_h^n\times \varvec{\widehat{U}}_h\times {\bar{P}_h}\).

Note that \((\widetilde{\underline{\sigma }}_h{+\bar{\sigma }_0 \underline{I}},\widetilde{\varvec{u}}_h)\) is the solution to the original problem (2.10)–(2.12).

Remark 3.1

We observe that the globally coupled unknowns involve the normal trace and tangential trace of velocity, and the piecewise constant approximation \(\bar{\sigma }_0\). Thus, the matrix associated with the resulting global system has a saddle-point structure. One can combine the techniques introduced in [32] to improve the computational efficiency of the resulting saddle-point system. Meanwhile, the numerical velocity is exactly divergence-free, and thus it naturally yields error estimates for velocity that are independent of the pressure variable and of the viscosity, which makes it possible to solve incompressible flows with high Reynolds number (cf. [23]).

4 Error analysis for the Stokes equations

In this section, we prove Theorems 2.2–2.5 on the discrete \(H^1\) stability and the convergence error estimates for all the involved variables.

Let \(\underline{\Pi _\Sigma }\) represent the \(L^2\)-orthogonal projection onto \(\underline{\Sigma }_h\). The \(L^2\) projection property of \(\underline{\Pi _\Sigma }\) implies that

$$\begin{aligned} \mathcal {A} (\underline{\Pi _\Sigma }\underline{w}) = \underline{\Pi _\Sigma }(\mathcal {A}\underline{w})\quad \forall \underline{w}\in \underline{L}^2(\Omega ). \end{aligned}$$
(4.1)
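
A brief sketch of why (4.1) holds, under the assumptions that \(\mathcal {A}\underline{w}=\underline{w}-\frac{1}{d}\textrm{tr}(\underline{w})\underline{I}\), that \(\underline{\Sigma }_h\) consists of piecewise symmetric polynomial matrices of degree k, and with \(\Pi _k\) denoting the elementwise \(L^2\)-projection onto \(P_k(K)\) (a notation introduced only for this sketch). Since \(q\underline{I}\in \underline{\Sigma }_h\) for \(q\in P_k(K)\) and \(\textrm{tr}(\underline{v})\in P_k(K)\) for \(\underline{v}\in \underline{\Sigma }_h\),

$$\begin{aligned} (\textrm{tr}(\underline{\Pi _\Sigma }\underline{w}),q)_K&=(\underline{\Pi _\Sigma }\underline{w},q\underline{I})_K=(\underline{w},q\underline{I})_K=(\textrm{tr}(\underline{w}),q)_K,\\ (\textrm{tr}(\underline{w})\underline{I},\underline{v})_K&=(\textrm{tr}(\underline{w}),\textrm{tr}(\underline{v}))_K=((\Pi _k\textrm{tr}(\underline{w}))\underline{I},\underline{v})_K, \end{aligned}$$

that is, \(\textrm{tr}(\underline{\Pi _\Sigma }\underline{w})=\Pi _k\textrm{tr}(\underline{w})\) and \(\underline{\Pi _\Sigma }(\textrm{tr}(\underline{w})\underline{I})=(\Pi _k\textrm{tr}(\underline{w}))\underline{I}\). Consequently,

$$\begin{aligned} \mathcal {A}(\underline{\Pi _\Sigma }\underline{w})=\underline{\Pi _\Sigma }\underline{w}-\frac{1}{d}(\Pi _k\textrm{tr}(\underline{w}))\underline{I}=\underline{\Pi _\Sigma }\underline{w}-\frac{1}{d}\underline{\Pi _\Sigma }(\textrm{tr}(\underline{w})\underline{I})=\underline{\Pi _\Sigma }(\mathcal {A}\underline{w}). \end{aligned}$$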

In addition, the following standard convergence error estimates hold for any \(\underline{w}\in \underline{H}^{k+1}(K),K\in \mathcal {T}_h\)

$$\begin{aligned} \Vert \underline{w}-\underline{\Pi _\Sigma }\underline{w}\Vert _{L^2(K)}&\le C h_K^{k+1}|\underline{w}|_{H^{k+1}(K)}, \end{aligned}$$
(4.2)
$$\begin{aligned} \Vert \nabla (\underline{w}-\underline{\Pi _\Sigma }\underline{w})\Vert _{L^2(K)}&\le C h_K^{k}|\underline{w}|_{H^{k+1}(K)}. \end{aligned}$$
(4.3)

Let \(\varvec{\Pi _U}: \varvec{H}(\textrm{div};\Omega )\cap \varvec{L}^p(\Omega )\rightarrow \varvec{U}_h, p>2\) represent the Brezzi–Douglas–Marini (BDM) projection defined by (cf. [5])

$$\begin{aligned} \langle (\varvec{v}-\varvec{\Pi _U}\varvec{v})\cdot \varvec{n}, p_{k+1}\rangle _F&=0{} & {} \quad \forall p_{k+1}\in P_{k+1}(F), F\in \mathcal {F}_h,\\ (\varvec{v}-\varvec{\Pi _U} \varvec{v}, \nabla p_{k})_K&=0{} & {} \quad \forall p_{k}\in P_{k}(K), K\in \mathcal {T}_h,\\ {(\varvec{v}-\varvec{\Pi _U} \varvec{v}, \varvec{b})_K}&=0{} & {} \quad \forall \varvec{b}\in \varvec{B}_{k+1}(K), K\in \mathcal {T}_h, \end{aligned}$$

where \(\varvec{B}_{k+1}(K)\) is the set of polynomials in \(\varvec{P}_{k+1}(K)\) that are divergence-free and whose normal component is zero on \(\partial K\).

In addition, the following error estimates hold for \(\varvec{v}\in \varvec{H}^{k+2}(K), K\in \mathcal {T}_h\) (cf. [5])

$$\begin{aligned} \Vert \varvec{v}-\varvec{\Pi _{U}}\varvec{v}\Vert _{L^2(K)}&\le C h_K^{k+2}| \varvec{v}|_{H^{k+2}(K)}, \end{aligned}$$
(4.4)
$$\begin{aligned} \Vert \nabla (\varvec{v}-\varvec{\Pi _U}\varvec{v})\Vert _{L^2(K)}&\le C h_K^{k+1}| \varvec{v}|_{H^{k+2}(K)}. \end{aligned}$$
(4.5)

We also recall the following trace inequality (cf. [1])

$$\begin{aligned} \Vert q\Vert _{L^2(F)}\le C \Big (h_K^{-\frac{1}{2}}\Vert q\Vert _{L^2(K)}+h_K^{\frac{1}{2}}\Vert \nabla q\Vert _{L^2(K)}\Big )\quad \forall q\in H^1(K),K\in \mathcal {T}_h, F\subset \partial K \end{aligned}$$
(4.6)

and the following discrete trace inequality

$$\begin{aligned} \Vert \varvec{v}\Vert _{L^p(F)}\le C h_K^{-1/p}\Vert \varvec{v}\Vert _{L^p(K)} \quad 1\le p\le \infty ,\;\forall \varvec{v}\in \varvec{U}_h. \end{aligned}$$
(4.7)

For later analysis, we introduce the space of the rigid body motions

$$\begin{aligned} \varvec{\varTheta }_h:=\{\varvec{\Lambda } \in \varvec{L}^2(\Omega ), \varvec{\Lambda }_{|K}=\underline{B}_{K} \varvec{x}+\varvec{b}_K, \underline{B}_K\in \mathbb {B}, \varvec{b}_K\in \mathbb {R}^d, K\in \mathcal {T}_h\}, \end{aligned}$$

where \(\mathbb {B}\) represents the set of all anti-symmetric matrices in \(\mathbb {R}^{d\times d}\). More precisely, \(\mathbb {B}\) in \(\mathbb {R}^2\) and \(\mathbb {R}^3\) can be respectively represented as

$$\begin{aligned} \begin{pmatrix} 0 &{} -s\\ s &{} 0 \end{pmatrix},\quad \begin{pmatrix} 0 &{} s_3&{} -s_2\\ -s_3 &{} 0 &{} s_1\\ s_2 &{} -s_1 &{} 0 \end{pmatrix} \end{aligned}$$

with constants \(s,s_i\in \mathbb {R},i=1,2,3\).
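
As a quick check of why these functions are the natural kernel in the lemma that follows, note that for \(\underline{B}\in \mathbb {B}\) and \(\varvec{b}\in \mathbb {R}^d\),

$$\begin{aligned} \nabla (\underline{B}\varvec{x}+\varvec{b})=\underline{B},\qquad \varepsilon (\underline{B}\varvec{x}+\varvec{b})=\frac{\underline{B}+\underline{B}^T}{2}=\underline{0}. \end{aligned}$$

Hence adding an element of \(\varvec{\varTheta }_h\) to \(\varvec{v}\) leaves \(\varepsilon (\varvec{v})\) unchanged on each element while it can change \(\nabla \varvec{v}\). With the transposed convention for the gradient the first identity reads \(\underline{B}^T\), and the symmetric part still vanishes.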

Following [31], we have the following lemma.

Lemma 4.1

Let \(K\in \mathcal {T}_h\) with meshsize \(h_K\) and \(\varvec{\varTheta }(K):=(\varvec{\varTheta }_h)_{|K}\). Then for any function \(\varvec{v}\in \varvec{U}_K\), where \(\varvec{U}_K:=(\varvec{U}_h)_{|K}\), we have

$$\begin{aligned} \inf _{\varvec{\Lambda }\in \varvec{\varTheta }(K)}\Vert \nabla (\varvec{v}+\varvec{\Lambda })\Vert _{L^2(K)}\le C \Vert \varepsilon (\varvec{v})\Vert _{L^2(K)}. \end{aligned}$$

Now we can prove Theorem 2.2, which demonstrates the stability and unique solvability of the discrete formulation (2.10)–(2.12).

Proof

We first show the stability, then the uniqueness follows by setting \(\varvec{f}_S=\varvec{0}\). As (2.10)–(2.12) is a square linear system, existence follows from uniqueness.

Taking \(\underline{w}=\underline{\sigma }_h\), \(\varvec{v}=\varvec{u}_h\) and \(\varvec{\widehat{v}}=\varvec{\widehat{u}}_h\) in (2.10)–(2.12) and summing up the resulting equations, it follows from (2.9), (2.7) and (2.14) that

$$\begin{aligned}&(2\mu )^{-1}\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\Omega )}^2+\left\| \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\right\| _{L^2(\partial \mathcal {T}_h)}^2\nonumber \\&\quad =(\varvec{f}_S,\varvec{u}_h)=(\mathbb {P}(\varvec{f}_S),\varvec{u}_h)\le C {\Vert \mathbb {P}(\varvec{f}_S)\Vert _{L^2(\mathcal {T}_h)}\Vert (\varvec{u}_h,\widehat{\varvec{u}}_h)\Vert _{1,h}}. \end{aligned}$$
(4.8)

In the following, we prove (2.15). We can infer from (2.10) and integration by parts that

$$\begin{aligned}&((2\mu )^{-1}\mathcal {A}\underline{\sigma }_h,\underline{w})_{\mathcal {T}_h}-(\varepsilon (\varvec{u}_h), \underline{w})_{\mathcal {T}_h}+\langle \varvec{u}_h^t,(\underline{w}\varvec{n})^t\rangle _{\partial \mathcal {T}_h}\\ {}&\quad -\langle \varvec{\widehat{u}}_h,(\underline{w}\varvec{n})^t\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}=0\quad \forall \underline{w}\in \underline{\Sigma }_h, \end{aligned}$$

thus,

$$\begin{aligned} (\varepsilon (\varvec{u}_h), \underline{w})_{\mathcal {T}_h}= ((2\mu )^{-1}\mathcal {A}\underline{\sigma }_h,\underline{w})_{\mathcal {T}_h}+\langle \varvec{P_M}\varvec{u}_h^t- \varvec{\widehat{u}}_h,(\underline{w}\varvec{n})^t\rangle _{\partial \mathcal {T}_h}. \end{aligned}$$

Then setting \(\underline{w}=\varepsilon _h(\varvec{u}_h)\), it follows from (4.7) that

$$\begin{aligned} \Vert \varepsilon (\varvec{u}_h)\Vert _{L^2(\mathcal {T}_h)}\le C \Big (\Vert \mu ^{-1}\mathcal {A}\underline{\sigma }_h\Vert _{L^2(\mathcal {T}_h)}+\Vert h^{-1/2}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\Big ). \end{aligned}$$
(4.9)
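
The step leading to (4.9) can be spelled out as follows. This is a sketch that reads \(\varepsilon _h\) as the elementwise symmetric gradient and assumes that the scaling argument behind (4.7) (with \(p=2\)) also applies to the piecewise polynomial \(\varepsilon (\varvec{u}_h)\). With \(\underline{w}=\varepsilon (\varvec{u}_h)\), the Cauchy–Schwarz inequality gives

$$\begin{aligned} \Vert \varepsilon (\varvec{u}_h)\Vert _{L^2(\mathcal {T}_h)}^2&\le \Vert (2\mu )^{-1}\mathcal {A}\underline{\sigma }_h\Vert _{L^2(\mathcal {T}_h)}\Vert \varepsilon (\varvec{u}_h)\Vert _{L^2(\mathcal {T}_h)}\\&\quad +\Vert h^{-1/2}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\Vert h^{1/2}\varepsilon (\varvec{u}_h)\varvec{n}\Vert _{L^2(\partial \mathcal {T}_h)}, \end{aligned}$$

and the discrete trace inequality yields \(\Vert h^{1/2}\varepsilon (\varvec{u}_h)\varvec{n}\Vert _{L^2(\partial \mathcal {T}_h)}\le C\Vert \varepsilon (\varvec{u}_h)\Vert _{L^2(\mathcal {T}_h)}\). Dividing by \(\Vert \varepsilon (\varvec{u}_h)\Vert _{L^2(\mathcal {T}_h)}\) gives (4.9).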

An appeal to (4.7) and Lemma 4.1 leads to

$$\begin{aligned} \Vert \varvec{u}_h-\varvec{P_M}\varvec{u}_h\Vert _{L^2(F)}&= \Vert \varvec{u}_h+(\underline{B}_K\varvec{x}+\varvec{b}_K)-\varvec{P_M}(\varvec{u}_h+(\underline{B}_K\varvec{x}+\varvec{b}_K))\Vert _{L^2(F)}\\&\le C h_K\Vert \nabla (\varvec{u}_h+\underline{B}_K\varvec{x}+\varvec{b}_K)\Vert _{L^2(\partial K)}\\&\le C h_K^{\frac{1}{2}}\Vert \nabla (\varvec{u}_h+\underline{B}_K\varvec{x}+\varvec{b}_K)\Vert _{L^2(K)}\\&\le Ch_K^{\frac{1}{2}} \Vert \varepsilon (\varvec{u}_h)\Vert _{L^2(K)},\quad \forall F\subset \partial K, K\in \mathcal {T}_h, \end{aligned}$$

where we use the fact that for \(k\ge 1\), the projection applied to the rigid body motion is the identity.

As a result, we have

$$\begin{aligned} \begin{aligned}&\Vert h^{-1/2}(\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\\&\quad \le \Vert h^{-1/2}(\varvec{P_M}\varvec{u}_h-\varvec{u}_h)\Vert _{L^2(\partial \mathcal {T}_h)}+\Vert h^{-1/2}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\\&\quad \le C \Big (\Vert \varepsilon (\varvec{u}_h)\Vert _{L^2(\mathcal {T}_h)}+\Vert h^{-1/2}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\Big ). \end{aligned} \end{aligned}$$
(4.10)

Let \(\varvec{u}_h^*\) represent the \(H^1\)-conforming counterpart of \({\varvec{u}_h}\) that can be defined using the averaging operator introduced in [24] and \(\varvec{u}_h^*=\varvec{0}\) on \(\partial \Omega _D\); indeed, \(\varvec{u}_h^*\) at the interior Lagrangian nodes can be defined by taking the average of \(\varvec{u}_h\) at the nodes [24]. Then it holds

$$\begin{aligned} \Vert \nabla (\varvec{u}_h-\varvec{u}_h^*)\Vert _{L^2(\mathcal {T}_h)}\le C \Vert h^{-1/2}\llbracket \varvec{u}_h\rrbracket \Vert _{L^2(\mathcal {F}_h\backslash \partial \Omega _N)}. \end{aligned}$$

The triangle inequality and the Korn inequality yield

$$\begin{aligned} \begin{aligned}&\Vert (\varvec{u}_h,\varvec{\widehat{u}}_h)\Vert _{1,h}\\&\quad \le C\Big ( \Vert \nabla (\varvec{u}_h-\varvec{u}_h^*)\Vert _{L^2(\mathcal {T}_h)}+\Vert \nabla \varvec{u}_h^*\Vert _{L^2(\mathcal {T}_h)}+\Vert h^{-1/2}(\varvec{u}_h^t-\widehat{\varvec{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\Big )\\&\quad \le C\Big (\Vert h^{-1/2}\llbracket \varvec{u}_h\rrbracket \Vert _{L^2(\mathcal {F}_h\backslash \partial \Omega _N)}+\Vert \varepsilon (\varvec{u}_h^*)\Vert _{L^2(\mathcal {T}_h)}+\Vert h^{-1/2}(\varvec{u}_h^t-\widehat{\varvec{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\Big )\\&\quad \le C \Big (\Vert \varepsilon (\varvec{u}_h)\Vert _{L^2(\mathcal {T}_h)}+\Vert h^{-1/2}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\Big ), \end{aligned}\nonumber \\ \end{aligned}$$
(4.11)

where we use the fact that \(\varvec{u}_h\) is normal continuous, and thus the jump of \(\varvec{u}_h\) only involves the tangential component. (4.11) coupled with (4.9) implies (2.15).

Then it follows from (2.15) and (4.8) that

$$\begin{aligned} \mu ^{-\frac{1}{2}}\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\Omega )}+\Vert \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}&\le C\mu ^{-\frac{1}{2}} {\Vert \mathbb {P}(\varvec{f}_S)\Vert _{L^2(\mathcal {T}_h)}},\\ \mu \Vert (\varvec{u}_h,\widehat{\varvec{u}}_h)\Vert _{1,h}&\le C {\Vert \mathbb {P}(\varvec{f}_S)\Vert _{L^2(\mathcal {T}_h)}}.\nonumber \end{aligned}$$
(4.12)
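
For the reader's convenience, the absorption argument behind (4.12) can be sketched as follows, where the shorthand \(E\) is introduced only for this purpose:

$$\begin{aligned} E:=(2\mu )^{-\frac{1}{2}}\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\Omega )}+\Vert \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}. \end{aligned}$$

Estimate (2.15) reads \(\Vert (\varvec{u}_h,\widehat{\varvec{u}}_h)\Vert _{1,h}\le C\mu ^{-\frac{1}{2}}E\), and the left-hand side of (4.8) is bounded from below by \(\frac{1}{2}E^2\). Therefore

$$\begin{aligned} E^2\le C\Vert \mathbb {P}(\varvec{f}_S)\Vert _{L^2(\mathcal {T}_h)}\Vert (\varvec{u}_h,\widehat{\varvec{u}}_h)\Vert _{1,h}\le C\mu ^{-\frac{1}{2}}\Vert \mathbb {P}(\varvec{f}_S)\Vert _{L^2(\mathcal {T}_h)}E, \end{aligned}$$

so \(E\le C\mu ^{-\frac{1}{2}}\Vert \mathbb {P}(\varvec{f}_S)\Vert _{L^2(\mathcal {T}_h)}\), and inserting this into (2.15) once more gives \(\mu \Vert (\varvec{u}_h,\widehat{\varvec{u}}_h)\Vert _{1,h}\le C\Vert \mathbb {P}(\varvec{f}_S)\Vert _{L^2(\mathcal {T}_h)}\).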

Since \(\textrm{tr}(\underline{\sigma }_h)\in L^2(\Omega )\), there exists a function \(\varvec{\theta }\in \varvec{H}^1(\Omega )\) with \(\varvec{\theta }\cdot \varvec{n}=0\) on \(\partial \Omega _D\) (cf. [34]) such that

$$\begin{aligned} \begin{aligned} \nabla \cdot \varvec{\theta } = \textrm{tr}(\underline{\sigma }_h),\quad \Vert \varvec{\theta }\Vert _{H^1(\Omega )} \le C \Vert \textrm{tr}(\underline{\sigma }_h)\Vert _{L^2(\Omega )}. \end{aligned} \end{aligned}$$
(4.13)

Then we have

$$\begin{aligned} \Vert \textrm{tr}(\underline{\sigma }_h) \Vert _{L^2(\mathcal {T}_h)}^2&=(\textrm{tr}(\underline{\sigma }_h), \nabla \cdot \varvec{\Pi _U}\varvec{\theta }) =(\textrm{tr}(\underline{\sigma }_h)\underline{I}, \varepsilon (\varvec{\Pi _U}\varvec{\theta }))_{\mathcal {T}_h}\nonumber \\&=d\Big ((\underline{\sigma }_h, \varepsilon (\varvec{\Pi _U}\varvec{\theta }))_{\mathcal {T}_h}-(\mathcal {A}\underline{\sigma }_h,\varepsilon (\varvec{\Pi _U}\varvec{\theta }))_{\mathcal {T}_h}\Big ). \end{aligned}$$
(4.14)
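
The identities in (4.14) rest on three elementary facts, sketched here under stated assumptions. First, assuming the standard commuting property \(\nabla \cdot \varvec{\Pi _U}\varvec{\theta }=\Pi _k(\nabla \cdot \varvec{\theta })\) of the BDM projection, with \(\Pi _k\) the elementwise \(L^2\)-projection onto \(P_k(K)\) (notation introduced for this sketch), and using \(\textrm{tr}(\underline{\sigma }_h)_{|K}\in P_k(K)\),

$$\begin{aligned} \Vert \textrm{tr}(\underline{\sigma }_h)\Vert _{L^2(\mathcal {T}_h)}^2=(\textrm{tr}(\underline{\sigma }_h),\nabla \cdot \varvec{\theta })=(\textrm{tr}(\underline{\sigma }_h),\Pi _k(\nabla \cdot \varvec{\theta }))=(\textrm{tr}(\underline{\sigma }_h),\nabla \cdot \varvec{\Pi _U}\varvec{\theta }). \end{aligned}$$

Second, \(\underline{I}:\varepsilon (\varvec{v})=\textrm{tr}(\varepsilon (\varvec{v}))=\nabla \cdot \varvec{v}\), which gives the second equality. Third, the last equality amounts to

$$\begin{aligned} \textrm{tr}(\underline{\sigma }_h)\underline{I}=d\,(\underline{\sigma }_h-\mathcal {A}\underline{\sigma }_h), \end{aligned}$$

that is, to \(\mathcal {A}\) being the deviatoric part \(\mathcal {A}\underline{\sigma }_h=\underline{\sigma }_h-\frac{1}{d}\textrm{tr}(\underline{\sigma }_h)\underline{I}\).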

It follows from (2.11)

$$\begin{aligned} (\underline{\sigma }_h, \varepsilon (\varvec{\Pi _U}\varvec{\theta }))_{\mathcal {T}_h}&=\langle (\mathcal {A}\underline{\sigma }_h\varvec{n})^t, (\varvec{\Pi _U}\varvec{\theta })^t\rangle _{\partial \mathcal {T}_h}\\&\quad -\langle \tau (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h),(\varvec{\Pi _U}\varvec{\theta })^t\rangle _{\partial \mathcal {T}_h}+(\varvec{f}_S,\varvec{\Pi _U}\varvec{\theta })_{\mathcal {T}_h}, \end{aligned}$$

which combined with (2.12) implies

$$\begin{aligned} (\underline{\sigma }_h, \varepsilon (\varvec{\Pi _U}\varvec{\theta }))_{\mathcal {T}_h}&=\langle (\mathcal {A}\underline{\sigma }_h\varvec{n})^t, (\varvec{\Pi _U}\varvec{\theta })^t-\varvec{P_M}\varvec{\theta }^t\rangle _{\partial \mathcal {T}_h}-\langle \tau (\varvec{P_M} \varvec{u}_h^t\\&\quad -\varvec{\widehat{u}}_h),(\varvec{\Pi _U}\varvec{\theta })^t{-\varvec{P_M}\varvec{\theta }^t}\rangle _{\partial \mathcal {T}_h}+(\varvec{f}_S,\varvec{\Pi _U}\varvec{\theta })_{\mathcal {T}_h}\\ {}&\le C \Big (\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\mathcal {T}_h)}+\mu ^{\frac{1}{2}}\Vert \tau ^{\frac{1}{2}} (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\\&\quad +\Vert \varvec{f}_S\Vert _{L^2(\mathcal {T}_h)}\Big )\Vert \varvec{\theta }\Vert _{H^1(\Omega )}. \end{aligned}$$

As such, it holds

$$\begin{aligned} \Vert \textrm{tr}(\underline{\sigma }_h)\Vert _{L^2(\mathcal {T}_h)}\le C \Big (\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\mathcal {T}_h)}+\mu ^{\frac{1}{2}}\left\| \tau ^{\frac{1}{2}} (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h)\right\| _{L^2(\partial \mathcal {T}_h)}+\Vert \varvec{f}_S\Vert _{L^2(\mathcal {T}_h)}\Big ). \end{aligned}$$

This and (4.12) yield

$$\begin{aligned} \Vert \underline{\sigma }_h\Vert _{L^2(\mathcal {T}_h)}\le C \Vert \varvec{f}_S\Vert _{L^2(\mathcal {T}_h)}. \end{aligned}$$

\(\square \)
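
The last step of the proof above combines the trace bound with (4.12) through an orthogonal splitting. As a sketch, assuming \(\mathcal {A}\underline{\sigma }_h=\underline{\sigma }_h-\frac{1}{d}\textrm{tr}(\underline{\sigma }_h)\underline{I}\), one has \(\mathcal {A}\underline{\sigma }_h:\underline{I}=\textrm{tr}(\mathcal {A}\underline{\sigma }_h)=0\) pointwise, and hence

$$\begin{aligned} \Vert \underline{\sigma }_h\Vert _{L^2(\mathcal {T}_h)}^2=\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\mathcal {T}_h)}^2+\frac{1}{d}\Vert \textrm{tr}(\underline{\sigma }_h)\Vert _{L^2(\mathcal {T}_h)}^2. \end{aligned}$$

From (4.12), both \(\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\mathcal {T}_h)}\) and \(\mu ^{\frac{1}{2}}\Vert \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\) are bounded by \(C\Vert \mathbb {P}(\varvec{f}_S)\Vert _{L^2(\mathcal {T}_h)}\). If, in addition, \(\Vert \mathbb {P}(\varvec{f}_S)\Vert _{L^2(\mathcal {T}_h)}\le C\Vert \varvec{f}_S\Vert _{L^2(\mathcal {T}_h)}\) (an assumption here, since \(\mathbb {P}\) is defined elsewhere in the paper), the bound on \(\Vert \underline{\sigma }_h\Vert _{L^2(\mathcal {T}_h)}\) follows.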

In the following, we will carry out the convergence error estimates for the proposed scheme.

Lemma 4.2

Let \((\underline{\sigma },\varvec{u})\) be the solution of (2.1)–(2.2) and let \((\underline{\sigma }_h,\varvec{u}_h,\varvec{\widehat{u}}_h)\in \underline{\Sigma }_h\times \varvec{U}_h\times \varvec{\widehat{U}}_h\) be the discrete solution of (2.10)–(2.12). Assume that \((\varvec{u},\underline{\sigma })\in \varvec{H}^{s}(\Omega )\times \underline{H}^{s-1}(\Omega )\) with \(s>\frac{3}{2}\). Then the following error equations hold

$$\begin{aligned}&((2\mu )^{-1}\mathcal {A}(\underline{\sigma }-\underline{\sigma }_h),\underline{w})_{ \mathcal {T}_h}-\langle (\varvec{\Pi _U} \varvec{u}- \varvec{u}_h)^n,(\underline{w}\varvec{n})^n\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}\nonumber \\&\quad +(\varvec{u}-\varvec{u}_h,\textrm{div}\underline{w})_{ \mathcal {T}_h}-\langle \varvec{e_{\widehat{u}}},(\underline{w}\varvec{n})^t\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}=0, \end{aligned}$$
(4.15)
$$\begin{aligned}&(\underline{\Pi _\Sigma }\underline{\sigma }-\underline{\sigma }_h, \varepsilon (\varvec{v}))_{ \mathcal {T}_h}-\langle ((\underline{\sigma }-\underline{\sigma }_h)\varvec{n})^t,\varvec{v}^t\rangle _{\partial \mathcal {T}_h}\nonumber \\&\quad -\langle \tau (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h),\varvec{v}^t\rangle _{\partial \mathcal {T}_h}=0, \end{aligned}$$
(4.16)
$$\begin{aligned}&\langle ((\underline{\sigma }-\underline{\sigma }_h)\varvec{n})^t,\varvec{\widehat{v}}\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D} +\langle \tau (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h), \varvec{\widehat{v}}\rangle _{\partial \mathcal {T}_h}=0 \end{aligned}$$
(4.17)

for \((\underline{w},\varvec{v},\varvec{\widehat{v}})\in \underline{\Sigma }_h\times \varvec{U}_h\times \varvec{\widehat{U}}_h\).

Proof

First, we can infer from integration by parts and \(\varvec{u}=\varvec{0}\) on \(\partial \Omega _D\) that

$$\begin{aligned} ((2\mu )^{-1}\mathcal {A}\underline{\sigma },\underline{w})_{\mathcal {T}_h}&=( \varepsilon (\varvec{u}),\underline{w})_{\mathcal {T}_h}=\langle \varvec{u}^n,(\underline{w}\varvec{n})^n\rangle _{\partial \mathcal {T}_h}+\langle \varvec{u}^t,(\underline{w}\varvec{n})^t\rangle _{\partial \mathcal {T}_h}-(\varvec{u},\textrm{div}\underline{w})_{\mathcal {T}_h}\\&=\langle \varvec{\Pi _U}\varvec{u}^n,(\underline{w}\varvec{n})^n\rangle _{\partial \mathcal {T}_h}+\langle \varvec{P_M}\varvec{u}^t,(\underline{w}\varvec{n})^t\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}-(\varvec{u},\textrm{div}\underline{w})_{\mathcal {T}_h}. \end{aligned}$$

Subtracting this equation from (2.10) implies (4.15).

Then using integration by parts and \(\llbracket \varvec{v}^n\rrbracket _{|F}=\varvec{0}\) for \(F\in \mathcal {F}_h^0\) and \(\varvec{v}\cdot \varvec{n}=0\) on \(\partial \Omega _D\), we can obtain

$$\begin{aligned} (\textrm{div}\underline{\sigma },\varvec{v})_{\mathcal {T}_h}&=\langle \underline{\sigma }\varvec{n},\varvec{v}\rangle _{\partial \mathcal {T}_h}-(\underline{\sigma },\varepsilon (\varvec{v}))_{\mathcal {T}_h}\\&=\langle (\underline{\sigma }\varvec{n})^t,\varvec{v}^t\rangle _{\partial \mathcal {T}_h}-(\underline{\sigma },\varepsilon (\varvec{v}))_{\mathcal {T}_h}=-(\varvec{f}_S,\varvec{v})_{\mathcal {T}_h}, \end{aligned}$$

where we also use \(\underline{\sigma }\varvec{n}=\varvec{0}\) on \(\partial \Omega _N\).

Then the \(L^2\)-projection property of \(\underline{\Pi _\Sigma }\) leads to

$$\begin{aligned} \langle (\underline{\sigma }\varvec{n})^t,\varvec{v}^t\rangle _{\partial \mathcal {T}_h}-(\underline{\Pi _\Sigma }\underline{\sigma },\varepsilon (\varvec{v}))_{\mathcal {T}_h}=-(\varvec{f}_S,\varvec{v})_{\mathcal {T}_h}, \end{aligned}$$

which combined with (2.11) leads to (4.16).

Finally, the continuity of \(\underline{\sigma }\) and the boundary condition \(\underline{\sigma }\varvec{n}=\varvec{0}\) on \(\partial \Omega _N\) imply \( \langle (\underline{\sigma }\varvec{n})^t,\varvec{\widehat{v}}\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}=0\). Subtracting this from (2.12) yields (4.17), which completes the proof. \(\square \)

Lemma 4.3

Let \((\underline{\sigma },\varvec{u})\) be the solution of (2.1)–(2.2) and let \((\underline{\sigma }_h,\varvec{u}_h,\varvec{\widehat{u}}_h)\in \underline{\Sigma }_h\times \varvec{U}_h\times \varvec{\widehat{U}}_h\) be the discrete solution of (2.10)–(2.12). Assume that \((\varvec{u},\underline{\sigma })\in \varvec{H}^{s}(\Omega )\times \underline{H}^{s-1}(\Omega )\) with \(s>\frac{3}{2}\). Then the following estimate holds

$$\begin{aligned}&\left\| (2\mu )^{-\frac{1}{2}}\mathcal {A}\underline{e_\sigma }\right\| _{L^2(\mathcal {T}_h)}+\left\| \tau ^{\frac{1}{2}}( \varvec{P_M} \varvec{e_u}^t-\varvec{e_{\widehat{u}}})\right\| _{L^2(\partial \mathcal {T}_h)}\\&\quad \le C \Bigg (\left\| (2\mu )^{-\frac{1}{2}}(\mathcal {A}\underline{\sigma }-\underline{\Pi _\Sigma }(\mathcal {A} \underline{\sigma }))\right\| _{L^2(\mathcal {T}_h)}+\left\| \tau ^{-\frac{1}{2}}((\mathcal {A}\underline{\sigma }-\underline{\Pi _\Sigma }\mathcal {A}\underline{\sigma })\varvec{n})^t\right\| _{L^2(\partial \mathcal {T}_h)}\\&\qquad +\mu ^{\frac{1}{2}}\Vert \varepsilon (\varvec{e_u})\Vert _{L^2(\mathcal {T}_h)}+\left\| \tau ^{\frac{1}{2}}((\varvec{\Pi _U} \varvec{u})^t-\varvec{u}^t)\right\| _{L^2(\partial \mathcal {T}_h)}\Bigg ). \end{aligned}$$

Proof

We have from (4.1)

$$\begin{aligned} ((2\mu )^{-1}\mathcal {A} (\underline{\sigma }-\underline{\Pi _\Sigma }\underline{\sigma }), \underline{w})_{\mathcal {T}_h}=((2\mu )^{-1}(\mathcal {A}\underline{\sigma }-\underline{\Pi _\Sigma } (\mathcal {A}\underline{\sigma })), \underline{w})_{\mathcal {T}_h}\quad \forall \underline{w}\in \underline{\Sigma }_h. \end{aligned}$$

Taking \(\underline{w}=\underline{e_\sigma }\), \(\varvec{v}=\varvec{e_u}\) and \(\varvec{\widehat{v}}=\varvec{e_{\widehat{u}}}\) in (4.15)–(4.17), then summing up the resulting equations yields

$$\begin{aligned} \begin{aligned}&((2\mu )^{-1}\mathcal {A} (\underline{\sigma }-\underline{\sigma }_h), \underline{e_\sigma })_{\mathcal {T}_h}+(\varvec{u}-\varvec{\Pi _U}\varvec{u}, \textrm{div}(\underline{e_\sigma }))_{ \mathcal {T}_h}\\&\quad -\langle ((\underline{\sigma }-\underline{\Pi _\Sigma }\underline{\sigma })\varvec{n})^t, \varvec{e_u}^t-\varvec{e_{\widehat{u}}}\rangle _{\partial \mathcal {T}_h}\\&\quad -\langle \tau (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h),\varvec{e_u}^t-\varvec{e_{\widehat{u}}}\rangle _{\partial \mathcal {T}_h}=0. \end{aligned} \end{aligned}$$
(4.18)

Proceeding analogously to (4.10), we have

$$\begin{aligned} \Vert \tau ^{\frac{1}{2}}(\varvec{e_u}^t-\varvec{e_{\widehat{u}}})\Vert _{L^2(\partial \mathcal {T}_h)}\le C \Big (\mu ^{\frac{1}{2}}\Vert \varepsilon (\varvec{e_u})\Vert _{L^2(\mathcal {T}_h)}+\Vert \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}})\Vert _{L^2(\partial \mathcal {T}_h)}\Big ). \end{aligned}$$
(4.19)

Thus, it follows from (4.1), the fact that \((\mathcal {A} \underline{\sigma }\varvec{n})^t_{|F}= (\underline{\sigma }\varvec{n})^t_{|F}\) for \(F\in \mathcal {F}_h\), the Cauchy–Schwarz inequality and (4.19) that

$$\begin{aligned}&\langle ((\underline{\sigma }-\underline{\Pi _\Sigma }\underline{\sigma })\varvec{n})^t, \varvec{e_u}^t-\varvec{e_{\widehat{u}}}\rangle _{\partial \mathcal {T}_h}\\&\quad = \langle ((\mathcal {A}\underline{\sigma }-\underline{\Pi _\Sigma }\mathcal {A}\underline{\sigma })\varvec{n})^t, \varvec{e_u}^t-\varvec{e_{\widehat{u}}}\rangle _{\partial \mathcal {T}_h}\\&\quad \le C\left\| \tau ^{-\frac{1}{2}}((\mathcal {A}\underline{\sigma }-\underline{\Pi _\Sigma }\mathcal {A}\underline{\sigma })\varvec{n})^t\right\| _{L^2(\partial \mathcal {T}_h)}\left\| \tau ^{\frac{1}{2}}(\varvec{e_u}^t-\varvec{e_{\widehat{u}}})\right\| _{L^2(\partial \mathcal {T}_h)}\\&\quad \le C \left\| \tau ^{-\frac{1}{2}}((\mathcal {A}\underline{\sigma }-\underline{\Pi _\Sigma }\mathcal {A}\underline{\sigma })\varvec{n})^t\right\| _{L^2(\partial \mathcal {T}_h)}\Big (\mu ^{\frac{1}{2}}\Vert \varepsilon (\varvec{e_u})\Vert _{L^2(\mathcal {T}_h)}\\&\qquad +\Vert \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}})\Vert _{L^2(\partial \mathcal {T}_h)}\Big ). \end{aligned}$$

The last term on the left-hand side of (4.18) can be rewritten as

$$\begin{aligned} -\tau (\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)&=\tau (\varvec{P_M} ((\varvec{\Pi _U} \varvec{u})^t-\varvec{u}_h^t)-(\varvec{P_M}\varvec{u}^t-\varvec{\widehat{u}}_h))-\tau (\varvec{P_M}((\varvec{\Pi _U} \varvec{u})^t-\varvec{u}^t))\\&=\tau (\varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}})-\tau (\varvec{P_M}((\varvec{\Pi _U} \varvec{u})^t-\varvec{u}^t)). \end{aligned}$$

Then the definition of \(\varvec{P_M}\) yields

$$\begin{aligned}&-\langle \tau (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h),\varvec{e_u}^t-\varvec{e_{\widehat{u}}}\rangle _{\partial \mathcal {T}_h}\\&\quad =\langle \tau (\varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}})-\tau (\varvec{P_M}((\varvec{\Pi _U} \varvec{u})^t-\varvec{u}^t)),\varvec{e_u}^t-\varvec{e_{\widehat{u}}} \rangle _{\partial \mathcal {T}_h}\\&\quad =\langle \tau (\varvec{P_M} \varvec{e_u}^t-\varvec{e_{\widehat{u}}}),\varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}}\rangle _{\partial \mathcal {T}_h}\\&\qquad \;- \langle \tau (\varvec{P_M}((\varvec{\Pi _U} \varvec{u})^t-\varvec{u}^t)),\varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}} \rangle _{\partial \mathcal {T}_h}\\&\quad =\left\| \tau ^{\frac{1}{2}} (\varvec{P_M} \varvec{e_u}^t-\varvec{e_{\widehat{u}}})\right\| _{L^2(\partial \mathcal {T}_h)}^2- \langle \tau (\varvec{P_M}((\varvec{\Pi _U} \varvec{u})^t-\varvec{u}^t)),\varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}} \rangle _{\partial \mathcal {T}_h}. \end{aligned}$$

The Cauchy–Schwarz inequality gives

$$\begin{aligned}&- \langle \tau (\varvec{P_M}((\varvec{\Pi _U} \varvec{u})^t-\varvec{u}^t)),\varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}} \rangle _{\partial \mathcal {T}_h}\\&\quad \le \left\| \tau ^{\frac{1}{2}} (\varvec{P_M} \varvec{e_u}^t-\varvec{e_{\widehat{u}}})\right\| _{L^2(\partial \mathcal {T}_h)} \left\| \tau ^{\frac{1}{2}}((\varvec{\Pi _U} \varvec{u})^t-\varvec{u}^t)\right\| _{L^2(\partial \mathcal {T}_h)}. \end{aligned}$$

The proof is completed by combining the above estimates and Young’s inequality. \(\square \)

Lemma 4.4

Under the assumption of Lemma 4.3, the following error estimates hold

$$\begin{aligned} \Vert \varepsilon (\varvec{e_u})\Vert _{L^2(\mathcal {T}_h)}&\le C \Big (\Vert \mu ^{-1}\mathcal {A}\underline{e_\sigma }\Vert _{L^2(\mathcal {T}_h)}+\Vert (2\mu )^{-1}(\underline{\Pi _\Sigma } (\mathcal {A}\underline{\sigma })-\mathcal {A}\underline{\sigma })\Vert _{L^2(\mathcal {T}_h)}\nonumber \\&\quad +\Vert \varepsilon (\varvec{u}-\varvec{\Pi _U}\varvec{u})\Vert _{L^2(\mathcal {T}_h)}+\mu ^{-\frac{1}{2}}\left\| \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}})\right\| _{L^2(\partial \mathcal {T}_h)} \Big )\nonumber \\ \end{aligned}$$
(4.20)

and

$$\begin{aligned} \Vert (\varvec{e_u},\varvec{e_{\widehat{u}}})\Vert _{1,h}&\le C \Big (\Vert \varepsilon (\varvec{e_u})\Vert _{L^2(\mathcal {T}_h)}+\mu ^{-\frac{1}{2}}\left\| \tau ^{{\frac{1}{2}}}(\varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}})\right\| _{L^2(\partial \mathcal {T}_h)}\Big ). \end{aligned}$$

Proof

Taking \(\underline{w}=\varepsilon _h(\varvec{e_u})\) in (4.15) and applying integration by parts yields

$$\begin{aligned}&((2\mu )^{-1}\mathcal {A} \underline{e_\sigma }, \varepsilon (\varvec{e_u}))_{ \mathcal {T}_h}-(\varepsilon (\varvec{u}-\varvec{u}_h), \varepsilon (\varvec{e_u}))_{ \mathcal {T}_h}\\&\quad \;+\langle (\varvec{u}-\varvec{u}_h)^t-\varvec{e_{\widehat{u}}}, (\varepsilon (\varvec{e_u})\varvec{n})^t\rangle _{\partial \mathcal {T}_h}=((2\mu )^{-1}\mathcal {A}(\underline{\Pi _\Sigma } \underline{\sigma }-\underline{\sigma }), \varepsilon (\varvec{e_u}))_{\mathcal {T}_h}. \end{aligned}$$

Therefore, it holds

$$\begin{aligned} \begin{aligned} \Vert \varepsilon (\varvec{e_u})\Vert _{L^2(\mathcal {T}_h)}^2&=((2\mu )^{-1}\mathcal {A} \underline{e_\sigma }, \varepsilon (\varvec{e_u}))_{ \mathcal {T}_h}-(\varepsilon (\varvec{u}-\varvec{\Pi _U}\varvec{u}), \varepsilon (\varvec{e_u}))_{ \mathcal {T}_h}\\&\quad +\langle (\varvec{u}-\varvec{u}_h)^t-\varvec{e_{\widehat{u}}}, (\varepsilon (\varvec{e_u})\varvec{n})^t\rangle _{\partial \mathcal {T}_h}\\&\quad \;-((2\mu )^{-1}(\underline{\Pi _\Sigma } (\mathcal {A}\underline{\sigma })-\mathcal {A}\underline{\sigma }), \varepsilon (\varvec{e_u}))_{\mathcal {T}_h}. \end{aligned} \end{aligned}$$
(4.21)
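
To spell out the step from the previous identity to (4.21), a short sketch follows. It uses \(\varvec{e_u}=\varvec{\Pi _U}\varvec{u}-\varvec{u}_h\), as in the rewriting of the stabilization term in the proof of Lemma 4.3. The error splits as \(\varvec{u}-\varvec{u}_h=(\varvec{u}-\varvec{\Pi _U}\varvec{u})+\varvec{e_u}\), so that

$$\begin{aligned} (\varepsilon (\varvec{u}-\varvec{u}_h), \varepsilon (\varvec{e_u}))_{\mathcal {T}_h}=(\varepsilon (\varvec{u}-\varvec{\Pi _U}\varvec{u}), \varepsilon (\varvec{e_u}))_{\mathcal {T}_h}+\Vert \varepsilon (\varvec{e_u})\Vert _{L^2(\mathcal {T}_h)}^2. \end{aligned}$$

Moving the squared norm to one side gives (4.21), once \(\mathcal {A}(\underline{\Pi _\Sigma }\underline{\sigma }-\underline{\sigma })\) is replaced by \(\underline{\Pi _\Sigma }(\mathcal {A}\underline{\sigma })-\mathcal {A}\underline{\sigma }\). This replacement is the identity derived from (4.1) at the start of the proof of Lemma 4.3. It applies here provided \(\varepsilon (\varvec{e_u})\in \underline{\Sigma }_h\), which the choice of test function presupposes.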

The Cauchy–Schwarz inequality and the error estimates (4.4)–(4.5) imply

$$\begin{aligned} ((2\mu )^{-1}\mathcal {A} \underline{e_\sigma }, \varepsilon (\varvec{e_u}))_{\mathcal {T}_h}&\le \frac{1}{2} \Vert \mu ^{-1}\mathcal {A} \underline{e_\sigma }\Vert _{L^2(\mathcal {T}_h)} \Vert \varepsilon (\varvec{e_u})\Vert _{L^2(\mathcal {T}_h)},\\ (\varepsilon (\varvec{u}-\varvec{\Pi _U}\varvec{u}), \varepsilon (\varvec{e_u}))_{ \mathcal {T}_h}&\le C \Vert \varepsilon (\varvec{u}-\varvec{\Pi _U}\varvec{u})\Vert _{L^2(\mathcal {T}_h)}\Vert \varepsilon (\varvec{e_u})\Vert _{L^2(\mathcal {T}_h)},\\ ((2\mu )^{-1}(\underline{\Pi _\Sigma } (\mathcal {A}\underline{\sigma })-\mathcal {A}\underline{\sigma }), \varepsilon (\varvec{e_u}))_{\mathcal {T}_h}&\le C \Vert (2\mu )^{-1}(\underline{\Pi _\Sigma } (\mathcal {A}\underline{\sigma })-\mathcal {A}\underline{\sigma })\Vert _{L^2(\mathcal {T}_h)}\Vert \varepsilon (\varvec{e_u})\Vert _{L^2(\mathcal {T}_h)}. \end{aligned}$$

For the third term on the right-hand side of (4.21), we have

$$\begin{aligned}&\langle ( \varvec{u}-\varvec{u}_h)^t-\varvec{e_{\widehat{u}}}, (\varepsilon (\varvec{e_u})\varvec{n})^t\rangle _{\partial \mathcal {T}_h}\\&\quad =\langle \varvec{u}^t-(\varvec{\Pi _U}\varvec{u})^t ,(\varepsilon (\varvec{e_u})\varvec{n})^t\rangle _{\partial \mathcal {T}_h}+\langle \varvec{e_u}^t-\varvec{e_{\widehat{u}}}, (\varepsilon (\varvec{e_u})\varvec{n})^t\rangle _{\partial \mathcal {T}_h}. \end{aligned}$$

The first term on the right-hand side can be bounded by the Cauchy–Schwarz inequality and (4.7)

$$\begin{aligned} \langle \varvec{u}^t-(\varvec{\Pi _U}\varvec{u})^t ,(\varepsilon (\varvec{e_u})\varvec{n})^t\rangle _{\partial \mathcal {T}_h}\le C\sum _{K\in \mathcal {T}_h}h_K^{-\frac{1}{2}}\Vert \varvec{u}-\varvec{\Pi _U}\varvec{u}\Vert _{L^2(\partial K)}\Vert \varepsilon (\varvec{e_u})\Vert _{L^2(K)}. \end{aligned}$$

The second term on the right-hand side can be estimated by using the Cauchy–Schwarz inequality and the trace inequality (4.7)

$$\begin{aligned} \langle \varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}}, (\varepsilon (\varvec{e_u})\varvec{n})^t\rangle _{\partial \mathcal {T}_h}&\le C\left\| \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}})\right\| _{L^2(\partial \mathcal {T}_h)} \left\| \tau ^{-\frac{1}{2}}\varepsilon (\varvec{e_u})\right\| _{L^2(\partial \mathcal {T}_h)}\\&\le C\mu ^{-\frac{1}{2}} \left\| \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}})\right\| _{L^2(\partial \mathcal {T}_h)} \Vert \varepsilon (\varvec{e_u})\Vert _{L^2( \mathcal {T}_h)}. \end{aligned}$$

The proof of (4.20) is completed by combining the above estimates and Young’s inequality.
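
As a sketch of this final absorption step, write \(X:=\Vert \varepsilon (\varvec{e_u})\Vert _{L^2(\mathcal {T}_h)}\) and let \(R\) collect the remaining factors in the bounds above. The shorthand \(X\) and \(R\) is introduced here only for illustration. Inserting the bounds into (4.21) gives an inequality of the form \(X^2\le CRX\), and Young's inequality leads to

$$\begin{aligned} X^2\le CRX\le \frac{1}{2}X^2+\frac{C^2}{2}R^2,\qquad \text{hence}\qquad X\le CR. \end{aligned}$$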

Proceeding similarly to (4.11), it holds

$$\begin{aligned} \Vert (\varvec{e_u},\varvec{e_{\widehat{u}}})\Vert _{1,h}\le C \Big (\Vert \varepsilon (\varvec{e_u})\Vert _{L^2(\mathcal {T}_h)}+\mu ^{-\frac{1}{2}}\Vert \tau ^{{\frac{1}{2}}}(\varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}})\Vert _{L^2(\partial \mathcal {T}_h)}\Big ), \end{aligned}$$

which completes the proof. \(\square \)

Combining Lemmas 4.3 and 4.4, the trace inequality (4.6) and the error estimates (4.2)–(4.5) implies Theorem 2.3.

We can observe from (2.16) that the following superconvergence property holds

$$\begin{aligned} \Vert h^{\frac{1}{2}} (\varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}})\Vert _{L^2(\partial \mathcal {T}_h)}\le C h^{k+2} |\varvec{u}|_{H^s(\Omega )}. \end{aligned}$$

This superconvergence property is crucial to achieve the optimal convergence rates for \(L^2\)-errors of stress and velocity, as we are going to see in the following.

Proof of Theorem 2.4 (\(L^2\)-error for stress)

The proof is similar to that of the stability estimate for \(\Vert \underline{\sigma }_h\Vert _{L^2(\mathcal {T}_h)}\) given in Theorem 2.2. We provide the proof here for completeness. The definition of \(\mathcal {A}\) implies \( \underline{e_\sigma }=\mathcal {A} \underline{e_\sigma }+\frac{1}{d} \textrm{tr}(\underline{e_\sigma }) \underline{I}. \) The upper bound for \(\mathcal {A}\underline{e_\sigma }\) is given in Theorem 2.3. Thus, it suffices to establish the error estimate for the second term. Since \(\textrm{tr}(\underline{e_\sigma })\in L^2(\Omega )\), there exists a function \(\varvec{\theta }\in \varvec{H}^1(\Omega )\) with \(\varvec{\theta }\cdot \varvec{n}=0\) on \(\partial \Omega _D\) (cf. [34]) such that

$$\begin{aligned} \begin{aligned} \nabla \cdot \varvec{\theta }&= \textrm{tr}(\underline{e_\sigma }),\quad \Vert \varvec{\theta }\Vert _{H^1(\Omega )} \le C \Vert \textrm{tr}(\underline{e_\sigma })\Vert _{L^2(\Omega )}. \end{aligned} \end{aligned}$$
(4.22)

Therefore, we have

$$\begin{aligned}&\Vert \textrm{tr}(\underline{e_\sigma })\Vert _{L^2(\mathcal {T}_h)}^2=(\textrm{tr}(\underline{e_\sigma }), \nabla \cdot \varvec{\theta })_{ \mathcal {T}_h}=(\textrm{tr}(\underline{e_\sigma }), \nabla \cdot (\varvec{\Pi _U}\varvec{\theta }))_{\mathcal {T}_h}\\&\quad =d(\underline{e_\sigma }, \varepsilon (\varvec{\Pi _U}\varvec{\theta }))_{ \mathcal {T}_h}-d(\mathcal {A} \underline{e_\sigma }, \varepsilon (\varvec{\Pi _U}\varvec{\theta }))_{ \mathcal {T}_h}, \end{aligned}$$

where the second term on the right-hand side can be estimated via the Cauchy–Schwarz inequality and Theorem 2.3. It remains to bound the first term on the right-hand side.
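
The last equality in the display above rests on a pointwise identity, sketched here for a generic piecewise smooth vector field \(\varvec{v}\) (a placeholder used only in this illustration). Since \(\underline{I}:\varepsilon (\varvec{v})=\nabla \cdot \varvec{v}\) and \(\frac{1}{d}\textrm{tr}(\underline{e_\sigma })\underline{I}=\underline{e_\sigma }-\mathcal {A}\underline{e_\sigma }\) by the definition of \(\mathcal {A}\), we have

$$\begin{aligned} \textrm{tr}(\underline{e_\sigma })\,\nabla \cdot \varvec{v}=\textrm{tr}(\underline{e_\sigma })\,\underline{I}:\varepsilon (\varvec{v})=d\,(\underline{e_\sigma }-\mathcal {A}\underline{e_\sigma }):\varepsilon (\varvec{v}). \end{aligned}$$

Integrating over each element with \(\varvec{v}=\varvec{\Pi _U}\varvec{\theta }\) gives the stated splitting.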

We can deduce from (4.16) and (4.17)

$$\begin{aligned} (\underline{e}_\sigma , \varepsilon (\varvec{\Pi _U}\varvec{\theta }))_{ \mathcal {T}_h}&=\langle (\underline{\sigma }\varvec{n})^t-(\underline{\sigma }_h\varvec{n})^t, (\varvec{\Pi _U}\varvec{\theta })^t\rangle _{\partial \mathcal {T}_h} +\langle \tau (\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h),(\varvec{\Pi _U}\varvec{\theta })^t \rangle _{\partial \mathcal {T}_h}\\&=\langle (\underline{\sigma }\varvec{n})^t-(\underline{\sigma }_h\varvec{n})^t, (\varvec{\Pi _U}\varvec{\theta })^t-\varvec{P_M}\varvec{\theta }^t\rangle _{\partial \mathcal {T}_h}\\ {}&\quad +\langle \tau (\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h),(\varvec{\Pi _U}\varvec{\theta })^t-\varvec{P_M}\varvec{\theta }^t \rangle _{\partial \mathcal {T}_h}. \end{aligned}$$

An application of (4.1), the Cauchy–Schwarz inequality, the trace inequality (4.6) and (4.22) implies

$$\begin{aligned}&\langle ((\underline{\sigma }-\underline{\sigma }_h)\varvec{n})^t, (\varvec{\Pi _U}\varvec{\theta })^t-\varvec{P_M}\varvec{\theta }^t\rangle _{\partial \mathcal {T}_h}\\&\quad = \langle (\mathcal {A}(\underline{\sigma }-\underline{\sigma }_h)\varvec{n})^t, (\varvec{\Pi _U}\varvec{\theta })^t-\varvec{P_M}\varvec{\theta }^t\rangle _{\partial \mathcal {T}_h}\\&\quad =\langle ((\mathcal {A}\underline{\sigma }-\underline{\Pi _\Sigma }\mathcal {A}\underline{\sigma })\varvec{n})^t, (\varvec{\Pi _U}\varvec{\theta })^t-\varvec{P_M}\varvec{\theta }^t\rangle _{\partial \mathcal {T}_h}\\&\qquad +\langle (\mathcal {A}(\underline{\Pi _\Sigma }\underline{\sigma }-\underline{\sigma }_h)\varvec{n})^t, (\varvec{\Pi _U}\varvec{\theta })^t-\varvec{P_M}\varvec{\theta }^t\rangle _{\partial \mathcal {T}_h}\\&\quad \le C\Big ( h^t|\mathcal {A}\underline{\sigma }|_{H^t(\Omega )}+\Vert \mathcal {A}(\underline{\Pi _\Sigma }\underline{\sigma }-\underline{\sigma }_h)\Vert _{L^2(\Omega )}\Big )\Vert \varvec{\theta }\Vert _{H^1(\Omega )}\\&\quad \le C \Big ( h^t|\mathcal {A}\underline{\sigma }|_{H^t(\Omega )}+\Vert \mathcal {A}(\underline{\Pi _\Sigma }\underline{\sigma }-\underline{\sigma }_h)\Vert _{L^2(\Omega )}\Big )\Vert \textrm{tr}(\underline{e_\sigma })\Vert _{L^2(\Omega )}. \end{aligned}$$

Similarly, we have

$$\begin{aligned}&\langle \tau (\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h),(\varvec{\Pi _U}\varvec{\theta })^t-\varvec{P_M}\varvec{\theta }^t\rangle _{\partial \mathcal {T}_h}\\&\quad \le C \mu ^{\frac{1}{2}}\Vert \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\Vert \varvec{\theta }\Vert _{H^1(\Omega )}. \end{aligned}$$

The assertion follows by combining the above estimates and Theorem 2.3. \(\square \)

Now we are ready to prove the \(L^2\)-error estimate for the velocity. For any \(\varvec{g}\in \varvec{L}^2(\Omega )\), we assume that the dual problem

$$\begin{aligned} (2\mu )^{-1} \mathcal {A}\underline{\psi }&= \varepsilon (\varvec{\phi })\quad \text{ in }\;\Omega , \end{aligned}$$
(4.23)
$$\begin{aligned} \textrm{div}\underline{\psi }&=\varvec{g} \quad \text{ in }\;\Omega ,\nonumber \\ \varvec{\phi }&=\varvec{0}\quad \text{ on }\;\partial \Omega _D,\nonumber \\ \underline{\psi }\varvec{n}&=\varvec{0}\quad \text{ on }\;\partial \Omega _N \end{aligned}$$
(4.24)

satisfies the elliptic regularity estimate

$$\begin{aligned} \Vert \underline{\psi }\Vert _{H^1(\Omega )}+\Vert \mu \varvec{\phi }\Vert _{H^2(\Omega )}\le C \Vert \varvec{g}\Vert _{L^2(\Omega )}. \end{aligned}$$

This estimate holds, for instance, if the domain \(\Omega \) is convex (cf. [33]).

Proof of Theorem 2.5 (\(L^2\)-error for velocity)

Setting \(\varvec{g}=\varvec{u}-\varvec{u}_h\) in (4.23)–(4.24) and multiplying (4.23) by \(\underline{e_\sigma }\) and (4.24) by \(\varvec{u}-\varvec{u}_h\), we obtain

$$\begin{aligned} \Vert \varvec{u}-\varvec{u}_h\Vert _{L^2(\mathcal {T}_h)}^2&=(\textrm{div}\underline{\psi }, \varvec{u}-\varvec{u}_h)_{\mathcal {T}_h}+((2\mu )^{-1}\mathcal {A}\underline{\psi }, \underline{e}_\sigma )_{\mathcal {T}_h}-(\underline{e_\sigma },\varepsilon (\varvec{\phi }))_{\mathcal {T}_h}\\&=(\textrm{div}(\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi }), \varvec{u}-\varvec{u}_h )_{\mathcal {T}_h}+(\textrm{div}\underline{\Pi _\Sigma }\underline{\psi }, \varvec{u}-\varvec{u}_h )_{\mathcal {T}_h}\\&\quad +((2\mu )^{-1}\mathcal {A} \Pi _\Sigma \underline{\psi }, \underline{e_\sigma })_{\mathcal {T}_h}\\&\quad \;+((2\mu )^{-1}\mathcal {A}(\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi }), \underline{e_\sigma })_{\mathcal {T}_h}-(\underline{e_\sigma },\varepsilon (\varvec{\Pi _U}\varvec{\phi }))_{\mathcal {T}_h}\\&\quad \;-(\underline{e_\sigma },\varepsilon (\varvec{\phi }-\varvec{\Pi _U}\varvec{\phi }))_{\mathcal {T}_h}. \end{aligned}$$

Taking \(\underline{w}= \underline{\Pi _\Sigma }\underline{\psi }\) and \(\varvec{v}=\varvec{\Pi _{U}}\varvec{\phi }\) in (4.15)–(4.16), we infer that

$$\begin{aligned}&(\varvec{u}-\varvec{u}_h, \textrm{div}\underline{\Pi _\Sigma } \underline{\psi })_{\mathcal {T}_h}+((2\mu )^{-1}\mathcal {A} \underline{\Pi _\Sigma }\underline{\psi }, \underline{e_\sigma })_{\mathcal {T}_h}\\&\quad =\langle \varvec{e_u}^n, (\underline{\Pi _\Sigma }\underline{\psi }\varvec{n})^n\rangle _{\partial \mathcal {T}_h}+\langle \varvec{e_{\widehat{u}}},(\underline{\Pi _\Sigma }\underline{\psi }\varvec{n})^t\rangle _{\partial \mathcal {T}_h},\\&(\underline{e_\sigma },\varepsilon (\varvec{\Pi _U}\varvec{\phi }))_{\mathcal {T}_h}=\langle (\underline{\sigma }\varvec{n})^t-(\underline{\sigma }_h\varvec{n})^t,(\varvec{\Pi _U}\varvec{\phi })^t\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}\\&\qquad +\langle \tau (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h),(\varvec{\Pi _U}\varvec{\phi })^t \rangle _{\partial \mathcal {T}_h}. \end{aligned}$$

By the elliptic regularity assumption, \((\underline{\psi },\varvec{\phi })\in \underline{H}^1(\Omega )\times \varvec{H}^2(\Omega )\); hence \(\varvec{\phi }\) and the normal component of \(\underline{\psi }\) are continuous. It then holds that

$$\begin{aligned} \langle (\underline{\sigma }\varvec{n})^t-(\underline{\sigma }_h\varvec{n})^t,\varvec{\phi }^t\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}+\langle \tau (\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h),\varvec{\phi }^t\rangle _{\partial \mathcal {T}_h}&=0,\quad \langle \varvec{e_{\widehat{u}}},(\underline{\psi }\varvec{n})^t\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}=0. \end{aligned}$$

Using integration by parts and the \(L^2\)-orthogonality of \(\underline{\Pi _\Sigma }\), we obtain

$$\begin{aligned}&(\textrm{div}(\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi }), \varvec{u}-\varvec{u}_h )_{\mathcal {T}_h}\\&\quad =\langle (\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi })\varvec{n}, \varvec{u}-\varvec{u}_h\rangle _{\partial \mathcal {T}_h}-(\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi }, \varepsilon (\varvec{u}-\varvec{u}_h))_{\mathcal {T}_h}\\&\quad =\langle (\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi })\varvec{n}, \varvec{u}-\varvec{\Pi _{U}}\varvec{u}\rangle _{\partial \mathcal {T}_h}+\langle (\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi })\varvec{n}, \varvec{e_u}\rangle _{\partial \mathcal {T}_h}\\&\qquad -(\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi }, \varepsilon (\varvec{u}-\varvec{\Pi _U}\varvec{u}))_{\mathcal {T}_h}\\&\quad ={\langle (\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi })\varvec{n}, \varvec{u}-\varvec{\Pi _{U}}\varvec{u}\rangle _{\partial \mathcal {T}_h}-\langle (\underline{\Pi _\Sigma } \underline{\psi }\varvec{n})^n, \varvec{e_u}^n\rangle _{\partial \mathcal {T}_h}}\\&\qquad {+\langle ((\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi })\varvec{n})^t, \varvec{e_u}^t\rangle _{\partial \mathcal {T}_h}-(\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi }, \varepsilon (\varvec{u}-\varvec{\Pi _U}\varvec{u}))_{\mathcal {T}_h}.} \end{aligned}$$

Then we have

$$\begin{aligned}&\Vert \varvec{u}-\varvec{u}_h\Vert _{L^2(\mathcal {T}_h)}^2\nonumber \\&\quad =\langle \varvec{e_u}^t- \varvec{e_{\widehat{u}}},((\underline{\psi }-\underline{\Pi _\Sigma }\underline{\psi })\varvec{n})^t\rangle _{\partial \mathcal {T}_h}-\langle (\underline{\sigma }\varvec{n})^t-(\underline{\sigma }_h\varvec{n})^t,(\varvec{\Pi _U}\varvec{\phi })^t-\varvec{\phi }^t\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}\nonumber \\&\qquad +((2\mu )^{-1}\mathcal {A}(\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi }), \underline{e_\sigma })_{\mathcal {T}_h}-(\mathcal {A}\underline{e}_\sigma ,\varepsilon (\varvec{\phi }-\varvec{\Pi _U}\varvec{\phi }))_{\mathcal {T}_h}\nonumber \\&\qquad - \langle \tau (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h),(\varvec{\Pi _U}\varvec{\phi }-\varvec{\phi })^t\rangle _{\partial \mathcal {T}_h} -(\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi }, \varepsilon (\varvec{u}-\varvec{\Pi _U}\varvec{u}))_{\mathcal {T}_h}\nonumber \\&\qquad +\langle (\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi })\varvec{n}, \varvec{u}-\varvec{\Pi _{U}}\varvec{u}\rangle _{\partial \mathcal {T}_h}, \end{aligned}$$
(4.25)

where we use \((\textrm{tr}(\underline{e_\sigma })\underline{I}, \varepsilon (\varvec{\phi }-\varvec{\Pi _U}\varvec{\phi }))_{\mathcal {T}_h}=(\textrm{tr}(\underline{e_\sigma }), \nabla \cdot (\varvec{\phi }-\varvec{\Pi _U}\varvec{\phi }))_{\mathcal {T}_h}=0\) for the fourth term on the right-hand side.

Now we estimate each term on the right-hand side of (4.25) separately. First, we have from the Cauchy–Schwarz inequality and (4.19)

$$\begin{aligned}&\langle \varvec{e_u}^t- \varvec{e_{\widehat{u}}},((\underline{\psi }-\underline{\Pi _\Sigma }\underline{\psi })\varvec{n})^t\rangle _{\partial \mathcal {T}_h}\\&\quad \le C\left\| \tau ^{\frac{1}{2}}(\varvec{e_u}^t- \varvec{e_{\widehat{u}}})\right\| _{L^2(\partial \mathcal {T}_h)} \left\| \tau ^{-\frac{1}{2}}((\underline{\psi }-\underline{\Pi _\Sigma }\underline{\psi })\varvec{n})^t\right\| _{L^2(\partial \mathcal {T}_h)}\\&\quad \le C\Big (\Vert \varepsilon (\varvec{e_u})\Vert _{L^2(\mathcal {T}_h)}+\mu ^{-\frac{1}{2}} \left\| \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{e_u}^t-\varvec{e_{\widehat{u}}})\right\| _{L^2(\partial \mathcal {T}_h)}\Big ) \Vert \underline{\psi }\Vert _{H^1(\Omega )}. \end{aligned}$$

We can infer from \((\textrm{tr}(\underline{\sigma }-\underline{\sigma }_h)\underline{I}\varvec{n})^t=\varvec{0}\), the trace inequality (4.6), (4.4)–(4.5) and Theorem 2.3 that

$$\begin{aligned}&\langle (\underline{\sigma }\varvec{n})^t-(\underline{\sigma }_h\varvec{n})^t,(\varvec{\Pi _U}\varvec{\phi })^t-\varvec{\phi }^t\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}\\&\quad = \langle (\mathcal {A}\underline{\sigma }\varvec{n})^t-(\mathcal {A}\underline{\sigma }_h\varvec{n})^t,(\varvec{\Pi _U}\varvec{\phi })^t-\varvec{\phi }^t\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}\\&\quad = \langle (\mathcal {A}\underline{\sigma }\varvec{n})^t-(\Pi _\Sigma \mathcal {A}\underline{\sigma }\varvec{n})^t,(\varvec{\Pi _U}\varvec{\phi })^t-\varvec{\phi }^t\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}\\&\qquad + \langle (\Pi _\Sigma \mathcal {A}\underline{\sigma }\varvec{n})^t-(\mathcal {A}\underline{\sigma }_h\varvec{n})^t,(\varvec{\Pi _U}\varvec{\phi })^t-\varvec{\phi }^t\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}\\&\quad \le C \Big (h^{t+1}\mu ^{-1}|\mathcal {A}\underline{\sigma }|_{H^t(\Omega )} + h^s |\varvec{u}|_{H^s(\Omega )}\Big ) \mu \Vert \varvec{\phi }\Vert _{H^2(\Omega )}. \end{aligned}$$

The Cauchy–Schwarz inequality yields

$$\begin{aligned} \langle \tau (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h),(\varvec{\Pi _U}\varvec{\phi }-\varvec{\phi })^t\rangle _{\partial \mathcal {T}_h}\le C h \mu ^{\frac{1}{2}}\Vert \tau ^{\frac{1}{2}} (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\Vert \varvec{\phi }\Vert _{H^2(\Omega )}. \end{aligned}$$

Similarly, we have

$$\begin{aligned}&((2\mu )^{-1}\mathcal {A}(\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi }), \underline{e}_\sigma )_{\mathcal {T}_h}\\&\quad = ((2\mu )^{-1}(\mathcal {A}\underline{\psi }-\underline{\Pi _\Sigma } (\mathcal {A} \underline{\psi })), \underline{e_\sigma })_{\mathcal {T}_h}\\&\quad \le \Vert \mathcal {A}\underline{\psi }-\underline{\Pi _\Sigma } (\mathcal {A} \underline{\psi })\Vert _{L^2(\mathcal {T}_h)}\Vert \mu ^{-1}\mathcal {A}\underline{e_\sigma }\Vert _{L^2(\mathcal {T}_h)}\\&\quad \le C h \Vert \underline{\psi }\Vert _{H^1(\Omega )}\Vert \mu ^{-1}\mathcal {A}\underline{e_\sigma }\Vert _{L^2(\mathcal {T}_h)} \end{aligned}$$

and

$$\begin{aligned} (\mathcal {A}\underline{e_\sigma },\varepsilon (\varvec{\phi }-\varvec{\Pi _U}\varvec{\phi }))_{\mathcal {T}_h}&\le \Vert \mathcal {A}\underline{e_\sigma }\Vert _{L^2(\mathcal {T}_h)}\Vert \varepsilon (\varvec{\phi }-\varvec{\Pi _U}\varvec{\phi })\Vert _{L^2(\mathcal {T}_h)}\\&\le C\mu ^{-1} h\Vert \mathcal {A}\underline{e_\sigma }\Vert _{L^2(\mathcal {T}_h)}\mu \Vert \varvec{\phi }\Vert _{H^2(\Omega )} . \end{aligned}$$

We can infer from the Cauchy–Schwarz inequality and the interpolation error estimate (4.5) that

$$\begin{aligned} (\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi }, \varepsilon (\varvec{u}-\varvec{\Pi _U}\varvec{u}))_{\mathcal {T}_h}&\le C h \Vert \underline{\psi }\Vert _{H^1(\Omega )}\Vert \varepsilon (\varvec{u}-\varvec{\Pi _U}\varvec{u})\Vert _{L^2(\mathcal {T}_h)}\\ {}&\le C h^{s} \Vert \underline{\psi }\Vert _{H^1(\Omega )}|\varvec{u}|_{H^{s}(\Omega )}. \end{aligned}$$

The Cauchy–Schwarz inequality, the trace inequality (4.6) and the error estimates (4.4)–(4.5) yield

$$\begin{aligned} \langle (\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi })\varvec{n}, \varvec{u}-\varvec{\Pi _{U}}\varvec{u}\rangle _{\partial \mathcal {T}_h}\le Ch^{s} \Vert \underline{\psi }\Vert _{H^1(\Omega )} |\varvec{u}|_{H^{s}(\Omega )}. \end{aligned}$$

Combining the preceding estimates with Theorem 2.3 completes the proof. \(\square \)

5 Convergence of the Navier–Stokes equations

In this section, we prove Theorem 2.6 on the convergence to the weak solution under minimal regularity assumptions.

We first recall the following inequalities, which will play an important role in the subsequent analysis.

Lemma 5.1

There exists C independent of h such that for all \(F\subset \partial K, K\in \mathcal {T}_h\) and \(1\le p,q\le \infty \)

$$\begin{aligned} \Vert \varvec{v}\Vert _{L^p(K)}\le Ch_K^{d(1/p-1/q)}\Vert \varvec{v}\Vert _{L^q(K)}\quad \forall \varvec{v}\in \varvec{U}_h. \end{aligned}$$
(5.1)
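
As a worked instance of (5.1), take \(d=3\), \(p=4\) and \(q=2\); these values are chosen purely for illustration. Then

$$\begin{aligned} \Vert \varvec{v}\Vert _{L^4(K)}\le Ch_K^{3(1/4-1/2)}\Vert \varvec{v}\Vert _{L^2(K)}=Ch_K^{-3/4}\Vert \varvec{v}\Vert _{L^2(K)}\quad \forall \varvec{v}\in \varvec{U}_h, \end{aligned}$$

so passing from the \(L^2\)-norm to the stronger \(L^4\)-norm on an element costs a negative power of \(h_K\). For \(p\le q\) the exponent \(d(1/p-1/q)\) is non-negative, in line with Hölder's inequality on a set of measure \(|K|\le Ch_K^d\).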

Lemma 5.2

There exists C independent of h such that

$$\begin{aligned} \Vert \varvec{v}\Vert _{L^q(\Omega )}\le C\Vert \varvec{v}\Vert _h\quad 1\le q\le 6,\;\forall \varvec{v}\in \varvec{U}_h. \end{aligned}$$

Moreover, the non-negativity and the boundedness of \(N_h\) are stated in the next two lemmas; we refer to [10] for the proofs.

Lemma 5.3

(Non-negativity) Let \(\varvec{w}\in \varvec{H}^1_0(\Gamma _D)+ (\varvec{U}_h\cap \varvec{H}(\textrm{div};\Omega ))\) with \(\nabla \cdot \varvec{w}=0\). Then we have

$$\begin{aligned} N_h(\varvec{w};(\varvec{v},\widehat{\varvec{v}}),(\varvec{v},\widehat{\varvec{v}}))\ge 0\quad \forall \varvec{v}\in \varvec{U}_h+\varvec{H}^1_0(\Gamma _D), \widehat{\varvec{v}}\in \widehat{\varvec{U}}_h. \end{aligned}$$

Lemma 5.4

(Boundedness) For any \(\varvec{z}_h,\varvec{v}_h,\varvec{w}_h\in \varvec{U}_h\) and \(\widehat{\varvec{v}}_h,\widehat{\varvec{w}}_h\in \widehat{\varvec{U}}_h\), it holds

$$\begin{aligned} N_h(\varvec{z}_h;(\varvec{v}_h,\widehat{\varvec{v}}_h),(\varvec{w}_h,\widehat{\varvec{w}}_h))\le C\left\| \varvec{z}_h\right\| _h\left\| (\varvec{v}_h,\widehat{\varvec{v}}_h)\right\| _{1,h}\left\| (\varvec{w}_h,\widehat{\varvec{w}}_h)\right\| _{1,h}. \end{aligned}$$

A solution operator \(T_h:\varvec{U}_h\rightarrow \varvec{U}_h\) is defined as follows: For given \(\varvec{z}_h\in \varvec{U}_h\), find \(\varvec{w}_h=T_h(\varvec{z}_h)\in \varvec{U}_h\) such that

$$\begin{aligned}&\mathbb {A}_h((\underline{S}_h,\varvec{w}_h,\widehat{\varvec{w}}_h),(\underline{H},\varvec{v},\widehat{\varvec{v}}))+N_h(\varvec{z}_h;(\varvec{w}_h,\widehat{\varvec{w}}_h),(\varvec{v},\widehat{\varvec{v}}))\nonumber \\&\quad =(\varvec{f}_N,\varvec{v})\quad \forall (\underline{H},\varvec{v},\widehat{\varvec{v}})\in \underline{\Sigma }_h\times \varvec{U}_h\times \widehat{\varvec{U}}_h \end{aligned}$$
(5.2)

for some \((\underline{S}_h,\widehat{\varvec{w}}_h)\in \underline{\Sigma }_h\times \widehat{\varvec{U}}_h\).

Observe that finding the solution to (2.13) is equivalent to finding a fixed point \(\varvec{u}_h\) of \(T_h\) so that

$$\begin{aligned} T_h(\varvec{u}_h)=\varvec{u}_h \end{aligned}$$

with its corresponding \(\underline{S}_h\) and \(\widehat{\varvec{w}}_h\).

Then, we have the following stability estimate.

Lemma 5.5

For any \(\varvec{z}_h\in \varvec{U}_h\), we have

$$\begin{aligned} \left\| T_h(\varvec{z}_h)\right\| _h=\left\| \varvec{w}_h\right\| _h\le C\mu ^{-1}\left\| \varvec{f}_0\right\| _{L^2(\Omega )}, \end{aligned}$$

where \(\varvec{f}_0\) comes from the Helmholtz–Hodge decomposition \(\varvec{f}_N=\varvec{f}_0+\nabla \chi \) with \(\varvec{f}_0\in \varvec{H}(\textrm{div};\Omega )\), \(\nabla \cdot \varvec{f}_0=0\) and \(\chi \in L^2_0(\Omega )\cap H^1(\Omega )\).

Proof

Let \(\varvec{z}_h\in \varvec{U}_h\) and \(\varvec{w}_h=T_h(\varvec{z}_h)\). Then there exists \(\varsigma =(\underline{G}_h,\varvec{w}_h,\widehat{\varvec{w}}_h)\) such that

$$\begin{aligned} \mathbb {A}_h(\varsigma ,\varphi )+N_h(\varvec{z}_h;(\varvec{w}_h,\widehat{\varvec{w}}_h),(\varvec{v},\widehat{\varvec{v}}))=(\varvec{f}_N,\varvec{v})\quad \forall \varphi =(\underline{H},\varvec{v},\widehat{\varvec{v}})\in \underline{\Sigma }_h\times \varvec{U}_h\times \widehat{\varvec{U}}_h. \end{aligned}$$

Taking \(\varphi =\varsigma \), we obtain using Lemma 5.3

$$\begin{aligned}{} & {} \mu ^{-1}\left\| \mathcal {A}\underline{G}_h\right\| _{L^2(\Omega )}^2+\Vert \tau ^{1/2}(\varvec{P_M}\varvec{w}_h^t-\widehat{\varvec{w}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}^2\\{} & {} \quad \le \mathbb {A}_h(\varsigma ,\varsigma )+N_h(\varvec{z}_h;(\varvec{w}_h,\widehat{\varvec{w}}_h),(\varvec{w}_h,\widehat{\varvec{w}}_h))=(\varvec{f}_N,\varvec{w}_h). \end{aligned}$$

Then the Helmholtz–Hodge decomposition \(\varvec{f}_N=\varvec{f}_0+\nabla \chi \) and the fact that \(\nabla \cdot \varvec{w}_h=0\) yield

$$\begin{aligned}{} & {} \mu ^{-1}\left\| \mathcal {A}\underline{G}_h\right\| _{L^2(\Omega )}^2+\Vert \tau ^{1/2}(\varvec{P_M}\varvec{w}_h^t-\widehat{\varvec{w}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}^2\\{} & {} \quad \le C\Vert \varvec{f}_0\Vert _{L^2(\Omega )}\Vert \varvec{w}_h\Vert _{L^2(\Omega )}. \end{aligned}$$
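
The passage from \((\varvec{f}_N,\varvec{w}_h)\) to a bound involving only \(\varvec{f}_0\) can be sketched as follows. The sketch assumes that \(\varvec{w}_h\in \varvec{H}(\textrm{div};\Omega )\) and that the boundary term \(\langle \chi ,\varvec{w}_h\cdot \varvec{n}\rangle _{\partial \Omega }\) arising from integration by parts vanishes. Both assumptions are made here for illustration and are not checked against the precise definition of \(\varvec{U}_h\). Under them,

$$\begin{aligned} (\varvec{f}_N,\varvec{w}_h)=(\varvec{f}_0,\varvec{w}_h)+(\nabla \chi ,\varvec{w}_h)=(\varvec{f}_0,\varvec{w}_h)-(\chi ,\nabla \cdot \varvec{w}_h)=(\varvec{f}_0,\varvec{w}_h)\le \Vert \varvec{f}_0\Vert _{L^2(\Omega )}\Vert \varvec{w}_h\Vert _{L^2(\Omega )}. \end{aligned}$$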

Proceeding analogously to (2.15), we can infer from the discrete Poincaré inequality and (2.8)

$$\begin{aligned} \Vert \varvec{w}_h\Vert _h\le & {} C \left\| (\varvec{w}_h,\widehat{\varvec{w}}_h)\right\| _{1,h}\le C\Big ( \mu ^{-1}\left\| \mathcal {A}\underline{G}_h\right\| _{L^2(\Omega )}\\{} & {} +\mu ^{-1/2}\Vert \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{w}_h^t-\widehat{\varvec{w}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\Big )\le C\mu ^{-1} \Vert \varvec{f}_0\Vert _{L^2(\Omega )}. \end{aligned}$$
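
To make the last inequality explicit, a short sketch follows; the shorthand \(E\) is introduced only for this illustration. Let \(E^2:=\mu ^{-1}\Vert \mathcal {A}\underline{G}_h\Vert _{L^2(\Omega )}^2+\Vert \tau ^{1/2}(\varvec{P_M}\varvec{w}_h^t-\widehat{\varvec{w}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}^2\). Each of the two terms in the middle expression above is bounded by \(\mu ^{-1/2}E\), so \(\Vert \varvec{w}_h\Vert _h\le C\mu ^{-1/2}E\). Combining this with the preceding bound on \(E^2\) and Lemma 5.2 with \(q=2\) gives

$$\begin{aligned} E^2\le C\Vert \varvec{f}_0\Vert _{L^2(\Omega )}\Vert \varvec{w}_h\Vert _{L^2(\Omega )}\le C\Vert \varvec{f}_0\Vert _{L^2(\Omega )}\Vert \varvec{w}_h\Vert _h\le C\mu ^{-\frac{1}{2}}\Vert \varvec{f}_0\Vert _{L^2(\Omega )}E, \end{aligned}$$

hence \(E\le C\mu ^{-1/2}\Vert \varvec{f}_0\Vert _{L^2(\Omega )}\) and therefore \(\Vert \varvec{w}_h\Vert _h\le C\mu ^{-1/2}E\le C\mu ^{-1}\Vert \varvec{f}_0\Vert _{L^2(\Omega )}\).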

Therefore, the proof is completed. \(\square \)
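The last inequality in the chain above can be unpacked as follows. This is only a sketch: the shorthand \(E_h\) is introduced here for illustration and is not part of the notation used elsewhere, and the discrete Poincaré inequality is assumed in the form \(\Vert \varvec{w}_h\Vert _{L^2(\Omega )}\le C\Vert (\varvec{w}_h,\widehat{\varvec{w}}_h)\Vert _{1,h}\). With \(E_h^2:=\mu ^{-1}\Vert \mathcal {A}\underline{G}_h\Vert _{L^2(\Omega )}^2+\Vert \tau ^{1/2}(\varvec{P_M}\varvec{w}_h^t-\widehat{\varvec{w}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}^2\), the two preceding bounds read

$$\begin{aligned} \Vert (\varvec{w}_h,\widehat{\varvec{w}}_h)\Vert _{1,h}&\le C\Big (\mu ^{-1}\Vert \mathcal {A}\underline{G}_h\Vert _{L^2(\Omega )}+\mu ^{-1/2}\Vert \tau ^{1/2}(\varvec{P_M}\varvec{w}_h^t-\widehat{\varvec{w}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\Big )\le C\mu ^{-1/2}E_h,\\ E_h^2&\le C\Vert \varvec{f}_0\Vert _{L^2(\Omega )}\Vert \varvec{w}_h\Vert _{L^2(\Omega )}\le C\mu ^{-1/2}\Vert \varvec{f}_0\Vert _{L^2(\Omega )}E_h. \end{aligned}$$

If \(E_h>0\), dividing the second line by \(E_h\) gives \(E_h\le C\mu ^{-1/2}\Vert \varvec{f}_0\Vert _{L^2(\Omega )}\) (the case \(E_h=0\) is trivial), and inserting this into the first line recovers \(\Vert (\varvec{w}_h,\widehat{\varvec{w}}_h)\Vert _{1,h}\le C\mu ^{-1}\Vert \varvec{f}_0\Vert _{L^2(\Omega )}\).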

Lemma 5.6

Let \(\{(\underline{\sigma }_h,\varvec{u}_h,\widehat{\varvec{u}}_h){\}}_{h>0}\) be the sequence of solutions of (2.13). Then it holds

$$\begin{aligned} \Vert \underline{\sigma }_h\Vert _{L^2(\Omega )}&\le C\Big ( \Vert \varvec{f}_N\Vert _{L^2(\mathcal {T}_h)}+\mu ^{-2}\Vert \varvec{f}_0\Vert _{L^2(\mathcal {T}_h)}^2\Big ),\\ \Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\Omega )}&\le C \Vert \varvec{f}_0\Vert _{L^2(\mathcal {T}_h)},\\ \Vert (\varvec{u}_h,\widehat{\varvec{u}}_h)\Vert _{1,h}&\le C\mu ^{-1}\Vert \varvec{f}_0\Vert _{L^2(\mathcal {T}_h)}. \end{aligned}$$

Proof

Proceeding in an analogous way to Lemma 5.5, we can obtain the upper bound for \( \Vert (\varvec{u}_h,\widehat{\varvec{u}}_h)\Vert _{1,h}\) and \(\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\Omega )}\). It remains to estimate \(\Vert \underline{\sigma }_h\Vert _{L^2(\Omega )}\), which can be proved in a similar way to that of the Stokes equations with additional treatment for the convective trilinear form. In fact, we need to estimate \(\Vert \textrm{tr}(\underline{\sigma }_h)\Vert _{L^2(\Omega )}\), that is, we need to bound the right-hand side of (4.14).

There exists a function \(\varvec{\theta }\in \varvec{H}^1(\Omega )\) with \(\varvec{\theta }\cdot \varvec{n}=0\) on \(\partial \Omega _D\) (cf. [34]) such that

$$\begin{aligned} \begin{aligned} \nabla \cdot \varvec{\theta } = \textrm{tr}(\underline{\sigma }_h),\quad \Vert \varvec{\theta }\Vert _{H^1(\Omega )} \le C \Vert \textrm{tr}(\underline{\sigma }_h)\Vert _{L^2(\Omega )}. \end{aligned} \end{aligned}$$

In view of (4.14), it suffices to show the upper bound for \((\underline{\sigma }_h, \varepsilon (\varvec{\Pi _U}\varvec{\theta }))_{\mathcal {T}_h}\). It follows from (2.13) and Lemma 5.4 that

$$\begin{aligned} (\underline{\sigma }_h, \varepsilon (\varvec{\Pi _U}\varvec{\theta }))_{\mathcal {T}_h}&=\langle \mathcal {A}(\underline{\sigma }_h\varvec{n})^t, (\varvec{\Pi _U}\varvec{\theta })^t -\varvec{P_M}\varvec{\theta }^t\rangle _{\partial \mathcal {T}_h}\\ {}&\quad -\langle \tau (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h),(\varvec{\Pi _U}\varvec{\theta })^t-\varvec{P_M}\varvec{\theta }^t\rangle _{\partial \mathcal {T}_h}\\&\quad -N_h(\varvec{u}_h;(\varvec{u}_h,\widehat{\varvec{u}}_h),(\varvec{\Pi _U}\varvec{\theta },\varvec{P_M}\varvec{\theta }^t))+(\varvec{f}_N,\varvec{\Pi _U}\varvec{\theta })_{\mathcal {T}_h}\\&\le C \Big (\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\mathcal {T}_h)}+\mu ^{\frac{1}{2}}\Vert \tau ^{\frac{1}{2}} (\varvec{P_M} \varvec{u}_h^t\\&-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}{+\Vert (\varvec{u}_h,\widehat{\varvec{u}}_h)\Vert _{1,h}^2+\Vert \varvec{f}_N\Vert _{L^2(\mathcal {T}_h)}}\Big )\Vert \varvec{\theta }\Vert _{H^1(\Omega )}. \end{aligned}$$

Therefore, it holds owing to (4.14)

$$\begin{aligned}&\Vert \textrm{tr}(\underline{\sigma }_h)\Vert _{L^2(\Omega )}\le C\Big ( \Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\mathcal {T}_h)}+\mu ^{\frac{1}{2}}\Vert \tau ^{\frac{1}{2}} (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\\&\qquad +\Vert \varvec{f}_N\Vert _{L^2(\mathcal {T}_h)}+\Vert (\varvec{u}_h,\widehat{\varvec{u}}_h)\Vert _{1,h}^2\Big )\\&\quad \le C \Big ( \Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\mathcal {T}_h)}+\Vert \varvec{f}_N\Vert _{L^2(\mathcal {T}_h)}+\mu ^{-2}\Vert \varvec{f}_0\Vert _{L^2(\mathcal {T}_h)}^2\Big ). \end{aligned}$$

Thus, the triangle inequality yields

$$\begin{aligned} \Vert \underline{\sigma }_h\Vert _{L^2(\Omega )}\le \Big (\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\Omega )}+\Vert \textrm{tr}(\underline{\sigma }_h)\Vert _{L^2(\Omega )}\Big )\le C \Big (\Vert \varvec{f}_N\Vert _{L^2(\mathcal {T}_h)}+\mu ^{-2}\Vert \varvec{f}_0\Vert _{L^2(\mathcal {T}_h)}^2\Big ). \end{aligned}$$

\(\square \)

By Lemma 5.5 and the Brouwer fixed point theorem, the existence of the fixed-point \(\varvec{u}_h=T_h(\varvec{u}_h)\) is guaranteed. Now we are ready to show the convergence to the weak solution. To facilitate later analysis, we define the lifting operator \(\underline{R}\): \(\varvec{L}^2(\partial \mathcal {T}_h)\rightarrow \underline{\Sigma }_h\) by

$$\begin{aligned} \int _{\Omega } \underline{R}(\varvec{\psi }) \underline{w}\;dx=\langle \varvec{\psi }, (\underline{w}\varvec{n})^t\rangle _{\partial \mathcal {T}_h}\quad \forall \underline{w}\in \underline{\Sigma }_h. \end{aligned}$$
(5.3)

For \(q\in P_h\), the following holds in view of (5.3) and (2.14)

$$\begin{aligned} (\textrm{tr}(\varepsilon (\varvec{u}_h)-\underline{R}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)), q)_{\mathcal {T}_h}&=\sum _{K\in \mathcal {T}_h}(\nabla \cdot \varvec{u}_h,q)_K-(\underline{R}(\varvec{P_M}(\varvec{u}_h^t)-\varvec{\widehat{u}}_h), q\underline{I})_{\mathcal {T}_h}\\&\quad =\sum _{K\in \mathcal {T}_h}(\nabla \cdot \varvec{u}_h,q)_K=0. \end{aligned}$$

Let \((\varepsilon _h(\varvec{u}_h))_{|K}:=\varepsilon ((\varvec{u}_h)_{|K})\) denote the element-wise symmetric gradient. The preceding identity then gives \( \textrm{tr}(\varepsilon _h(\varvec{u}_h)-\underline{R}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h))=0 \). Notice that \(\varepsilon _h(\varvec{u}_h)-\underline{R}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\in \underline{\Sigma }_h \) and set \(\underline{G}_h(\varvec{u}_h,\widehat{\varvec{u}}_h):=\varepsilon _h(\varvec{u}_h)-\underline{R}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\). We can then infer from (2.13) and integration by parts that

$$\begin{aligned} (2\mu )^{-1}\mathcal {A}\underline{\sigma }_h = \underline{G}_h(\varvec{u}_h,\widehat{\varvec{u}}_h) . \end{aligned}$$
(5.4)

The lifting operator satisfies the following estimate.

Lemma 5.7

The following estimate holds

$$\begin{aligned} \Vert \underline{R}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\mathcal {T}_h)}\le C \mu ^{-\frac{1}{2}} \Bigg (\sum _{K\in \mathcal {T}_h} \left\| \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\right\| _{L^2(\partial K)}^2\Bigg )^{\frac{1}{2}}. \end{aligned}$$

Proof

We can deduce from (5.3), the Cauchy–Schwarz inequality and the trace inequality (4.7) that

$$\begin{aligned}&\Vert \underline{R}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\mathcal {T}_h)}^2\\&\quad =(\underline{R}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h),\underline{R}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h))_{\mathcal {T}_h}\\&\quad =\langle \varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h,(\underline{R}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\varvec{n})^t\rangle _{\partial \mathcal {T}_h}\\&\quad \le C \sum _{K\in \mathcal {T}_h}h_K^{-1/2} \Vert \varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h\Vert _{L^2(\partial K)}\Vert \underline{R}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2( K)}\\&\quad \le C \mu ^{-\frac{1}{2}} \Bigg (\sum _{K\in \mathcal {T}_h} \Vert \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial K)}^2\Bigg )^{\frac{1}{2}}\Vert \underline{R}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\mathcal {T}_h)}, \end{aligned}$$

which implies the desired estimate. \(\square \)
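The final step of the proof of Lemma 5.7 can be sketched in more detail. The definition of \(\tau \) is not restated in this section, so the scaling \(\tau _{|\partial K}\simeq \mu h_K^{-1}\) used below is an assumption made for illustration only, and the shorthand \(\varvec{\phi }_h:=\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h\) is likewise introduced here. Under this assumption, \(h_K^{-1/2}\le C\mu ^{-1/2}\tau ^{1/2}\) on \(\partial K\), and the discrete Cauchy–Schwarz inequality gives

$$\begin{aligned} \sum _{K\in \mathcal {T}_h}h_K^{-1/2}\Vert \varvec{\phi }_h\Vert _{L^2(\partial K)}\Vert \underline{R}(\varvec{\phi }_h)\Vert _{L^2(K)}&\le C\mu ^{-\frac{1}{2}}\sum _{K\in \mathcal {T}_h}\Vert \tau ^{\frac{1}{2}}\varvec{\phi }_h\Vert _{L^2(\partial K)}\Vert \underline{R}(\varvec{\phi }_h)\Vert _{L^2(K)}\\&\le C\mu ^{-\frac{1}{2}}\Bigg (\sum _{K\in \mathcal {T}_h}\Vert \tau ^{\frac{1}{2}}\varvec{\phi }_h\Vert _{L^2(\partial K)}^2\Bigg )^{\frac{1}{2}}\Bigg (\sum _{K\in \mathcal {T}_h}\Vert \underline{R}(\varvec{\phi }_h)\Vert _{L^2(K)}^2\Bigg )^{\frac{1}{2}}. \end{aligned}$$

The last factor equals \(\Vert \underline{R}(\varvec{\phi }_h)\Vert _{L^2(\mathcal {T}_h)}\), so dividing by it yields the estimate of Lemma 5.7.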

In addition, we define the lifting operator \(\underline{R}^s:L^2(\partial \mathcal {T}_h)\rightarrow P_k(\mathcal {T}_h)^{d\times d}\) by

$$\begin{aligned} \int _{\Omega } \underline{R}^s(\varvec{\psi }) \underline{w}\;dx=\langle \varvec{\psi }, \underline{w}\varvec{n}\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}\quad \forall \underline{w}\in P_k(\mathcal {T}_h)^{d\times d}. \end{aligned}$$

Then a discrete gradient operator is defined as

$$\begin{aligned} \underline{\mathcal {G}}_h^k(\varvec{v},\widehat{\varvec{v}}):=\nabla _h\varvec{v}-\underline{R}^s(\varvec{v}^t-\widehat{\varvec{v}}), \end{aligned}$$

where \(\nabla _h\) represents the element-wise gradient operator.

An application of integration by parts implies that

$$\begin{aligned} N_h(\varvec{u}_h;(\varvec{u}_h,\widehat{\varvec{u}}_h),(\varvec{v}_h,\widehat{\varvec{v}}_h))&=(\varvec{u}_h\cdot \underline{\mathcal {G}}_h^{2k}(\varvec{u}_h,\widehat{\varvec{u}}_h),\varvec{v}_h)_{\mathcal {T}_h}\nonumber \\&\quad +\frac{1}{2}\left\langle |\varvec{u}_h\cdot \textbf{n}|+\varvec{u}_h\cdot \varvec{n},(\varvec{u}_h^t-\widehat{\varvec{u}}_h)\cdot (\varvec{v}_h^t-\widehat{\varvec{v}}_h)\right\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}.\nonumber \\ \end{aligned}$$
(5.5)

Theorem 5.1

(Compactness) Let \(1\le p<\infty \). Let \(\{(\varvec{v}_h,\widehat{\varvec{v}}_h)\}_{h>0}\) be a sequence in \(\varvec{U}_h\times \widehat{\varvec{U}}_h\) bounded in \(\Vert (\cdot ,\cdot )\Vert _{1,h}\)-norm. Then the sequence \(\varvec{v}_h\) is relatively compact in \(\varvec{L}^q(\Omega ),1\le q\le 6\).

Proof

We can prove the theorem following Lemma 6.2 and Theorem 6.2 in [16], and the boundedness (2.8). The details are omitted for simplicity. \(\square \)

Lemma 5.8

Let \(\{(\varvec{v}_h,\widehat{\varvec{v}}_h)\}_{h>0}\) be a sequence in \(\varvec{U}_h\times \widehat{\varvec{U}}_h\). Assume that the sequence \(\{(\varvec{v}_h,\widehat{\varvec{v}}_h)\}_{h>0}\) is bounded in the \(\Vert (\cdot ,\cdot )\Vert _{1,h}\)-norm. Then, there exists a function \(\varvec{v}\in \varvec{H}^1_0(\Gamma _D)\) such that as \(h\rightarrow 0\), up to a subsequence, \(\varvec{v}_h\rightarrow \varvec{v}\) strongly in \(\varvec{L}^2(\Omega )\) and \(\underline{G}_h(\varvec{v}_h,\widehat{\varvec{v}}_h)\rightharpoonup \varepsilon (\varvec{v})\) weakly in \(\underline{L}^2(S,\Omega )\).

Proof

Owing to Theorem 5.1 applied with \(q=2\), there exists a function \(\varvec{v}\in \varvec{L}^2(\Omega )\) such that, up to a subsequence, \(\varvec{v}_h\rightarrow \varvec{v}\) strongly in \(\varvec{L}^2(\Omega )\). Moreover, \(\underline{G}_h(\varvec{v}_h,\widehat{\varvec{v}}_h)\) is bounded in the \(\varvec{L}^2\)-norm owing to Lemma 5.7. Thus, up to a new subsequence, there is \(\underline{w}\in \underline{L}^2(S,\Omega )\) such that \(\underline{G}_h(\varvec{v}_h,\widehat{\varvec{v}}_h)\rightharpoonup \underline{w}\) weakly in \(\underline{L}^2(S,\Omega )\).

For \(\underline{\psi }\in \underline{C}^{\infty }(S,\Omega )\cap \underline{C}(S,\bar{\Omega })\), it holds in view of (5.3) and the definition of \(\varvec{P_M}\) that

$$\begin{aligned} -\langle \varvec{v}_h^t, (\underline{\Pi _\Sigma } \underline{\psi }\varvec{n})^t\rangle _{\partial \mathcal {T}_h}&=-\langle \varvec{P_M}\varvec{v}_h^t,( \underline{\Pi _\Sigma } \underline{\psi }\varvec{n})^t\rangle _{\partial \mathcal {T}_h}=-\int _{\Omega } \underline{R}(\varvec{P_M}\varvec{v}_h^t)\underline{\Pi }_\Sigma \underline{\psi }\;dx,\\ \langle \widehat{\varvec{v}}_h, (\underline{\Pi }_\Sigma \underline{\psi }\varvec{n})^t\rangle _{\partial \mathcal {T}_h}&=\int _{\Omega } \underline{R}(\widehat{\varvec{v}}_h)\underline{\Pi }_\Sigma \underline{\psi }\;dx. \end{aligned}$$

Thanks to the fact that \(\widehat{\varvec{v}}_h\) is single valued and \(\widehat{\varvec{v}}_h=\varvec{0}\) on \(\partial \Omega _D\), we can infer that

$$\begin{aligned} \langle \widehat{\varvec{v}}_h, (\underline{\psi }\varvec{n})^t\rangle _{\partial \mathcal {T}_h}= \langle \widehat{\varvec{v}}_h, (\underline{\psi }\varvec{n})^t\rangle _{\partial \Omega _N}. \end{aligned}$$

Then using the fact that \(\varvec{v}_h^n=0\) on \(\partial \Omega _D\), we can deduce that

$$\begin{aligned}&(\underline{G}_h(\varvec{v}_h,\widehat{\varvec{v}}_h), \underline{\psi })_{\mathcal {T}_h}\\&\quad =( \varepsilon _h(\varvec{v}_h),\underline{\psi })_{\mathcal {T}_h} -(\underline{R}(\varvec{P_M}\varvec{v}_h^t-\widehat{\varvec{v}}_h),\underline{\psi })_{\mathcal {T}_h}\\&\quad =\langle \varvec{v}_h,\underline{\psi }\varvec{n}\rangle _{\partial \mathcal {T}_h}-(\varvec{v}_h,\textrm{div}\underline{\psi })_{\mathcal {T}_h}-(\underline{R}(\varvec{P_M} \varvec{v}_h^t-\widehat{\varvec{v}}_h),\underline{\psi })_{\mathcal {T}_h}\\&\quad =\langle \varvec{v}_h^t,(\underline{\psi }\varvec{n})^t\rangle _{\partial \mathcal {T}_h}{+\langle \varvec{v}_h^n,(\underline{\psi }\varvec{n})^n\rangle _{\partial \Omega _N}}-(\varvec{v}_h,\textrm{div}\underline{\psi })_{\mathcal {T}_h}-(\underline{R}(\varvec{P_M}\varvec{v}_h^t-\widehat{\varvec{v}}_h),\underline{\psi })_{\mathcal {T}_h}\\&\quad =\langle \varvec{v}_h^t-\widehat{\varvec{v}}_h,((\underline{\psi }-\underline{\Pi _\Sigma }\underline{\psi })\varvec{n})^t\rangle _{\partial \mathcal {T}_h}-(\underline{R}(\varvec{P_M}\varvec{v}_h^t-\widehat{\varvec{v}}_h),\underline{\psi }-\underline{\Pi _\Sigma }\underline{\psi })_{\mathcal {T}_h}\\&\qquad -(\varvec{v}_h,\textrm{div}\underline{\psi })_{\mathcal {T}_h}+{\langle \varvec{v}_h^n,(\underline{\psi }\varvec{n})^n\rangle _{\partial \Omega _N}}+{\langle \widehat{\varvec{v}}_h,(\underline{\psi }\varvec{n})^t\rangle _{\partial \Omega _N}}\\&\quad {=T_1+T_2+T_3+T_4+T_5}. \end{aligned}$$

The Cauchy–Schwarz inequality and the boundedness of \(\Vert h^{-1/2}(\varvec{v}_h^t-\widehat{\varvec{v}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\) yield

$$\begin{aligned} |T_1|&\le C \Vert h^{-1/2}(\varvec{v}_h^t-\widehat{\varvec{v}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}h^{1/2}\Vert \underline{\psi }-\underline{\Pi _\Sigma }\underline{\psi }\Vert _{L^2(\partial \mathcal {T}_h)} \le C h^{1/2}\Vert \underline{\psi }\\&\quad -\underline{\Pi _\Sigma }\underline{\psi }\Vert _{L^2(\partial \mathcal {T}_h)}, \end{aligned}$$

which tends to zero as \(h\rightarrow 0\).

Similarly, the Cauchy–Schwarz inequality and Lemma 5.7 imply

$$\begin{aligned} | T_2|\le C\Vert h^{-1/2}(\varvec{P_M}\varvec{v}_h^t-\widehat{\varvec{v}}_h)\Vert _{L^2(\partial \mathcal {T}_h)} \Vert \underline{\psi }-\underline{\Pi _\Sigma }\underline{\psi }\Vert _{L^2(\Omega )} , \end{aligned}$$

which tends to zero as \(h\rightarrow 0\).

Moreover, \(T_3\rightarrow -\int _{\Omega } \varvec{v}\textrm{div}\underline{\psi }\). The Cauchy–Schwarz inequality and the boundedness of \(\Vert h^{-1/2}(\varvec{v}_h^t-\widehat{\varvec{v}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\) give

$$\begin{aligned} \langle \widehat{\varvec{v}}_h-\varvec{v}_h^t,(\underline{\psi }\varvec{n})^t\rangle _{\partial \Omega _N}&\le \Vert h^{-1/2}(\varvec{v}_h^t-\widehat{\varvec{v}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}h^{1/2}\Vert \underline{\psi }\Vert _{L^2(\partial \mathcal {T}_h)}\\ {}&\le C h^{1/2}\Vert \underline{\psi }\Vert _{L^2(\partial \mathcal {T}_h)}, \end{aligned}$$

which tends to zero as \(h\rightarrow 0\). Therefore, \( \langle \widehat{\varvec{v}}_h-\varvec{v}_h^t,(\underline{\psi }\varvec{n})^t\rangle _{\partial \Omega _N}\rightarrow 0\), that is, \(\widehat{\varvec{v}}_h\) may be replaced by \(\varvec{v}_h^t\) in \(T_5\) when passing to the limit. Taking \(\underline{\psi }\in \underline{C}_c^\infty (S,\Omega )\) arbitrarily, for which the boundary terms \(T_4\) and \(T_5\) vanish, we can infer from the preceding arguments that

$$\begin{aligned} ( \underline{w},\underline{\psi })=\lim _{h\rightarrow 0} (\underline{G}_h(\varvec{v}_h,\widehat{\varvec{v}}_h), \underline{\psi })_{\mathcal {T}_h}=-( \varvec{v},\textrm{div}\underline{\psi }). \end{aligned}$$

Since \(\underline{\psi }\) is symmetric, it holds \(\underline{w}=\varepsilon (\varvec{v})\). Thereby, \(\varvec{v}\in \varvec{H}^1(\Omega )\).
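The identification \(\underline{w}=\varepsilon (\varvec{v})\) can be sketched as follows, where the use of Korn's second inequality on the polygonal (hence Lipschitz) domain \(\Omega \) is our reading of the argument rather than a step stated explicitly. For symmetric \(\underline{\psi }\), the skew-symmetric part of a gradient does not contribute, so that, in the sense of distributions,

$$\begin{aligned} -(\varvec{v},\textrm{div}\underline{\psi })=\langle \nabla \varvec{v},\underline{\psi }\rangle =\langle \varepsilon (\varvec{v}),\underline{\psi }\rangle +\frac{1}{2}\langle \nabla \varvec{v}-\nabla \varvec{v}^T,\underline{\psi }\rangle =\langle \varepsilon (\varvec{v}),\underline{\psi }\rangle \quad \forall \underline{\psi }\in \underline{C}_c^\infty (S,\Omega ). \end{aligned}$$

Hence \(\varepsilon (\varvec{v})=\underline{w}\in \underline{L}^2(S,\Omega )\), and Korn's second inequality

$$\begin{aligned} \Vert \varvec{v}\Vert _{H^1(\Omega )}\le C\Big (\Vert \varvec{v}\Vert _{L^2(\Omega )}+\Vert \varepsilon (\varvec{v})\Vert _{L^2(\Omega )}\Big ) \end{aligned}$$

then gives \(\varvec{v}\in \varvec{H}^1(\Omega )\).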

Next, taking \(\underline{\psi }\in \underline{C}^\infty (S,\Omega )\cap \underline{C}(S,\bar{\Omega })\) arbitrarily, we have \( ( \underline{w},\underline{\psi })=\lim _{h\rightarrow 0} (\underline{G}_h(\varvec{v}_h,\widehat{\varvec{v}}_h), \underline{\psi })_{\mathcal {T}_h}=-( \varvec{v},\textrm{div}\underline{\psi })+ \langle \varvec{v},\underline{\psi }\varvec{n}\rangle _{\partial \Omega _N}\). Then it follows from integration by parts that \( \langle \varvec{v},\underline{\psi }\varvec{n}\rangle _{\partial \Omega _D}=0\) for arbitrary \(\underline{\psi }\in \underline{C}^\infty (S,\Omega )\cap \underline{C}(S,\bar{\Omega })\) with the additional restriction that \(\underline{\psi }\varvec{n}=\varvec{0}\) on \(\partial \Omega _N\), which in turn implies that \(\varvec{v}=\varvec{0}\) on \(\partial \Omega _D\). Hence, it holds \(\varvec{v}\in \varvec{H}^1_0(\Gamma _D)\). \(\square \)

Proceeding similarly to Lemma 5.8, we can prove the following lemma.

Lemma 5.9

Let \(\{(\varvec{v}_h,\widehat{\varvec{v}}_h)\}_{h>0}\) be a sequence in \(\varvec{U}_h\times \widehat{\varvec{U}}_h\). Assume that the sequence \(\{(\varvec{v}_h,\widehat{\varvec{v}}_h)\}_{h>0}\) is bounded in the \(\Vert (\cdot ,\cdot )\Vert _{1,h}\)-norm. Then, there exists a function \(\varvec{v}\in \varvec{H}^1_0(\Gamma _D)\) such that as \(h\rightarrow 0\), up to a subsequence, \(\varvec{v}_h\rightarrow \varvec{v}\) strongly in \(\varvec{L}^2(\Omega )\) and \(\underline{\mathcal {G}}_h^k(\varvec{v}_h,\widehat{\varvec{v}}_h)\rightharpoonup \nabla \varvec{v}\) weakly in \(\underline{L}^2(\Omega )\).

Let \(\varvec{V}_h:=\{\varvec{v}\in \varvec{H}^1_0(\Gamma _D), \varvec{v}_{|K}\in \varvec{P}_{k+1}(K), \forall K\in \mathcal {T}_h\}\). We seek \(\varvec{\Pi _h^c}\varvec{u}\in \varvec{V}_h\) such that

$$\begin{aligned} (\nabla \varvec{\Pi _h^c}\varvec{u},\nabla \varvec{v}_h)=(\nabla \varvec{u},\nabla \varvec{v}_h)\quad \forall \varvec{v}_h\in \varvec{V}_h, \end{aligned}$$

which is well-posed by the Riesz representation theorem. It follows from a density argument that \(\Vert \nabla (\varvec{u}-\varvec{\Pi _h^c}\varvec{u})\Vert _{L^2(\mathcal {T}_h)}\rightarrow 0\) as \(h\rightarrow 0\).

Proof of Theorem 2.6 (Convergence to weak solution)

In view of Lemmas 5.6 and 5.8, up to a subsequence, there is \((\underline{\sigma },\varvec{u})\in \underline{L}^2(S,\Omega )\times \varvec{H}^1_0(\Gamma _D)\) such that \(\underline{\sigma }_h\rightharpoonup \underline{\sigma }\) weakly in \(\underline{L}^2(S,\Omega )\), \(\varvec{u}_h\rightarrow \varvec{u}\) strongly in \(\varvec{L}^2(\Omega )\), \(\varvec{u}_h\rightarrow \varvec{u}\) strongly in \(\varvec{L}^4(\Omega )\), \(\underline{G}_h(\varvec{u}_h,\widehat{\varvec{u}}_h)\rightharpoonup \varepsilon (\varvec{u})\) weakly in \(\underline{L}^2(S,\Omega )\), \(\mathcal {A}\underline{\sigma }_h\rightharpoonup \mathcal {A}\underline{\sigma }\) weakly in \(\underline{L}^2(S,\Omega )\).

Let \(\underline{\psi }\in \underline{C}_c^\infty (S,\Omega )\). Testing (2.13) with \(\underline{\Pi _\Sigma }\underline{\psi }\) yields

$$\begin{aligned}&((2\mu )^{-1}\mathcal {A}\underline{\sigma }, \underline{\psi })-(\varepsilon (\varvec{u}), \underline{\psi })\\&\quad = ((2\mu )^{-1}\mathcal {A}(\underline{\sigma }-\underline{\sigma }_h), \underline{\psi })+ ((2\mu )^{-1}\mathcal {A}\underline{\sigma }_h, \underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi })+((2\mu )^{-1}\mathcal {A}\underline{\sigma }_h,\underline{\Pi _\Sigma }\underline{\psi })\\&\qquad -(\underline{\psi },\varepsilon (\varvec{u})-\underline{G}_h(\varvec{u}_h,\widehat{\varvec{u}}_h))-(\underline{\psi }-\underline{\Pi _\Sigma } \underline{\psi },\underline{G}_h(\varvec{u}_h,\widehat{\varvec{u}}_h))\\&\qquad -(\underline{G}_h(\varvec{u}_h,\widehat{\varvec{u}}_h),\underline{\Pi _\Sigma } \underline{\psi })=\sum _{i=1}^6T_i. \end{aligned}$$

As \(h\rightarrow 0\), the weak convergence of \(\mathcal {A}\underline{\sigma }_h\), the boundedness of \(\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\Omega )}\) and the strong convergence of \(\underline{\Pi _\Sigma }\underline{\psi }\) imply that \(T_1\rightarrow 0\) and \(T_2\rightarrow 0\). Similarly, we can infer that \(T_4\rightarrow 0\) and \(T_5\rightarrow 0\) owing to the weak convergence of \(\underline{G}_h(\varvec{u}_h,\widehat{\varvec{u}}_h)\), the boundedness of \(\Vert \underline{G}_h(\varvec{u}_h,\widehat{\varvec{u}}_h)\Vert _{L^2(\Omega )}\) and the strong convergence of \(\underline{\Pi _\Sigma }\underline{\psi }\).

In view of (5.4), we have

$$\begin{aligned} T_3+T_6=((2\mu )^{-1}\mathcal {A}\underline{\sigma }_h,\underline{\Pi _\Sigma }\underline{\psi })-(\underline{G}_h(\varvec{u}_h,\widehat{\varvec{u}}_h),\underline{\Pi _\Sigma } \underline{\psi })=0. \end{aligned}$$

Therefore, it holds

$$\begin{aligned} ((2\mu )^{-1}\mathcal {A}\underline{\sigma }, \underline{\psi })-(\varepsilon (\varvec{u}), \underline{\psi })=0. \end{aligned}$$

On the other hand, for \(\varvec{\chi }\in \varvec{C}^\infty (\Omega )\cap \varvec{C}(\bar{\Omega })\), it follows from (2.13) that

$$\begin{aligned} \begin{aligned}&(\underline{\sigma }_h, \varepsilon (\varvec{\Pi _U}\varvec{\chi }))_{\mathcal {T}_h}-\langle (\underline{\sigma }_h\varvec{n})^t,(\varvec{\Pi _U}\varvec{\chi })^t-\varvec{P_M}\varvec{\chi }^t\rangle _{\partial \mathcal {T}_h}\\&\quad +\langle \tau (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h),(\varvec{\Pi _U}\varvec{\chi })^t-\varvec{P_M}\varvec{\chi }^t\rangle _{\partial \mathcal {T}_h}\\&\quad +N_h(\varvec{u}_h;(\varvec{u}_h,\widehat{\varvec{u}}_h),(\varvec{\Pi _U}\varvec{\chi },\varvec{P_M}\varvec{\chi }^t))=(\varvec{f}_N,\varvec{\Pi _U}\varvec{\chi })_{\mathcal {T}_h}. \end{aligned} \end{aligned}$$
(5.6)

The second term and the third term on the left-hand side can be bounded by

$$\begin{aligned} \begin{aligned}&|-\langle (\underline{\sigma }_h\varvec{n})^t,(\varvec{\Pi _U}\varvec{\chi })^t-\varvec{P_M}\varvec{\chi }^t\rangle _{\partial \mathcal {T}_h}+\langle \tau (\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h),(\varvec{\Pi _U}\varvec{\chi })^t-\varvec{P_M}\varvec{\chi }^t\rangle _{\partial \mathcal {T}_h}|\\&\quad \le Ch\Big ( \Vert \underline{\sigma }_h\Vert _{L^2(\Omega )} \Vert \varvec{\chi }\Vert _{H^2(\Omega )}+\left\| \tau ^{\frac{1}{2}}(\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h)\right\| _{L^2(\partial \mathcal {T}_h)}\Vert \varvec{\chi }\Vert _{H^2(\Omega )}\Big ), \end{aligned} \end{aligned}$$

which tends to zero as \(h\rightarrow 0\) owing to the boundedness of \(\Vert \underline{\sigma }_h\Vert _{L^2(\Omega )}\) and \(\Vert \tau ^{\frac{1}{2}}(\varvec{P_M} \varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\).

Moreover, we have

$$\begin{aligned} (\underline{\sigma }_h, \varepsilon (\varvec{\Pi _U}\varvec{\chi }))_{\mathcal {T}_h}\rightarrow (\underline{\sigma }, \varepsilon (\varvec{\chi }))_{\mathcal {T}_h} \;\text{ as }\; h\rightarrow 0 \end{aligned}$$

owing to the weak convergence of \(\underline{\sigma }_h\) and the strong convergence of \(\varepsilon _h(\varvec{\Pi _U}\varvec{\chi })\) (cf. [16]). It also holds \((\varvec{f}_N,\varvec{\Pi _U}\varvec{\chi })_{\mathcal {T}_h}\rightarrow (\varvec{f}_N,\varvec{\chi })_{\mathcal {T}_h}\) in view of the strong convergence of \(\varvec{\Pi _U}\varvec{\chi }\). It remains to estimate the last term on the left-hand side of (5.6). It follows from (5.5) that

$$\begin{aligned} \begin{aligned}&N_h(\varvec{u}_h;(\varvec{u}_h,\widehat{\varvec{u}}_h),(\varvec{\Pi _U}\varvec{\chi },\varvec{P_M}\varvec{\chi }^t)) \\&\quad =(\varvec{u}_h\cdot \underline{\mathcal {G}}_h^{2k}(\varvec{u}_h,\widehat{\varvec{u}}_h),\varvec{\Pi _U}\varvec{\chi })_{\mathcal {T}_h}\\&\qquad +\frac{1}{2}\left\langle |\varvec{u}_h\cdot \textbf{n}|+\varvec{u}_h\cdot \varvec{n},(\varvec{u}_h^t-\widehat{\varvec{u}}_h)\cdot ((\varvec{\Pi _U\chi })^t -\varvec{P_M}\varvec{\chi }^t)\right\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}. \end{aligned} \end{aligned}$$
(5.7)

Since \(\varvec{u}_h\rightarrow \varvec{u}\) in \(\varvec{L}^4(\Omega )\) and \(\varvec{\Pi _U}\varvec{\chi }\rightarrow \varvec{\chi }\) in \(\varvec{L}^4(\Omega )\), we infer that \(\varvec{u}_h\varvec{\Pi _U}\varvec{\chi }\rightarrow \varvec{u}\varvec{\chi }\) in \(\varvec{L}^2(\Omega )\). Moreover, \(\underline{\mathcal {G}}_h^{2k}(\varvec{u}_{h},\widehat{\varvec{u}}_h)\) converges to \(\nabla \varvec{u}\) weakly in \(\varvec{L}^2(\Omega )\). Thus, the first term on the right-hand side of (5.7) converges to \((\varvec{u}\cdot \nabla \varvec{u},\varvec{\chi })\). Now we estimate the second term on the right-hand side of (5.7). Let \(K_f\) denote an element having \(F\) as a face; then Hölder’s inequality and (5.1) yield

$$\begin{aligned}&\frac{1}{2}\left\langle |\varvec{u}_h\cdot \textbf{n}|+\varvec{u}_h\cdot \varvec{n},(\varvec{u}_h^t-\widehat{\varvec{u}}_h)\cdot ((\varvec{\Pi _U\chi })^t-\varvec{P_M}\varvec{\chi }^t)\right\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}\nonumber \\&\quad \le C\sum _{F\in \mathcal {F}_h} \Vert \varvec{u}_h\Vert _{L^4(F)}\Vert \varvec{u}_h^t-\widehat{\varvec{u}}_h\Vert _{L^4(F)}\Vert (\varvec{\Pi _U\chi })^t-\varvec{P_M}\varvec{\chi }^t\Vert _{L^2(F)}\nonumber \\&\quad \le C \sum _{F\in \mathcal {F}_h}h_K^{-1/4}\Vert \varvec{u}_h\Vert _{L^4(K_f)}h^{(1-d)/4} \Vert \varvec{u}_h^t-\widehat{\varvec{u}}_h\Vert _{L^2(F)}\Vert (\varvec{\Pi _U\chi })^t-\varvec{P_M}\varvec{\chi }^t\Vert _{L^2(F)}\nonumber \\&\quad \le C h^{(4-d)/4} \Vert \varvec{u}_h\Vert _{L^4(\Omega )}\Vert (\varvec{u}_h,\widehat{\varvec{u}}_h)\Vert _{1,h}\Vert \varvec{\chi }\Vert _{H^1(\Omega )}, \end{aligned}$$
(5.8)

which tends to zero.
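For the first term on the right-hand side of (5.7), the passage to the limit combines a strongly convergent product with a weakly convergent factor. The following sketch makes this explicit; the tensor-product notation and the convention \((\varvec{a}\cdot \underline{G},\varvec{b})=(\underline{G},\varvec{b}\otimes \varvec{a})\) are introduced here for illustration. By Hölder's inequality,

$$\begin{aligned} \Vert \varvec{\Pi _U}\varvec{\chi }\otimes \varvec{u}_h-\varvec{\chi }\otimes \varvec{u}\Vert _{L^2(\Omega )}\le \Vert \varvec{\Pi _U}\varvec{\chi }\Vert _{L^4(\Omega )}\Vert \varvec{u}_h-\varvec{u}\Vert _{L^4(\Omega )}+\Vert \varvec{\Pi _U}\varvec{\chi }-\varvec{\chi }\Vert _{L^4(\Omega )}\Vert \varvec{u}\Vert _{L^4(\Omega )}\rightarrow 0, \end{aligned}$$

and therefore

$$\begin{aligned}&(\varvec{u}_h\cdot \underline{\mathcal {G}}_h^{2k}(\varvec{u}_h,\widehat{\varvec{u}}_h),\varvec{\Pi _U}\varvec{\chi })_{\mathcal {T}_h}-(\varvec{u}\cdot \nabla \varvec{u},\varvec{\chi })\\&\quad =(\underline{\mathcal {G}}_h^{2k}(\varvec{u}_h,\widehat{\varvec{u}}_h),\varvec{\Pi _U}\varvec{\chi }\otimes \varvec{u}_h-\varvec{\chi }\otimes \varvec{u})_{\mathcal {T}_h}+(\underline{\mathcal {G}}_h^{2k}(\varvec{u}_h,\widehat{\varvec{u}}_h)-\nabla \varvec{u},\varvec{\chi }\otimes \varvec{u})_{\mathcal {T}_h}. \end{aligned}$$

The first term on the right tends to zero because a weakly convergent sequence is bounded in \(L^2\), and the second because of the weak convergence of \(\underline{\mathcal {G}}_h^{2k}(\varvec{u}_h,\widehat{\varvec{u}}_h)\) tested against the fixed function \(\varvec{\chi }\otimes \varvec{u}\in \underline{L}^2(\Omega )\).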

Therefore, it holds

$$\begin{aligned} (\underline{\sigma },\varepsilon (\varvec{\chi }))+(\varvec{u}\cdot \nabla \varvec{u}, \varvec{\chi })=(\varvec{f}_N,\varvec{\chi })_{\mathcal {T}_h}. \end{aligned}$$

By density of \(\underline{C}_c^\infty (S,\Omega )\times (\varvec{C}^\infty (\Omega )\cap \varvec{C}(\bar{\Omega }))\) in \(\underline{L}^2(S,\Omega )\times \varvec{H}^1(\Omega )\), this shows that \((\underline{\sigma },\varvec{u})\) solves the Navier–Stokes equations (2.5)–(2.6). Since the solution to this problem is unique, the whole sequence \(\{(\underline{\sigma }_h,\varvec{u}_h)\}_{h>0}\) converges.

Now we show the strong convergence of \(\mathcal {A}\underline{\sigma }_h\). In view of (2.13), we have

$$\begin{aligned}&(2\mu )^{-1}\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\Omega )}^2+\left\| \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\right\| _{L^2(\partial \mathcal {T}_h)}^2+N_h(\varvec{u}_h;(\varvec{u}_h,\widehat{\varvec{u}}_h),(\varvec{u}_h, \widehat{\varvec{u}}_h))=(\varvec{f}_N,\varvec{u}_h).\nonumber \\ \end{aligned}$$
(5.9)

According to the definition of \(N_h(\cdot ;(\cdot ,\cdot ),(\cdot ,\cdot ))\), we have

$$\begin{aligned} N_h(\varvec{u}_h;(\varvec{u}_h,\widehat{\varvec{u}}_h),(\varvec{u}_h, \widehat{\varvec{u}}_h))&=(\varvec{u}_h\cdot \underline{\mathcal {G}}_h^{2k}(\varvec{u}_h,\widehat{\varvec{u}}_h),\varvec{u}_h)_{\mathcal {T}_h}\\&\quad \;+\frac{1}{2}\left\langle |\varvec{u}_h\cdot \textbf{n}|+\varvec{u}_h\cdot \varvec{n},(\varvec{u}_h^t-\widehat{\varvec{u}}_h)\cdot (\varvec{u}_h^t-\widehat{\varvec{u}}_h)\right\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}. \end{aligned}$$

Proceeding similarly to (5.7), we can show that \((\varvec{u}_h\cdot \underline{\mathcal {G}}_h^{2k}(\varvec{u}_h,\widehat{\varvec{u}}_h),\varvec{u}_h)_{\mathcal {T}_h}\rightarrow (\varvec{u}\cdot \nabla \varvec{u},\varvec{u})\) as \(h\rightarrow 0\). We also notice that the second term is non-negative, that is,

$$\begin{aligned} \frac{1}{2}\left\langle |\varvec{u}_h\cdot \textbf{n}|+\varvec{u}_h\cdot \varvec{n},(\varvec{u}_h^t-\widehat{\varvec{u}}_h)\cdot (\varvec{u}_h^t-\widehat{\varvec{u}}_h)\right\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}\ge 0. \end{aligned}$$

Thus, it holds

$$\begin{aligned} \limsup _{h\rightarrow 0} (2\mu )^{-1}\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\Omega )}^2&\le \limsup _{h\rightarrow 0} \Big ((\varvec{f}_N,\varvec{u}_h)-(\varvec{u}_h\cdot \underline{\mathcal {G}}_h^{2k}(\varvec{u}_h,\widehat{\varvec{u}}_h),\varvec{u}_h)_{\mathcal {T}_h}\Big )\\&=(\varvec{f}_N,\varvec{u})-(\varvec{u}\cdot \nabla \varvec{u},\varvec{u})=(2\mu )^{-1}\Vert \mathcal {A}\underline{\sigma }\Vert _{L^2(\Omega )}^2. \end{aligned}$$
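The last equality above is the energy identity of the limit problem. A sketch of how it can be obtained, assuming that \(\mathcal {A}\) is the deviatoric operator \(\mathcal {A}\underline{\sigma }=\underline{\sigma }-\frac{1}{d}\textrm{tr}(\underline{\sigma })\underline{I}\) (its definition is not restated in this section) and that \(\varvec{\chi }=\varvec{u}\) is an admissible test function by density: the limit relation \((2\mu )^{-1}\mathcal {A}\underline{\sigma }=\varepsilon (\varvec{u})\) implies \(\textrm{tr}(\varepsilon (\varvec{u}))=0\), so that

$$\begin{aligned} (\underline{\sigma },\varepsilon (\varvec{u}))=(\mathcal {A}\underline{\sigma },\varepsilon (\varvec{u}))+\frac{1}{d}(\textrm{tr}(\underline{\sigma }),\textrm{tr}(\varepsilon (\varvec{u})))=(\mathcal {A}\underline{\sigma },(2\mu )^{-1}\mathcal {A}\underline{\sigma })=(2\mu )^{-1}\Vert \mathcal {A}\underline{\sigma }\Vert _{L^2(\Omega )}^2. \end{aligned}$$

Inserting this into the limit equation \((\underline{\sigma },\varepsilon (\varvec{u}))+(\varvec{u}\cdot \nabla \varvec{u},\varvec{u})=(\varvec{f}_N,\varvec{u})\) gives \((\varvec{f}_N,\varvec{u})-(\varvec{u}\cdot \nabla \varvec{u},\varvec{u})=(2\mu )^{-1}\Vert \mathcal {A}\underline{\sigma }\Vert _{L^2(\Omega )}^2\).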

On the other hand, owing to the weak convergence of \(\mathcal {A}\underline{\sigma }_h\), we have

$$\begin{aligned} (2\mu )^{-1}\Vert \mathcal {A}\underline{\sigma }\Vert _{L^2(\Omega )}^2\le \liminf _{h\rightarrow 0} (2\mu )^{-1}\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\Omega )}^2. \end{aligned}$$

Thereby, \( (2\mu )^{-1}\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\Omega )}^2\rightarrow (2\mu )^{-1}\Vert \mathcal {A}\underline{\sigma }\Vert _{L^2(\Omega )}^2\), which yields the strong convergence of \(\mathcal {A}\underline{\sigma }_h\) in \(\underline{L}^2(S,\Omega )\).
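The step from convergence of norms to strong convergence is the standard Hilbert-space argument; it is spelled out here for completeness. Expanding the square and using the weak convergence \(\mathcal {A}\underline{\sigma }_h\rightharpoonup \mathcal {A}\underline{\sigma }\) in the cross term gives

$$\begin{aligned} \Vert \mathcal {A}\underline{\sigma }_h-\mathcal {A}\underline{\sigma }\Vert _{L^2(\Omega )}^2&=\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\Omega )}^2-2(\mathcal {A}\underline{\sigma }_h,\mathcal {A}\underline{\sigma })+\Vert \mathcal {A}\underline{\sigma }\Vert _{L^2(\Omega )}^2\\&\rightarrow \Vert \mathcal {A}\underline{\sigma }\Vert _{L^2(\Omega )}^2-2\Vert \mathcal {A}\underline{\sigma }\Vert _{L^2(\Omega )}^2+\Vert \mathcal {A}\underline{\sigma }\Vert _{L^2(\Omega )}^2=0\quad \text{ as }\;h\rightarrow 0. \end{aligned}$$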

In view of (5.9), we have

$$\begin{aligned} \begin{aligned}&\left\| \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\right\| _{L^2(\partial \mathcal {T}_h)}^2\\ {}&\qquad + \frac{1}{2}\left\langle |\varvec{u}_h\cdot \textbf{n}|+\varvec{u}_h\cdot \varvec{n},(\varvec{u}_h^t-\widehat{\varvec{u}}_h)\cdot (\varvec{u}_h^t-\widehat{\varvec{u}}_h)\right\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}\\&\quad =(\varvec{f}_N,\varvec{u}_h)-(2\mu )^{-1}\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\Omega )}^2-(\varvec{u}_h\cdot \underline{\mathcal {G}}_h^{2k}(\varvec{u}_h,\widehat{\varvec{u}}_h),\varvec{u}_h)_{\mathcal {T}_h}, \end{aligned} \end{aligned}$$
(5.10)

where \((\varvec{f}_N,\varvec{u}_h)\rightarrow (\varvec{f}_N,\varvec{u})\), \((2\mu )^{-1}\Vert \mathcal {A}\underline{\sigma }_h\Vert _{L^2(\Omega )}^2\rightarrow (2\mu )^{-1}\Vert \mathcal {A}\underline{\sigma }\Vert _{L^2(\Omega )}^2\) and \((\varvec{u}_h\cdot \underline{\mathcal {G}}_h^{2k}(\varvec{u}_h,\widehat{\varvec{u}}_h),\varvec{u}_h)_{\mathcal {T}_h}\rightarrow (\varvec{u}\cdot \nabla \varvec{u},\varvec{u})\) as \(h\rightarrow 0\). Thus the right-hand side of (5.10) tends to zero. Thereby, \( \Vert \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{u}_h^t-\varvec{\widehat{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\rightarrow 0\) as \(h\rightarrow 0\).

Lemma 5.7 and the strong convergence of \(\mathcal {A}\underline{\sigma }_h\) imply

$$\begin{aligned}&\Vert \varepsilon _h(\varvec{u}-\varvec{u}_h)\Vert _{L^2(\mathcal {T}_h)}\le (2\mu )^{-1}\Vert \mathcal {A}(\underline{\sigma }-\underline{\sigma }_h)\Vert _{L^2(\mathcal {T}_h)}\\&\quad + \Vert h^{-1/2}(\varvec{P_M}\varvec{u}_h^t-\widehat{\varvec{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\rightarrow 0 \quad \text{ as }\;h\rightarrow 0. \end{aligned}$$

For \(F\in \mathcal {F}_h\backslash \partial \Omega _N\) and \(\partial K_1\cap \partial K_2=F\), we have from (4.7) and Lemma 4.1

$$\begin{aligned}&\int _F |\llbracket \varvec{u}_h-\varvec{P_M}\varvec{u}_h\rrbracket |^2\;ds\\&\quad = \int _F \big |\llbracket \varvec{u}_h-\varvec{\Pi _h^c}\varvec{u}-\varvec{P_M}(\varvec{u}_h-\varvec{\Pi _h^c}\varvec{u})\rrbracket \big |^2\;ds\\&\quad =\int _F \big |\llbracket \varvec{u}_h-\varvec{\Pi _h^c}\varvec{u}+\underline{B}_K\varvec{x}+\varvec{b}_k-\varvec{P_M}(\varvec{u}_h-\varvec{\Pi _h^c}\varvec{u}+\underline{B}_K\varvec{x}+\varvec{b}_k)\rrbracket \big |^2\;ds\\&\quad \le \frac{1}{2}\sum _{i=1}^2\int _{F\cap \partial K_i}|\varvec{u}_h-\varvec{\Pi _h^c}\varvec{u}+\underline{B}_K\varvec{x}+\varvec{b}_k-\varvec{P_M}(\varvec{u}_h-\varvec{\Pi _h^c}\varvec{u}+\underline{B}_K\varvec{x}+\varvec{b}_k)|^2\;ds\\&\quad \le C\sum _{i=1}^2h_{K_i}^{\frac{1}{2}}\Vert \nabla (\varvec{u}_h-\varvec{\Pi _h^c}\varvec{u}+\underline{B}_K\varvec{x}+\varvec{b}_k)\Vert _{L^2( K_i)}\\&\quad \le C\sum _{i=1}^2 h_{K_i}^{\frac{1}{2}}\Vert \varepsilon (\varvec{u}_h-\varvec{\Pi _h^c}\varvec{u})\Vert _{L^2( K_i)}. \end{aligned}$$

Thus, summing over all the faces gives

$$\begin{aligned} \Bigg (\sum _{F\in \mathcal {F}_h\backslash \partial \Omega _N} h_F^{-1}\Vert \llbracket \varvec{u}_h-\varvec{P_M}\varvec{u}_h\rrbracket \Vert _{L^2(F)}^2\Bigg )^{\frac{1}{2}}\le C \Vert \varepsilon (\varvec{u}_h-\varvec{\Pi _h^c}\varvec{u})\Vert _{L^2( \mathcal {T}_h)}. \end{aligned}$$

The triangle inequality yields

$$\begin{aligned} \Vert \varepsilon (\varvec{u}_h-\varvec{\Pi _h^c}\varvec{u})\Vert _{L^2( \mathcal {T}_h)}&\le \Vert \varepsilon _h (\varvec{u}_h-\varvec{u})\Vert _{L^2(\mathcal {T}_h)}\\ {}&\quad +\Vert \varepsilon (\varvec{u}-\varvec{\Pi _h^c}\varvec{u})\Vert _{L^2(\mathcal {T}_h)}\rightarrow 0 \quad \text{ as }\;h\rightarrow 0 \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} \Bigg (\sum _{F\in \mathcal {F}_h\backslash \partial \Omega _N} h_F^{-1}\Vert \llbracket \varvec{u}_h\rrbracket \Vert _{L^2(F)}^2\Bigg )^{\frac{1}{2}}&\le \Bigg (\sum _{F\in \mathcal {F}_h\backslash \partial \Omega _N} h_F^{-1}\Vert \llbracket \varvec{u}_h-\varvec{P_M}\varvec{u}_h\rrbracket \Vert _{L^2(F)}^2\Bigg )^{\frac{1}{2}}\\&\quad + \Vert h^{-1/2}(\varvec{P_M}\varvec{u}_h^t-\widehat{\varvec{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\rightarrow 0 \;\text{ as }\;h\rightarrow 0. \end{aligned} \end{aligned}$$
(5.11)

Then, proceeding similarly to (4.11) yields

$$\begin{aligned} \Vert \nabla _h (\varvec{u}_h-\varvec{u})\Vert _{L^2(\mathcal {T}_h)}&\le C \Big (\Vert \varepsilon _h(\varvec{u}_h-\varvec{u})\Vert _{L^2(\mathcal {T}_h)}\nonumber \\&\quad + \Big (\sum _{F\in \mathcal {F}_h\backslash \partial \Omega _N} h_F^{-1}\Vert \llbracket \varvec{u}_h\rrbracket \Vert _{L^2(F)}^2\Big )^{\frac{1}{2}}\Big )\nonumber \\&\quad \rightarrow 0\quad \text{ as }\;h\rightarrow 0. \end{aligned}$$
(5.12)

Now we can infer from (5.11) and (5.12) that \(\Vert \nabla _h(\varvec{u}-\varvec{u}_h)\Vert _{L^2(\mathcal {T}_h)}\rightarrow 0\) and \(\Big (\sum _{F\in \mathcal {F}_h\backslash \partial \Omega _N} h_F^{-1}\Vert \llbracket \varvec{u}_h\rrbracket \Vert _{L^2(F)}^2\Big )^{\frac{1}{2}}\rightarrow 0\) as \(h\rightarrow 0\). As such, \(\Vert \varvec{u}-\varvec{u}_h\Vert _h\rightarrow 0\) as \(h\rightarrow 0\).

Finally, we show the strong convergence of \(\underline{\sigma }_h\). Let \(\varvec{v}^*\in \varvec{H}^1(\Omega )\) be such that \(\nabla \cdot \varvec{v}^*=\textrm{tr}(\underline{\sigma }_h) \) and \(\Vert \varvec{v}^*\Vert _{H^1(\Omega )}\le C \Vert \textrm{tr}(\underline{\sigma }_h)\Vert _{L^2(\Omega )}\) and set \(\varvec{v}_h=\varvec{\Pi _U}\varvec{v}^*\). Then

$$\begin{aligned} \Vert \textrm{tr}(\underline{\sigma }_h)\Vert _{L^2(\Omega )}^2=(\textrm{tr}(\underline{\sigma }_h), \nabla \cdot \varvec{v}^*)&=(\textrm{tr}(\underline{\sigma }_h), \nabla \cdot \varvec{v}_h)_{\mathcal {T}_h}\\&=d\Big ((\underline{\sigma }_h, \varepsilon (\varvec{v}_h))_{\mathcal {T}_h}-(\mathcal {A}\underline{\sigma }_h, \varepsilon (\varvec{v}_h))_{\mathcal {T}_h}\Big ). \end{aligned}$$
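The last equality above can be checked pointwise. As a sketch, assuming that \(\mathcal {A}\) denotes the deviatoric operator \(\mathcal {A}\underline{\tau }=\underline{\tau }-\frac{1}{d}\textrm{tr}(\underline{\tau })\underline{I}\) (its definition lies outside this passage, so this is an assumption consistent with the identity used here), we have for any symmetric \(\underline{\tau }\) and any sufficiently smooth \(\varvec{w}\)

$$\begin{aligned} (\underline{\tau }-\mathcal {A}\underline{\tau }):\varepsilon (\varvec{w})=\frac{1}{d}\textrm{tr}(\underline{\tau })\,\underline{I}:\varepsilon (\varvec{w})=\frac{1}{d}\textrm{tr}(\underline{\tau })\,\nabla \cdot \varvec{w}. \end{aligned}$$

Multiplying by \(d\), taking \(\underline{\tau }=\underline{\sigma }_h\) and \(\varvec{w}=\varvec{v}_h\), and integrating over each element of \(\mathcal {T}_h\) recovers the stated identity.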

We let \(\widehat{\varvec{v}}_h=\varvec{P_M}(\varvec{v}^*)^t\); then \((\varvec{v}_h,\widehat{\varvec{v}}_h)\) is bounded in the \(\Vert (\cdot ,\cdot )\Vert _{1,h}\)-norm. Hence there is \(\varvec{v}\in \varvec{H}^1(\Omega )\) such that, up to a subsequence, \(\varvec{v}_h\rightarrow \varvec{v}\) strongly in \(\varvec{L}^2(\Omega )\) and \(\underline{G}_h(\varvec{v}_h,\widehat{\varvec{v}}_h)\rightharpoonup \varepsilon (\varvec{v})\) weakly in \(\underline{L}^2(S,\Omega )\). Moreover, \(\nabla \cdot \varvec{v}=\textrm{tr}(\underline{\sigma })\) in the distributional sense. It follows from (2.13) that

$$\begin{aligned} (\underline{\sigma }_h, \varepsilon (\varvec{v}_h))_{L^2(\mathcal {T}_h)}&=\langle (\underline{\sigma }_h\varvec{n})^t, \varvec{v}_h-\widehat{\varvec{v}}_h\rangle _{\partial \mathcal {T}_h}-\langle \tau (\varvec{P_M}\varvec{u}_h^t-\widehat{\varvec{u}}_h), \varvec{v}_h^t-\widehat{\varvec{v}}_h \rangle _{\partial \mathcal {T}_h}\\&\quad \;-N_h(\varvec{u}_h;(\varvec{u}_h,\widehat{\varvec{u}}_h),(\varvec{v}_h,\widehat{\varvec{v}}_h))+(\varvec{f}_N,\varvec{v}_h). \end{aligned}$$

The definition of \(\underline{G}_h(\varvec{v}_h,\widehat{\varvec{v}}_h)\) gives

$$\begin{aligned} (\mathcal {A}\underline{\sigma }_h, \varepsilon (\varvec{v}_h))_{\mathcal {T}_h}=(\mathcal {A}\underline{\sigma }_h, \underline{G}_h(\varvec{v}_h,\widehat{\varvec{v}}_h))_{\mathcal {T}_h}+(\mathcal {A}\underline{\sigma }_h, \underline{R}(\varvec{P_M}\varvec{v}_h^t-\widehat{\varvec{v}}_h))_{\mathcal {T}_h}. \end{aligned}$$

Therefore, it holds

$$\begin{aligned} \begin{aligned}&(\underline{\sigma }_h, \varepsilon (\varvec{v}_h))_{\mathcal {T}_h}-(\mathcal {A}\underline{\sigma }_h, \varepsilon (\varvec{v}_h))_{\mathcal {T}_h}\\&\quad =-(\mathcal {A}\underline{\sigma }_h, \underline{G}_h(\varvec{v}_h,\widehat{\varvec{v}}_h))-\langle \tau (\varvec{P_M}\varvec{u}_h^t-\widehat{\varvec{u}}_h), \varvec{v}_h^t-\widehat{\varvec{v}}_h \rangle _{\partial \mathcal {T}_h}\\&\qquad +(\varvec{f}_N,\varvec{v}_h)-N_h(\varvec{u}_h;(\varvec{u}_h,\widehat{\varvec{u}}_h),(\varvec{v}_h,\widehat{\varvec{v}}_h)), \end{aligned} \end{aligned}$$
(5.13)

where the first term on the right-hand side converges to \( -(\mathcal {A}\underline{\sigma }, \varepsilon (\varvec{v}))_{\mathcal {T}_h}\) owing to the strong convergence of \(\mathcal {A}\underline{\sigma }_h\) and weak convergence of \(\underline{G}_h(\varvec{v}_h,\widehat{\varvec{v}}_h)\). The third term on the right-hand side converges to \((\varvec{f}_N,\varvec{v})\). Moreover, the Cauchy–Schwarz inequality yields

$$\begin{aligned}&\langle \tau (\varvec{P_M}\varvec{u}_h^t-\widehat{\varvec{u}}_h), \varvec{v}_h^t-\widehat{\varvec{v}}_h \rangle _{\partial \mathcal {T}_h}\\&\quad \le \Vert \tau ^{\frac{1}{2}}(\varvec{P_M}\varvec{u}_h^t-\widehat{\varvec{u}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}\Vert \tau ^{\frac{1}{2}}(\varvec{v}_h^t-\widehat{\varvec{v}}_h)\Vert _{L^2(\partial \mathcal {T}_h)}, \end{aligned}$$

which tends to zero as \(h\rightarrow 0\).

Furthermore, (5.5) yields

$$\begin{aligned} N_h(\varvec{u}_h;(\varvec{u}_h,\widehat{\varvec{u}}_h),(\varvec{v}_h,\widehat{\varvec{v}}_h))&=(\varvec{u}_h\cdot \mathcal {G}_h^{2k}(\varvec{u}_h,\widehat{\varvec{u}}_h), \varvec{v}_h)\\&\quad +\frac{1}{2}\left\langle \varvec{u}_h\cdot \varvec{n}+|\varvec{u}_h\cdot \textbf{n}|,(\varvec{u}_h^t-\widehat{\varvec{u}}_h)\cdot (\varvec{v}_h^t-\widehat{\varvec{v}}_h)\right\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}, \end{aligned}$$

where the first term converges to \((\varvec{u}\cdot \nabla \varvec{u} ,\varvec{v})\), which can be proved analogously to (5.7).

Fig. 1 Convergence history for Example 6.1 with \(\mu =1\) and \(k=1,2,3,4\)

Fig. 2 Convergence history for Example 6.1 with \(\mu =10^{-4}\) and \(k=1,2,3,4\)

Fig. 3 Convergence history for Example 6.1 with \(\mu =10^{-8}\) and \(k=1,2,3,4\)

Similarly to (5.8), it holds

$$\begin{aligned}&\frac{1}{2}\left\langle \varvec{u}_h\cdot \varvec{n}+|\varvec{u}_h\cdot \textbf{n}|,(\varvec{u}_h^t-\widehat{\varvec{u}}_h)\cdot (\varvec{v}_h^t-\widehat{\varvec{v}}_h)\right\rangle _{\partial \mathcal {T}_h\backslash \partial \Omega _D}\\&\quad \le C h^{(4-d)/4} \Vert \varvec{u}_h\Vert _{L^4(\Omega )}\Vert (\varvec{u}_h,\widehat{\varvec{u}}_h)\Vert _{1,h}\Vert (\varvec{v}_h,\widehat{\varvec{v}}_h)\Vert _{1,h}. \end{aligned}$$

An appeal to Lemma 5.2 yields that \(\Vert \varvec{u}_h\Vert _{L^4(\Omega )}\) is bounded, which in conjunction with the boundedness of \(\Vert (\varvec{u}_h,\widehat{\varvec{u}}_h)\Vert _{1,h}\) and \(\Vert (\varvec{v}_h,\widehat{\varvec{v}}_h)\Vert _{1,h}\) implies that the right-hand side tends to zero. Thus, the last term on the right-hand side of (5.13) converges to \((\varvec{u}\cdot \nabla \varvec{u}, \varvec{v})\).
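For the reader's convenience, we note that the power of \(h\) in the preceding bound is positive in the usual cases \(d=2,3\) (an assumption here, since the dimension is not restricted in this passage):

$$\begin{aligned} \frac{4-d}{4}=\frac{1}{2}\quad (d=2),\qquad \frac{4-d}{4}=\frac{1}{4}\quad (d=3), \end{aligned}$$

so the factor \(h^{(4-d)/4}\) drives the bound to zero as \(h\rightarrow 0\) while the three norms remain bounded.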

As a result, we have

$$\begin{aligned} \lim _{h\rightarrow 0}\Vert \textrm{tr}(\underline{\sigma }_h)\Vert _{L^2(\Omega )}^2=\Vert \textrm{tr}(\underline{\sigma })\Vert _{L^2(\Omega )}^2. \end{aligned}$$
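Convergence of the norms upgrades weak convergence to strong convergence. As a sketch, assuming that \(\textrm{tr}(\underline{\sigma }_h)\rightharpoonup \textrm{tr}(\underline{\sigma })\) weakly in \(L^2(\Omega )\) has already been established (this is not restated in the present passage), expanding the square gives

$$\begin{aligned} \Vert \textrm{tr}(\underline{\sigma }-\underline{\sigma }_h)\Vert _{L^2(\Omega )}^2&=\Vert \textrm{tr}(\underline{\sigma })\Vert _{L^2(\Omega )}^2-2(\textrm{tr}(\underline{\sigma }),\textrm{tr}(\underline{\sigma }_h))+\Vert \textrm{tr}(\underline{\sigma }_h)\Vert _{L^2(\Omega )}^2\\&\rightarrow \Vert \textrm{tr}(\underline{\sigma })\Vert _{L^2(\Omega )}^2-2\Vert \textrm{tr}(\underline{\sigma })\Vert _{L^2(\Omega )}^2+\Vert \textrm{tr}(\underline{\sigma })\Vert _{L^2(\Omega )}^2=0 \quad \text{ as }\;h\rightarrow 0. \end{aligned}$$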

Then, the triangle inequality gives

$$\begin{aligned} \Vert \underline{\sigma }-\underline{\sigma }_h\Vert _{L^2(\Omega )}\le \Vert \mathcal {A}(\underline{\sigma }-\underline{\sigma }_h)\Vert _{L^2(\Omega )}+\Vert \textrm{tr}(\underline{\sigma }-\underline{\sigma }_h)\Vert _{L^2(\Omega )}\rightarrow 0 \quad \text{ as }\;h\rightarrow 0, \end{aligned}$$

which yields the strong convergence of \(\underline{\sigma }_h\). \(\square \)

Remark 5.1

Proceeding as for the Stokes equations and following the error analysis presented in [10] for the Navier–Stokes equations, we can also derive a priori error estimates under the assumption that the weak solutions are smooth; indeed, optimal convergence rates can be achieved for all the variables. The details are omitted here for simplicity.

6 Numerical experiments

In this section, several two-dimensional numerical experiments will be carried out to test the capabilities of the proposed method as well as the proposed error estimator for the Stokes equations. Two examples with smooth solutions will be employed to test the convergence rates of the proposed method. In particular, the robustness of the method with respect to the values of the viscosity and the pressure robustness of the method will be confirmed.

6.1 Smooth solution example

In this example, we consider \(\Omega =(0,1)^2\) and the exact solution is defined by

$$\begin{aligned}\varvec{u}&= {\left\{ \begin{array}{ll} x^2\pi \sin (2y\pi )(x - 1)^2+1,\\ -2x\sin (y\pi )^2(2x - 1)(x - 1) +1 \end{array}\right. }\quad \\ {}&\quad \text{ and }\quad p=(\cos (1)-1)\sin (1) + \cos (y)\sin (x). \end{aligned}$$

The convergence history against the number of degrees of freedom for the polynomial orders \(k=1,2,3,4\) is displayed in Figs. 1, 2 and 3 for \(\mu =1\), \(\mu =10^{-4}\) and \(\mu =10^{-8}\), respectively. We observe optimal convergence rates that match the theoretical results. In addition, the convergence rates for all the variables remain optimal regardless of the value of \(\mu \), which verifies the robustness of the scheme with respect to \(\mu \). Moreover, we also observe that the accuracy of the velocity is slightly influenced by the value of \(\mu \).
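The exact solution of this example can be sanity-checked numerically before it is used as reference data. The following is a minimal sketch, not the implementation used for the reported experiments: it assumes plain Python, and the helper names `velocity`, `pressure`, `divergence` and `mean_pressure`, the finite-difference step and the quadrature resolution are all illustrative choices. It checks that the velocity is divergence-free, cf. (1.3), and that the pressure has zero mean over \((0,1)^2\).

```python
import math


def velocity(x, y):
    """Exact velocity (u1, u2) of Example 6.1."""
    u1 = x**2 * math.pi * math.sin(2 * math.pi * y) * (x - 1) ** 2 + 1
    u2 = -2 * x * math.sin(math.pi * y) ** 2 * (2 * x - 1) * (x - 1) + 1
    return u1, u2


def pressure(x, y):
    """Exact pressure of Example 6.1."""
    return (math.cos(1) - 1) * math.sin(1) + math.cos(y) * math.sin(x)


def divergence(x, y, h=1e-5):
    """Central-difference approximation of div u at (x, y); h is an assumed step."""
    du1_dx = (velocity(x + h, y)[0] - velocity(x - h, y)[0]) / (2 * h)
    du2_dy = (velocity(x, y + h)[1] - velocity(x, y - h)[1]) / (2 * h)
    return du1_dx + du2_dy


def mean_pressure(n=200):
    """Midpoint-rule approximation of the mean of p over the unit square."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += pressure((i + 0.5) / n, (j + 0.5) / n)
    return total / n**2


# Largest |div u| on a 9 x 9 grid of interior sample points.
max_div = max(
    abs(divergence(0.1 * i, 0.1 * j)) for i in range(1, 10) for j in range(1, 10)
)
p_mean = mean_pressure()
print("max |div u| =", max_div)
print("mean pressure =", p_mean)
```

Both printed values should be small, limited only by the finite-difference and quadrature errors. The same two checks can be carried out by hand, since \(\partial _x u_1=-\partial _y u_2=2\pi x(x-1)(2x-1)\sin (2\pi y)\).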

6.2 No-flow example

In this example, we use the unit square domain, i.e., \(\Omega =(0,1)^2\), and the exact solution is defined by

$$\begin{aligned} \varvec{u}= {\left\{ \begin{array}{ll} 0\\ 0 \end{array}\right. },\quad p=-\frac{\text {Ra}}{2}y^2+\text {Ra} y-\frac{\text {Ra}}{3}, \end{aligned}$$

where \(\text {Ra}=10^3\). The convergence history against the number of degrees of freedom with \(k=1\) is reported in the left panel of Fig. 4. We observe that the \(L^2\)-errors of the stress and the velocity approach zero; in addition, \(\Vert \varepsilon (\varvec{u}-\varvec{u}_h)\Vert _{L^2(\mathcal {T}_h)}\) also approaches zero. The convergence rate for \(\Vert \textrm{tr}(\underline{\sigma }-\underline{\sigma }_h)\Vert _{L^2(\Omega )}\) is optimal, in agreement with the theory. In addition, the solution profile of \(\textrm{tr}(\underline{\sigma }_h)\) is also correct. This example once again highlights that the proposed scheme is pressure-robust.
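To see why this example probes pressure robustness, one can work out the data it induces. As a sketch, assuming the right-hand side is generated from (1.1)–(1.2) with the exact solution above, \(\varvec{u}=\varvec{0}\) gives \(\underline{\sigma }=p\underline{I}\), and hence

$$\begin{aligned} \varvec{f}_S=-\textrm{div}\,\underline{\sigma }=-\nabla p={\left\{ \begin{array}{ll} 0\\ \text {Ra}(y-1) \end{array}\right. },\qquad \int _{\Omega }p\;dx=-\frac{\text {Ra}}{6}+\frac{\text {Ra}}{2}-\frac{\text {Ra}}{3}=0. \end{aligned}$$

The body force is therefore a pure gradient of magnitude up to \(\text {Ra}\), which is balanced entirely by the pressure. Moreover, \(\textrm{tr}(\underline{\sigma })=2p\) in two dimensions, so the trace of the stress carries the whole nontrivial part of the solution.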

Fig. 4 Convergence history for Example 6.2 with \(k=1\)