1 Introduction

The possibility of using deep-learning tools for solving complex physical models has attracted the attention of many scientists over the last few years. In this paper we have in mind models that are mathematically described by partial differential equations, supplemented by suitable boundary and initial conditions. In the most general setting, if no information on the model is available except the knowledge of some of its solutions, the model may be completely surrogated by one or more neural networks, trained on data (i.e., on the known solutions). However, in most situations of interest the mathematical model is known (e.g., the Navier-Stokes equations describing an incompressible flow), and such information may be suitably exploited in training the network(s): one gets the so-called Physics-Informed Neural Networks (PINNs). This approach was first proposed in [1], and it inspired further works such as, e.g., [2] or [3], up to the recent paper [4], which presents a very general framework for the solution of operator equations by deep neural networks. PINNs are trained using the strong form of the differential equations, which is enforced at a set of points in the domain by suitably defining the loss function. In this sense, PINNs can be viewed as particular instances of least-squares/collocation methods.

Based on the weak formulation of the differential model, the so-called Variational Physics-Informed Neural Networks (VPINNs), proposed in [5], enforce the equations by means of suitably chosen test functions, not necessarily represented by neural networks [6]; they are instances of least-squares/Petrov–Galerkin methods. While the construction of the loss function is generally less expensive for PINNs than for VPINNs, the latter allow for the treatment of models with less regular solutions, as well as an easier enforcement of boundary conditions. In addition, the error analysis for VPINNs takes advantage of the available results for the discretization of variational problems, by fulfilling the assumptions of the Lax–Richtmyer paradigm ‘stability plus consistency imply convergence’. Actually, consistency results follow rather easily from the recently established approximation properties of neural networks in Sobolev spaces (see, e.g., [7,8,9,10,11,12]), whereas the derivation of stability estimates for the neural network solution appears to be a less trivial task: indeed, a neural network is identified by its weights, which are usually far more numerous than the conditions enforced in its training. In other words, the training of a neural network is functionally an ill-posed problem.

In this respect, we considered in [13] a Petrov–Galerkin framework in which trial functions are defined by means of neural networks, whereas test functions are continuous, piecewise linear functions on a triangulation of the domain. Relying on an inf-sup condition between spaces of piecewise polynomial functions, we derived an a priori error estimate in the energy norm between the exact solution of an elliptic boundary-value problem and a high-order interpolant of a deep neural network which minimizes the loss function. Numerical results indicate that the error exhibits a similar behavior when the interpolation operator is turned off.

The purpose of the present paper is to perform an a posteriori error analysis for VPINNs, i.e., to derive estimates of the error which only depend on the computed VPINN solution, rather than on the unknown exact solution. This is important to obtain practical and quantitative information on the quality of the approximation. After setting the model elliptic boundary-value problem in Sect. 2, and the corresponding VPINN discretization in Sect. 2.1, we define in Sect. 3 a computable residual-type error estimator, and prove that it is both reliable and efficient in controlling the energy error between the exact solution and the VPINN solution (the so-called generalization error in the terminology of Learning Theory, see [14]). Reliability means that the global error is bounded from above by a constant times the estimator; efficiency means that the estimator cannot over-estimate the energy error, since the latter is bounded from below by a constant times the former, up to data oscillation terms. The proposed estimator is obtained by summing up several terms: one is the classical residual-type estimator in finite elements, measuring the bulk error inside each element of the triangulation as well as the inter-element gradient jumps; another term accounts for the magnitude of the loss function after minimization is performed (the so-called optimization error); the remaining terms measure data oscillations, i.e., the errors committed by locally projecting the equation’s coefficients and right-hand side upon suitable polynomial spaces. The estimator can be written as a sum of elemental contributions, thereby allowing its use within an adaptive discretization strategy which refines the elements carrying the largest contributions to the estimator.

2 The model boundary-value problem

Let \(\Omega \subset {\mathbb {R}}^n\) be a bounded polygonal/polyhedral domain with Lipschitz boundary \(\Gamma =\partial \Omega \).

Let us consider the model elliptic boundary-value problem

$$\begin{aligned} {\left\{ \begin{array}{ll} Lu:=-\nabla \cdot (\mu \nabla u) + \varvec{\beta }\cdot \nabla u + \sigma u =f &{} \text {in } \Omega \,, \\ u=0 &{} \text {on } \Gamma \,, \end{array}\right. } \end{aligned}$$
(1)

where \(\mu , \sigma \in \mathrm{L}^\infty (\Omega )\), \( \varvec{\beta } \in (\mathrm{W}^{1,\infty }(\Omega ))^n\) satisfy \(\mu \ge \mu _0\), \(\sigma - \frac{1}{2} \nabla \cdot \varvec{\beta } \ge 0\) in \(\Omega \) for some constant \(\mu _0>0\), whereas \(f \in L^2(\Omega )\).

Setting \(V=\mathrm{H}^1_{0}(\Omega )\), define the bilinear and linear forms

$$\begin{aligned}&a:V\times V \rightarrow {\mathbb {R}}\,, \qquad a(w,v)=\int _\Omega \mu \nabla w \cdot \nabla v + \varvec{\beta }\cdot \nabla w \, v + \sigma w \, v\,, \end{aligned}$$
(2)
$$\begin{aligned}&F:V\rightarrow {\mathbb {R}}\,, \qquad F(v)=\int _\Omega f \, v \,; \end{aligned}$$
(3)

denote by \(\alpha \ge \mu _0\) the coercivity constant of the form a, and by \(\Vert a \Vert \), \(\Vert F \Vert \) the continuity constants of the forms a and F. Problem (1) is formulated variationally as follows: Find \(u \in V \) such that

$$\begin{aligned} a(u,v)=F(v) \qquad \forall v \in V\,. \end{aligned}$$
(4)

Remark 1

(Other boundary conditions) We just consider homogeneous Dirichlet conditions to keep technicalities at a minimum. However, the forthcoming formulation of the discretized problem and the a posteriori error analysis can be extended to cover the case of mixed Dirichlet-Neumann boundary conditions, namely \(u=g\) on \(\Gamma _D\), \(\mu \partial _n u =\psi \) on \(\Gamma _N\), with \(\Gamma _D \cup \Gamma _N = \Gamma \). We refer to [13, 15] for the general case.

2.1 The VPINN discretization

We aim at approximating the solution of Problem (1) by a generalized Petrov–Galerkin strategy.

To define the subset of V of trial functions, let us choose a fully-connected feed-forward neural network structure \({{{\mathcal {N}}}\!N}\), with n input variables and 1 output variable, identified by the number of layers L, the layer widths \(N_\ell \), \(\ell =1, \dots , L\), and the activation function \(\rho \). Thus, each choice of the weights \({{\mathbf {w}}} \in {\mathbb {R}}^N\) defines a mapping \(w^{{{\mathcal {N}}}\!N}: {\varvec{x}} \mapsto w({\varvec{x}},{{\mathbf {w}}})\), which we think of as restricted to the closed domain \({\bar{\Omega }}\); let us denote by \(W^{{{\mathcal {N}}}\!N}\) the manifold containing all functions that can be generated by this neural network structure. Therefore, each function \(w\in W^{{{\mathcal {N}}}\!N}\) can be explicitly computed as:

$$\begin{aligned} \begin{aligned}&x_0={\varvec{x}}, \\&x_\ell = \rho (A_\ell x_{\ell -1} + b_\ell ), \quad \ell = 1,\ldots ,L-1, \\&w({\varvec{x}}) = A_{L} x_{L-1} + b_L, \end{aligned} \end{aligned}$$
(5)

where the matrices and vectors \(A_\ell \in {\mathbb {R}}^{N_\ell \times N_{\ell -1}}\) and \(b_\ell \in {\mathbb {R}}^{N_\ell }\), \(\ell =1,\dots ,L\), are a suitable rearrangement of the weights in \({\mathbf {w}}\), with \(N_0=n\) and \(N_L=1\). We enforce the homogeneous Dirichlet boundary conditions by multiplying each \(w^{{{\mathcal {N}}}\!N}\) by a fixed smooth function \(\Phi \in V\) (we refer to [15] for a general strategy to construct this function); we assume that \(v^{{{\mathcal {N}}}\!N}= \Phi w^{{{\mathcal {N}}}\!N}\) belongs to V for any \(w^{{{\mathcal {N}}}\!N}\in W^{{{\mathcal {N}}}\!N}\). In conclusion, our manifold of trial functions will be

$$\begin{aligned} V^{{{\mathcal {N}}}\!N}= \{ v^{{{\mathcal {N}}}\!N}\in V : v^{{{\mathcal {N}}}\!N}=\Phi w^{{{\mathcal {N}}}\!N}\text { for some }w^{{{\mathcal {N}}}\!N}\in W^{{{\mathcal {N}}}\!N}\}\,. \end{aligned}$$
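For concreteness, a minimal sketch of how the forward pass (5) and the multiplication by the cutoff \(\Phi \) could be evaluated is reported below; the use of NumPy, the array shapes and the specific choice of \(\Phi \) (vanishing on the boundary of the unit square) are illustrative assumptions, not part of the method.

```python
import numpy as np

def forward(x, A, b, rho=np.tanh):
    """Evaluate (5): x has shape (n_points, n); A[l], b[l] collect the weight
    matrices/vectors of layer l+1, with an affine (non-activated) output layer."""
    z = x.T                                      # x_0, shape (n, n_points)
    for Al, bl in zip(A[:-1], b[:-1]):           # hidden layers ell = 1, ..., L-1
        z = rho(Al @ z + bl[:, None])
    return (A[-1] @ z + b[-1][:, None]).ravel()  # w(x), shape (n_points,)

def Phi(x):
    """Illustrative cutoff vanishing on the boundary of (0,1)^2."""
    return x[:, 0] * (1.0 - x[:, 0]) * x[:, 1] * (1.0 - x[:, 1])

# trial function v^NN = Phi * w^NN (here n = 2 and one hidden layer of width 10)
rng = np.random.default_rng(0)
A = [rng.standard_normal((10, 2)), rng.standard_normal((1, 10))]
b = [rng.standard_normal(10), rng.standard_normal(1)]
x = rng.random((5, 2))
v_NN = Phi(x) * forward(x, A, b)
```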

To define the subspace of V of test functions, let us introduce a conforming, shape-regular triangulation \({{{\mathcal {T}}}}_h= \{ E_n : 1 \le n \le N_h \}\) of \({\bar{\Omega }}\) with meshsize \(h>0\); the generic element of the triangulation will be denoted by E. Let \(V_h \subset V\) be the linear subspace formed by the functions which are piecewise linear polynomials over the triangulation \({{{\mathcal {T}}}}_h\). Furthermore, let us introduce computable approximations of the forms a and F by numerical quadratures. Precisely, for any \(E \in {{{\mathcal {T}}}}_h\), let \(\{(\xi ^E_\iota ,\omega ^E_\iota ) : \iota \in I^E\}\) be the nodes and weights of a quadrature formula of precision \(q \ge 2\) on E. Then, assuming that all data \(\mu \), \(\varvec{\beta }\), \(\sigma \), f are continuous in each element of the triangulation, we define the approximate forms

$$\begin{aligned} a_h(w,v)= & {} \sum _{E \in {{{\mathcal {T}}}}_h} \sum _{\iota \in I^E} [\mu \nabla w \cdot \nabla v + \varvec{\beta }\cdot \nabla w \, v + \sigma w v](\xi ^E_\iota ) \,\omega ^E_\iota \,, \end{aligned}$$
(6)
$$\begin{aligned} F_h(v)= & {} \sum _{E \in {{{\mathcal {T}}}}_h} \sum _{\iota \in I^E} [ f v](\xi ^E_\iota ) \,\omega ^E_\iota \,. \end{aligned}$$
(7)

With these ingredients at hand, we would like to approximate the solution of Problem (4) by some \(u^{{{\mathcal {N}}}\!N} \in V^{{{\mathcal {N}}}\!N}\) satisfying

$$\begin{aligned} a_h(u^{{{\mathcal {N}}}\!N},v_h)=F_h(v_h) \qquad \forall v_h \in V_h\,. \end{aligned}$$
(8)

In order to handle this problem with the neural network, let us introduce a basis in \(V_h\), say \(V_h = \text {span}\{\varphi _i : i\in I_h\}\), and for any \(w \in V \) let us define the residuals

$$\begin{aligned} r_{h,i}(w)=F_h(\varphi _i)-a_h(w,\varphi _i)\,, \qquad i \in I_h\,, \end{aligned}$$
(9)

as well as the loss function

$$\begin{aligned} R_h^2(w) = \sum _{i \in I_h} r_{h,i}^2(w) \,. \end{aligned}$$
(10)

Then, we search for a global minimum of the loss function in \(V^{{{\mathcal {N}}}\!N}\), i.e., we consider the following minimization problem: Find \(u^{{{\mathcal {N}}}\!N} \in V^{{{\mathcal {N}}}\!N}\) such that

$$\begin{aligned} u^{{{\mathcal {N}}}\!N} \in \displaystyle {\text {arg}\!\!\!\!\min _{w \in V^{{{\mathcal {N}}}\!N}}}\, R_h^2(w) \,. \end{aligned}$$
(11)

Existence of a minimum follows immediately from the fact that \(R_h^2\) is a continuous, quadratic function of its argument. Uniqueness, however, may fail. Indeed, any solution \(u^{{{\mathcal {N}}}\!N}\) of (8) annihilates the loss function, hence it is a solution of (11); such a solution may not be unique, since the set of equations (8) may be underdetermined (in particular, for \(f=0\) one may obtain a non-zero \(u^{{{\mathcal {N}}}\!N}\), see [13, Sect. 6.3]). On the other hand, system (8) may be overdetermined and admit no solution; in this case, the loss function will have strictly positive minima.
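To make the construction (9)–(11) concrete, the sketch below assembles the residual vector and the loss for the pure diffusion case \(\mu \equiv 1\), \(\varvec{\beta }=\varvec{0}\), \(\sigma =0\), assuming that the quadrature nodes and weights, the values of the test functions \(\varphi _i\) and of their gradients at those nodes, and the right-hand side values have been precomputed; all names and the use of PyTorch automatic differentiation are illustrative assumptions.

```python
import torch

def vpinn_loss(model, Phi, xq, wq, phi, dphi, fq):
    """R_h^2(w) for mu = 1, beta = 0, sigma = 0.
    xq: (Nq, 2) quadrature nodes, wq: (Nq,) weights,
    phi: (Ntest, Nq) test-function values, dphi: (Ntest, Nq, 2) their gradients,
    fq: (Nq,) right-hand side values at the nodes."""
    x = xq.clone().requires_grad_(True)
    u = Phi(x) * model(x).squeeze(-1)                     # trial function u^NN at the nodes
    grad_u, = torch.autograd.grad(u.sum(), x, create_graph=True)  # (Nq, 2)
    a_h = torch.einsum('qd,iqd,q->i', grad_u, dphi, wq)   # a_h(u^NN, phi_i), cf. (6)
    F_h = torch.einsum('q,iq,q->i', fq, phi, wq)          # F_h(phi_i), cf. (7)
    r = F_h - a_h                                         # residuals r_{h,i}(u^NN), cf. (9)
    return (r ** 2).sum()                                 # loss R_h^2(u^NN), cf. (10)

# one descent step on the network weights (illustrative):
# opt = torch.optim.Adam(model.parameters(), lr=1e-2)
# opt.zero_grad(); vpinn_loss(model, Phi, xq, wq, phi, dphi, fq).backward(); opt.step()
```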

Remark 2

(Discretization with interpolation) In order to reduce and control the random effects related to the use of a network depending on a large number of weights, in [13] we proposed to locally project the neural network onto a space of polynomials before computing the loss function.

To be precise, we have considered a conforming, shape-regular partition \({{{\mathcal {T}}}}_H=\{G_m : 1 \le m \le M_h\}\) of \({\bar{\Omega }}\), which is equal to or coarser than \({{{\mathcal {T}}}}_h\) (i.e., each element \(E \in {{{\mathcal {T}}}}_h\) is contained in an element \(G \in {{{\mathcal {T}}}}_H\)) but compatible with \({{{\mathcal {T}}}}_h\) (i.e., its meshsize \(H>0\) satisfies \(H\lesssim h\)). Let \(V_H \subset V\) be the linear subspace formed by the functions which are piecewise polynomials of degree \(k_\text {int}=q+1\) over the triangulation \({{{\mathcal {T}}}}_H\), and let \({{{\mathcal {I}}}}_H : \mathrm{C}^0({\bar{\Omega }}) \rightarrow V_H\) be the associated element-wise Lagrange interpolation operator.

Given a neural network function \(w \in V^{{{\mathcal {N}}}\!N}\), let us denote by \(w_H= {{{\mathcal {I}}}}_H w\in V_H\) its piecewise polynomial interpolant. Then, the definition (9) of the local residuals is modified as

$$\begin{aligned} {\tilde{r}}_{h,i}(w)=F_h(\varphi _i)-a_h(w_H,\varphi _i)\,, \qquad i \in I_h\,; \end{aligned}$$
(12)

consequently, the loss function takes the form

$$\begin{aligned} {\tilde{R}}_h^2(w) = \sum _{i \in I_h} {\tilde{r}}_{h,i}^2(w) \,, \end{aligned}$$
(13)

and we define a new approximation of the solution of Problem (4) by setting

$$\begin{aligned} {\tilde{u}}^{{{\mathcal {N}}}\!N}_H = {{{\mathcal {I}}}}_H {\tilde{u}}^{{{\mathcal {N}}}\!N}\in V_H\,, \qquad \text {where} \quad {\tilde{u}}^{{{\mathcal {N}}}\!N} \in \displaystyle {\text {arg}\!\!\!\!\min _{w \in V^{{{\mathcal {N}}}\!N}}}\, {\tilde{R}}_h^2(w) \,. \end{aligned}$$
(14)

In [13] we derived an a priori error estimate for the error \(\Vert u - {\tilde{u}}^{{{\mathcal {N}}}\!N}_H \Vert _V\), and we documented the error decay as \(h \rightarrow 0\), which turns out to have a more regular behavior than the error \(\Vert u - {u}^{{{\mathcal {N}}}\!N}\Vert _V\), although the latter is usually smaller.

The subsequent a posteriori error analysis could be extended to control the error produced by \({\tilde{u}}^{{{\mathcal {N}}}\!N}_H \) as well. For the sake of simplicity, we do not pursue such a task here.

3 The a posteriori error estimator

In order to build an error estimator, let us first choose, for any \(E \in {{{\mathcal {T}}}}_h\) and any \(k \ge 0\), a projection operator \(\Pi _{E,k} : L^2(E) \rightarrow {\mathbb {P}}_k(E)\) satisfying

$$\begin{aligned} \int _E \Pi _{E,k} \varphi = \int _E \varphi \qquad \forall \varphi \in L^2(E) \,. \end{aligned}$$
(15)

This allows us to introduce approximate bilinear and linear forms

$$\begin{aligned} a_\pi (w,v)= & {} \sum _{E \in {{{\mathcal {T}}}}_h} \int _E \Pi _{E,q}\left( \mu \nabla w \right) \cdot \nabla v + \Pi _{E,q-1}\left( \varvec{\beta }\cdot \nabla w + \sigma w\right) v\,, \end{aligned}$$
(16)
$$\begin{aligned} F_\pi (v)= & {} \sum _{E \in {{{\mathcal {T}}}}_h} \int _E \left( \Pi _{E,q-1} f\right) v \,, \end{aligned}$$
(17)

which are useful in the forthcoming derivation. Indeed, the coercivity of the form a allows us to bound the V-norm of the error as follows:

$$\begin{aligned} \vert u - u^{{{\mathcal {N}}}\!N}\vert _{1,\Omega } \le \frac{1}{\alpha }\sup _{v \in V} \frac{a(u - u^{{{\mathcal {N}}}\!N},v)}{\vert v \vert _{1,\Omega }} \,. \end{aligned}$$
(18)

We split the numerator as

$$\begin{aligned} a(u - u^{{{\mathcal {N}}}\!N},v)&= F(v) -a(u^{{{\mathcal {N}}}\!N},v) = \underbrace{F(v)-F_\pi (v)}_{(\text {I})} \ + \ \underbrace{F_\pi (v)-a_\pi (u^{{{\mathcal {N}}}\!N},v)}_{(\text {III})}\nonumber \\&\quad + \ \underbrace{a_\pi (u^{{{\mathcal {N}}}\!N},v) - a(u^{{{\mathcal {N}}}\!N},v)}_{(\text {II})} \end{aligned}$$
(19)

and we proceed to bound each term on the right-hand side.

The terms \((\mathrm{I})\) and \((\mathrm{II})\) account for the element-wise projection error upon polynomial spaces; they are estimated in the next two Lemmas.

Lemma 1

The quantity \((\mathrm{I})\) defined in (19) satisfies

$$\begin{aligned} \vert ( \mathrm{I} ) \vert \lesssim \Big (\sum _{E \in {{{\mathcal {T}}}}_h} \eta _{\mathrm{rhs},1}^2(E) \Big )^{1/2} \vert v \vert _{1,\Omega }\,, \end{aligned}$$
(20)

with

$$\begin{aligned} \eta _{\mathrm{rhs},1}(E) = h_E \Vert f - \Pi _{E,q-1} f \Vert _{0,E} \,. \end{aligned}$$
(21)

Proof

Setting \(m_E(v)=\frac{1}{\vert E \vert } \int _E v\) and using (15), we get

$$\begin{aligned} ( \mathrm{I} ) = \sum _{E \in {{{\mathcal {T}}}}_h} \int _E \left( f - \Pi _{E,q-1} f \right) (v-m_E(v) ) \,, \end{aligned}$$

and we conclude using the bound \(\Vert v - m_E(v) \Vert _{0,E} \lesssim h_E \vert v \vert _{1,E}\). \(\square \)

Lemma 2

The quantity \((\mathrm{II})\) defined in (19) satisfies

$$\begin{aligned} \vert ( \mathrm{II} ) \vert \lesssim \Big ( \sum _{E \in {{{\mathcal {T}}}}_h} \big ( \eta _{\mathrm{coef},1}^2(E) + \eta _{\mathrm{coef},2}^2(E) + \eta _{\mathrm{coef},3}^2(E) \big ) \Big )^{1/2} \vert v \vert _{1,\Omega }\,, \end{aligned}$$
(22)

with

$$\begin{aligned} \begin{aligned} \eta _{\mathrm{coef},1}(E)&= \Vert \mu \nabla u^{{{\mathcal {N}}}\!N}- \Pi _{E,q} (\mu \nabla u^{{{\mathcal {N}}}\!N}) \Vert _{0,E} \,, \\ \eta _{\mathrm{coef},2}(E)&= h_E \Vert \varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N}- \Pi _{E,q-1}( \varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N}) \Vert _{0,E} \,, \\ \eta _{\mathrm{coef},3}(E)&= h_E \Vert \sigma u^{{{\mathcal {N}}}\!N}- \Pi _{E,q-1}( \sigma u^{{{\mathcal {N}}}\!N}) \Vert _{0,E} \,. \end{aligned} \end{aligned}$$
(23)

Proof

It holds

$$\begin{aligned} \begin{aligned} (\mathrm{II})&= \sum _{E \in {{{\mathcal {T}}}}_h} \int _E \Big ( \mu \nabla u^{{{\mathcal {N}}}\!N}- \Pi _{E,q}(\mu \nabla u^{{{\mathcal {N}}}\!N}) \Big ) \cdot \nabla v \\&\quad + \sum _{E \in {{{\mathcal {T}}}}_h} \int _E \Big ( \varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N}- \Pi _{E,q-1}( \varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N}) \Big ) (v - m_E(v)) \\&\quad + \sum _{E \in {{{\mathcal {T}}}}_h} \int _E \Big ( \sigma u^{{{\mathcal {N}}}\!N}- \Pi _{E,q-1}( \sigma u^{{{\mathcal {N}}}\!N}) \Big ) (v - m_E(v)) \,, \end{aligned} \end{aligned}$$

where we have used again (15). We conclude as in the proof of Lemma 1. \(\square \)

Let us now focus on the quantity \((\mathrm{III})\), which can be written as

$$\begin{aligned} (\mathrm{III}) = \underbrace{F_\pi (v-v_h) - a_\pi (u^{{{\mathcal {N}}}\!N},v-v_h)}_{(\text {IV})} + \underbrace{F_\pi (v_h) - a_\pi (u^{{{\mathcal {N}}}\!N},v_h)}_{(\text {V})} \,, \qquad \forall v_h \in V_h\,; \end{aligned}$$
(24)

in turn, the quantity \((\mathrm{V})\) can be written as

$$\begin{aligned} (\mathrm{V}) = \underbrace{F_\pi (v_h) -F_h(v_h)}_{(\text {VII})} + \underbrace{F_h(v_h)-a_h(u^{{{\mathcal {N}}}\!N},v_h)}_{(\text {VI})} + \underbrace{a_h(u^{{{\mathcal {N}}}\!N},v_h)-a_\pi (u^{{{\mathcal {N}}}\!N},v_h)}_{(\text {VIII})} \,.\nonumber \\ \end{aligned}$$
(25)

The bound of \((\mathrm{IV})\) is standard in finite-element a posteriori error analysis: it involves the local bulk residuals

$$\begin{aligned} \mathrm{bulk}_E(u^{{{\mathcal {N}}}\!N}) = \Pi _{E,q-1}f +\nabla \cdot \Pi _{E,q} (\mu \nabla u^{{{\mathcal {N}}}\!N}) - \Pi _{E,q-1}( \varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N}+ \sigma u^{{{\mathcal {N}}}\!N}) \end{aligned}$$
(26)

and the interelement jumps at each edge e shared by two elements, say \(E_1\) and \(E_2\) with opposite normal unit vectors \({\varvec{n}}_1\) and \({\varvec{n}}_2\), namely

$$\begin{aligned} \mathrm{jump}_e(u^{{{\mathcal {N}}}\!N}) = \Pi _{E_1,q}(\mu \nabla u^{{{\mathcal {N}}}\!N})\cdot {\varvec{n}}_1 + \Pi _{E_2,q}(\mu \nabla u^{{{\mathcal {N}}}\!N})\cdot {\varvec{n}}_2 \,; \end{aligned}$$
(27)

in addition, one sets \(\mathrm{jump}_e(u^{{{\mathcal {N}}}\!N}) =0\) if \(e \subset \partial \Omega \).

To derive the bound, the test function \(v_h\) in (24) is chosen as \(v_h=I_h^C v\), the Clément interpolant of v on \({{{{\mathcal {T}}}}_h}\) [16], which satisfies

$$\begin{aligned} \Vert v-I_h^C v \Vert _{k,E} \lesssim {h_E^{1-k}} \vert v \vert _{1, D_E}, \qquad k=0,1 \,, \end{aligned}$$
(28)

where \(D_E = \cup \{E' \in {{{{\mathcal {T}}}}_h}: E \cap E' \not = \emptyset \}\).

Lemma 3

The quantity \((\mathrm{IV})\) defined in (24) satisfies

$$\begin{aligned} \vert ( \mathrm{IV} ) \vert \lesssim \Big (\sum _{E \in {{{\mathcal {T}}}}_h} \eta _{\mathrm{res}}^2(E) \Big )^{1/2} \vert v \vert _{1,\Omega }\,, \end{aligned}$$
(29)

where

$$\begin{aligned} \eta _{\mathrm{res}}(E) = h_E \Vert \, \mathrm{bulk}_E(u^{{{\mathcal {N}}}\!N}) \, \Vert _{0,E} + h_E^{1/2} \sum _{e \subset \partial E} \Vert \,\mathrm{jump}_e(u^{{{\mathcal {N}}}\!N}) \, \Vert _{0,e} \,, \end{aligned}$$
(30)

with \(\mathrm{bulk}_E(u^{{{\mathcal {N}}}\!N})\) defined in (26) and \(\mathrm{jump}_e(u^{{{\mathcal {N}}}\!N})\) defined in (27).

Proof

We refer e.g. to [17] for more details. \(\square \)

Before considering the quantity \((\mathrm{VI})\), let us state a useful result of equivalence of norms.

Lemma 4

For any \(v_h = \sum _{i \in I_h} v_i \varphi _i \in V_h\), let \({\varvec{v}} = (v_i)_{i \in I_h}\) be the vector of its coefficients. There exist constants \(0< c_h \le C_h\), possibly depending on h, such that

$$\begin{aligned} c_h \vert v_h \vert _{1, \Omega } \le \Vert {\varvec{v}} \Vert _2 \le C_h \vert v_h \vert _{1, \Omega } \qquad \forall v_h \in V_h \,, \end{aligned}$$
(31)

where \(\Vert {\varvec{v}} \Vert _2 = \left( \sum _{i \in I_h} v_i^2 \right) ^{1/2}\).

Proof

The result expresses the equivalence of norms in finite-dimensional spaces. If the triangulation \({{{\mathcal {T}}}}_h\) is quasi-uniform, then one can prove by a standard reference-element argument that \(c_h \simeq h^{1-n/2}\), whereas \(C_h \simeq h^{-n/2}\). \(\square \)

We are now able to bound the quantity \((\mathrm{VI})\) in terms of the loss function introduced in (10), as follows.

Lemma 5

The quantity \((\mathrm{VI})\) defined in (25) satisfies

$$\begin{aligned} \vert ( \mathrm{VI} ) \vert \lesssim \eta _{\mathrm{loss}} \vert v \vert _{1,\Omega }\,, \end{aligned}$$
(32)

where

$$\begin{aligned} \eta _{\mathrm{loss}} = C_h R_h(u^{{{\mathcal {N}}}\!N}) \end{aligned}$$
(33)

and the constant \(C_h\) is defined in (31).

Proof

Writing \(v_h = \sum _{i \in I_h} v_i \varphi _i\), it holds

$$\begin{aligned} (\mathrm{VI}) = \sum _{i \in I_h} r_{h,i}(u^{{{\mathcal {N}}}\!N}) v_i \,, \end{aligned}$$

whence

$$\begin{aligned} \vert (\mathrm{VI}) \vert \lesssim R_h(u^{{{\mathcal {N}}}\!N}) \Vert {\varvec{v}} \Vert _2 \,. \end{aligned}$$

We conclude by using (31) and observing that

$$\begin{aligned} \vert v_h \vert _{1, \Omega } \lesssim \vert v \vert _{1, \Omega } \,, \end{aligned}$$
(34)

since we have chosen \(v_h=I_h^C v\) and (28) holds. \(\square \)

We are left with the problem of bounding the terms \((\mathrm{VII})\) and \((\mathrm{VIII})\) in (25). They are similar to the terms \((\mathrm{I})\) and \((\mathrm{II})\), respectively, but reflect the presence of the quadrature formula introduced in (6) and (7). In the forthcoming analysis, it will be useful to introduce the following notation for the quadrature-based discrete (semi-)norm on \(C^0 (E)\):

$$\begin{aligned} \Vert \varphi \Vert _{0, E, \omega }= \Big (\sum _{\iota \in I^E} \varphi ^2(\xi ^E_\iota ) \,\omega ^E_\iota \Big )^{1/2} \,. \end{aligned}$$
(35)

Lemma 6

The quantity \((\mathrm{VII})\) defined in (25) satisfies

$$\begin{aligned} \vert ( \mathrm{VII} ) \vert \lesssim \Big (\sum _{E \in {{{\mathcal {T}}}}_h} \eta _{\mathrm{rhs},2}^2(E) \Big )^{1/2} \vert v \vert _{1,\Omega }\,, \end{aligned}$$
(36)

with

$$\begin{aligned} \eta _{\mathrm{rhs},2}(E) = h_E \Vert f - \Pi _{E,q-1} f \Vert _{0,E,\omega } + \Vert f - \Pi _{E,q} f \Vert _{0,E,\omega } \,. \end{aligned}$$
(37)

Proof

Recalling that the adopted quadrature rule has precision q and test functions \(v_h\) are piecewise linear polynomials, it holds

$$\begin{aligned} \begin{aligned} (\mathrm{VII})&= \sum _{E \in {{{\mathcal {T}}}}_h} \Big ( \int _E (\Pi _{E,q-1} f) v_h - \sum _{\iota \in I^E} f(\xi ^E_\iota ) v_h(\xi ^E_\iota ) \,\omega ^E_\iota \Big ) \\&= \sum _{E \in {{{\mathcal {T}}}}_h} \Big ( \sum _{\iota \in I^E} (\Pi _{E,q-1} f - f) (\xi ^E_\iota ) v_h(\xi ^E_\iota ) \,\omega ^E_\iota \Big ) \\&= \underbrace{\sum _{E \in {{{\mathcal {T}}}}_h} \Big ( \sum _{\iota \in I^E} (\Pi _{E,q-1} f - f) (\xi ^E_\iota ) (v_h - m_E(v_h))(\xi ^E_\iota ) \,\omega ^E_\iota \Big )}_{(\text {VIIa})} \\&\qquad + \underbrace{\sum _{E \in {{{\mathcal {T}}}}_h} \Big ( \sum _{\iota \in I^E} (\Pi _{E,q-1} f - f) (\xi ^E_\iota ) \,\omega ^E_\iota m_E(v_h) \Big )}_{(\text {VIIb})} \,. \end{aligned} \end{aligned}$$
(38)

On the one hand, recalling the assumption \(q \ge 2\) and inequality (34) one has

$$\begin{aligned} \begin{aligned} \vert (\mathrm{VIIa}) \vert&\le \sum _{E \in {{{\mathcal {T}}}}_h} \Vert f-\Pi _{E,q-1} f \Vert _{0, E, \omega } \Vert v_h - m_E(v_h) \Vert _{0, E, \omega } \\&= \sum _{E \in {{{\mathcal {T}}}}_h} \Vert f-\Pi _{E,q-1} f \Vert _{0, E, \omega } \Vert v_h - m_E(v_h) \Vert _{0, E} \\&\lesssim \sum _{E \in {{{\mathcal {T}}}}_h} h_E \Vert f-\Pi _{E,q-1} f \Vert _{0, E, \omega } \vert v_h \vert _{1,E} \\&\lesssim \Big ( \sum _{E \in {{{\mathcal {T}}}}_h} h_E^2 \Vert f-\Pi _{E,q-1} f \Vert _{0, E, \omega }^2 \Big )^{1/2} \vert v \vert _{1,\Omega } \,. \end{aligned} \end{aligned}$$
(39)

On the other hand, we first observe that, by the exactness of the quadrature rule and (15), we get

$$\begin{aligned} \sum _{\iota \in I^E} (\Pi _{E,q-1} f )(\xi ^E_\iota ) \,\omega ^E_\iota = \int _E \Pi _{E,q-1} f = \int _E f = \int _E \Pi _{E,q} f = \sum _{\iota \in I^E} (\Pi _{E,q} f )(\xi ^E_\iota ) \,\omega ^E_\iota . \end{aligned}$$

Hence,

$$\begin{aligned} \begin{aligned} \vert (\mathrm{VIIb}) \vert&\le \sum _{E \in {{{\mathcal {T}}}}_h} \Vert f-\Pi _{E,q} f \Vert _{0, E, \omega } \Vert m_E(v_h) \Vert _{0, E} \\&\le \sum _{E \in {{{\mathcal {T}}}}_h} \Vert f-\Pi _{E,q} f \Vert _{0, E, \omega } \Vert v_h \Vert _{0,E} \\&\lesssim \Big ( \sum _{E \in {{{\mathcal {T}}}}_h} \Vert f-\Pi _{E,q} f \Vert _{0, E, \omega }^2 \Big )^{1/2} \vert v \vert _{1,\Omega } \,. \end{aligned} \end{aligned}$$
(40)

This concludes the proof of Lemma 6. \(\square \)

Lemma 7

The quantity \((\mathrm{VIII})\) defined in (25) satisfies

$$\begin{aligned} \vert ( \mathrm{VIII} ) \vert \lesssim \Big ( \sum _{E \in {{{\mathcal {T}}}}_h} \big ( \eta _{\mathrm{coef},4}^2(E) + \eta _{\mathrm{coef},5}^2(E) + \eta _{\mathrm{coef},6}^2(E) \big ) \Big )^{1/2} \vert v \vert _{1,\Omega }\,, \end{aligned}$$
(41)

with

$$\begin{aligned} \begin{aligned} \eta _{\mathrm{coef},4}(E)&= \Vert \mu \nabla u^{{{\mathcal {N}}}\!N}- \Pi _{E,q} (\mu \nabla u^{{{\mathcal {N}}}\!N}) \Vert _{0,E,\omega } \,, \\ \eta _{\mathrm{coef},5}(E)&= h_E \Vert \varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N}- \Pi _{E,q-1}( \varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N}) \Vert _{0,E,\omega } \\&\quad + \Vert \varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N}- \Pi _{E,q}( \varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N}) \Vert _{0,E,\omega } \,, \\ \eta _{\mathrm{coef},6}(E)&= h_E \Vert \sigma u^{{{\mathcal {N}}}\!N}- \Pi _{E,q-1}( \sigma u^{{{\mathcal {N}}}\!N}) \Vert _{0,E,\omega } \\&\quad + \Vert \sigma u^{{{\mathcal {N}}}\!N}- \Pi _{E,q}( \sigma u^{{{\mathcal {N}}}\!N}) \Vert _{0,E,\omega }\,. \end{aligned} \end{aligned}$$
(42)

Proof

The term \((\mathrm{VIII})\) can be written as

$$\begin{aligned} (\mathrm{VIII})&= \underbrace{\sum _{E \in {{{\mathcal {T}}}}_h} \Big ( \sum _{\iota \in I^E} (\mu \nabla u^{{{\mathcal {N}}}\!N})(\xi ^E_\iota ) \cdot \nabla v_h \,\omega ^E_\iota - \int _E \Pi _{E,q}(\mu \nabla u^{{{\mathcal {N}}}\!N})\cdot \nabla v_h \Big )}_{(\text {VIIIa})}\nonumber \\&+ \underbrace{\sum _{E \in {{{\mathcal {T}}}}_h} \Big ( \sum _{\iota \in I^E} ( \varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N})(\xi ^E_\iota ) \, v_h(\xi ^E_\iota ) \,\omega ^E_\iota - \int _E \Pi _{E,q-1}(\varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N}) \, v_h \Big )}_{(\text {VIIIb})}\nonumber \\&+ \underbrace{\sum _{E \in {{{\mathcal {T}}}}_h} \Big ( \sum _{\iota \in I^E} (\sigma u^{{{\mathcal {N}}}\!N})(\xi ^E_\iota )\, v_h (\xi ^E_\iota ) \,\omega ^E_\iota - \int _E \Pi _{E,q-1}(\sigma u^{{{\mathcal {N}}}\!N}) \, v_h \Big )}_{(\text {VIIIc})} \,. \end{aligned}$$
(43)

Concerning \((\text {VIIIa})\), by the exactness of the quadrature rule and the fact that \(\nabla v_h\) is piecewise constant, one has

$$\begin{aligned} (\text {VIIIa}) = \sum _{E \in {{{\mathcal {T}}}}_h} \sum _{\iota \in I^E} \big (\mu \nabla u^{{{\mathcal {N}}}\!N}- \Pi _{E,q}(\mu \nabla u^{{{\mathcal {N}}}\!N})\big )(\xi ^E_\iota ) \cdot \nabla v_h \,\omega ^E_\iota \,, \end{aligned}$$

which easily gives

$$\begin{aligned} \vert ( \mathrm{VIIIa} ) \vert \lesssim \Big ( \sum _{E \in {{{\mathcal {T}}}}_h} \Vert \mu \nabla u^{{{\mathcal {N}}}\!N}- \Pi _{E,q}(\mu \nabla u^{{{\mathcal {N}}}\!N}) \Vert _{0,E,\omega }^2 \Big )^{1/2} \vert v \vert _{1,\Omega } \,. \end{aligned}$$

The terms \((\text {VIIIb})\) and \((\text {VIIIc})\) are similar to the term \((\text {VII})\) above, in which f is replaced by \(\varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N}\) and \(\sigma u^{{{\mathcal {N}}}\!N}\), respectively. Hence, they can be bounded as done for \((\text {VII})\). This concludes the proof of Lemma 7. \(\square \)

At this point, we are ready to derive the announced a posteriori error estimates. In order to get an upper bound of the error, we concatenate (18), (19), (24), (25), and use the bounds given in Lemmas 1 to 7, arriving at the following result.

Theorem 1

(a posteriori upper bound of the error) Let \(u^{{{\mathcal {N}}}\!N} \in V^{{{\mathcal {N}}}\!N}\) satisfy (11). Then, the error \(u-u^{{{\mathcal {N}}}\!N} \) can be estimated from above as follows:

$$\begin{aligned} \vert u - u^{{{\mathcal {N}}}\!N}\vert _{1,\Omega } \lesssim \Big ( \eta _\mathrm{res} + \eta _\mathrm{loss} + \eta _\mathrm{coef} + \eta _\mathrm{rhs} \Big ) \,, \end{aligned}$$
(44)

where

$$\begin{aligned} \eta _\mathrm{res}^2&= \sum _{E \in {{{\mathcal {T}}}}_h} \eta _\mathrm{res}^2(E) \,, \quad \eta _\mathrm{coef}^2 = \sum _{E \in {{{\mathcal {T}}}}_h} \sum _{k=1}^6\eta _{\mathrm{coef},k}^2(E) \,, \quad \eta _\mathrm{rhs}^2 = \sum _{E \in {{{\mathcal {T}}}}_h} \sum _{k=1}^2\eta _{\mathrm{rhs},k}^2(E)\,.\nonumber \\ \end{aligned}$$
(45)

We observe that the global estimator \(\eta = \eta _\mathrm{res} + \eta _\mathrm{loss} + \eta _\mathrm{coef} + \eta _\mathrm{rhs}\) is the sum of four contributions: \(\eta _\mathrm{res}\) is the classical residual-based estimator, \(\eta _\mathrm{loss}\) measures how small the minimized loss function is, i.e., how well the discrete variational equations (8) are fulfilled, whereas \(\eta _\mathrm{coef}\) and \(\eta _\mathrm{rhs}\) reflect the error in approximating elementwise the coefficients of the operator and the right-hand side by polynomials of degrees related to the precision of the quadrature formula.

It is possible to derive from (44) an element-based a posteriori error estimator, which can be used to design an adaptive strategy of mesh refinement (see, e.g. [18]). To this end, from now on we assume that the basis \(\{\varphi _i : i\in I_h\}\) of \(V_h\), introduced to define (9), is the canonical Lagrange basis associated with the nodes of the triangulation \({{{{\mathcal {T}}}}_h}\). Given any \(E \in {{{{\mathcal {T}}}}_h}\), we introduce the elemental index set \(I_h^E =\{ i \in I_h : E \subset \mathrm{supp}\, \varphi _i\}\), where \(\mathrm{supp}\, \varphi _i\) is the support of \(\varphi _i\), and we define a local contribution to the term \(\eta _\mathrm{loss}\) as follows:

$$\begin{aligned} \eta _\mathrm{loss}^2(E) = C_h^2 \sum _{i \in I_h^E} r_{h,i}^2(u^{{{\mathcal {N}}}\!N}) \,, \end{aligned}$$
(46)

which satisfies

$$\begin{aligned} \eta _\mathrm{loss}^2 \le \sum _{E \in {{{\mathcal {T}}}}_h} \eta _\mathrm{loss}^2(E) \,. \end{aligned}$$

With this definition at hand, we can introduce the following elemental error estimator.

Definition 1

(elemental error estimator) For any \(E \in {{{\mathcal {T}}}}_h\), let us set

$$\begin{aligned} \eta ^2(E) = \eta _\mathrm{res}^2(E) + \eta _\mathrm{loss}^2(E) + \sum _{k=1}^6\eta _{\mathrm{coef},k}^2(E) + \sum _{k=1}^2\eta _{\mathrm{rhs},k}^2(E) \,, \end{aligned}$$
(47)

where the addends in this sum are defined, respectively, in (30), (46), (23) and (42), (21) and (37).
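As an illustration of how the elemental contributions (47) could drive the adaptive refinement strategy mentioned above, the following sketch performs a Dörfler-type marking step; the estimator values are assumed to be already computed element by element, and the names and the marking parameter are illustrative assumptions.

```python
import numpy as np

def mark_elements(eta2, theta=0.5):
    """Doerfler marking: given eta2[m] = eta^2(E_m) for each element of T_h,
    return (the indices of) a smallest set of elements whose contributions sum
    to at least theta * sum_m eta^2(E_m); the marked elements are then refined."""
    order = np.argsort(eta2)[::-1]                 # sort elements by decreasing eta^2(E)
    cumulative = np.cumsum(eta2[order])
    n_marked = int(np.searchsorted(cumulative, theta * eta2.sum())) + 1
    return order[:n_marked]

# example: eta2 = np.array([...]) gathered from (47); marked = mark_elements(eta2)
```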

Then, Theorem 1 can be re-formulated in terms of these quantities.

Corollary 2

(localized a posteriori error estimator) The error \(u-u^{{{\mathcal {N}}}\!N} \) can be estimated as follows:

$$\begin{aligned} \vert u - u^{{{\mathcal {N}}}\!N}\vert _{1,\Omega } \lesssim \Big ( \sum _{E \in {{{\mathcal {T}}}}_h} \eta ^2(E) \Big )^{1/2} \,. \end{aligned}$$
(48)

Inequality (48) guarantees the reliability of the proposed error estimator, namely the estimator does provide a computable upper bound of the discretization error. The next result ensures that the estimator is also efficient, namely it does not overestimate the error.

Theorem 3

(a posteriori lower bound of the error) Let \(u^{{{\mathcal {N}}}\!N} \in V^{{{\mathcal {N}}}\!N}\) satisfy (11). Then, the error \(u-u^{{{\mathcal {N}}}\!N} \) can be locally estimated from below as follows: for any \(E \in {{{\mathcal {T}}}}_h\) it holds

$$\begin{aligned} \eta _{\mathrm{res}}(E)\lesssim & {} \vert u - u^{{{\mathcal {N}}}\!N}\vert _{1,D_E} + \sum _{E' \subset D_E} \left( \sum _{k=1}^3 \eta _{\mathrm{coef},k}^2(E') + \eta _{\mathrm{rhs},1}^2(E') \right) ^{1/2} \!\!\!\!\!\!\!\!\,, \end{aligned}$$
(49)
$$\begin{aligned} \frac{c_h}{C_h} \, \eta _{\mathrm{loss}}(E)\lesssim & {} \vert u - u^{{{\mathcal {N}}}\!N}\vert _{1,D_E} + \sum _{E' \subset D_E} \left( \sum _{k=1}^6 \eta _{\mathrm{coef},k}^2(E') + \sum _{k=1}^2\eta _{\mathrm{rhs},k}^2(E') \right) ^{1/2}\!\!\!\!\!\!\!\!. \end{aligned}$$
(50)

Proof

To derive (49), let us first consider the bulk contribution to the estimator. We apply a classical argument in a posteriori analysis, namely we introduce a non-negative bubble function \(b_E \in V\) with support in E and such that \(\Vert \phi \Vert _{0,E} \simeq \Vert b_E^{1/2} \phi \Vert _{0,E} \) and \(\Vert \phi \Vert _{0,E} \simeq (\Vert b_E \phi \Vert _{0,E} + h_E\vert b_E \phi \vert _{1,E})\) for all \(\phi \in {\mathbb {P}}_q(E)\).

Let us set \(w_E=\mathrm{bulk}_E(u^{{{\mathcal {N}}}\!N}) b_E \in V\). Then,

$$\begin{aligned} \Vert \mathrm{bulk}_E(u^{{{\mathcal {N}}}\!N}) \Vert _{0,E}^2 \lesssim \int _E \mathrm{bulk}_E(u^{{{\mathcal {N}}}\!N})^2 b_E = \int _E \mathrm{bulk}_E(u^{{{\mathcal {N}}}\!N}) \, w_E. \end{aligned}$$

Writing

$$\begin{aligned} \begin{aligned} \mathrm{bulk}_E(u^{{{\mathcal {N}}}\!N})&= (f-Lu^{{{\mathcal {N}}}\!N}) + \nabla {\cdot }(\Pi _{E,q} (\mu \nabla u^{{{\mathcal {N}}}\!N})-\mu \nabla u^{{{\mathcal {N}}}\!N}) \\&\quad + \varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N}- \Pi _{E,q-1}( \varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N}) + \sigma u^{{{\mathcal {N}}}\!N}- \Pi _{E,q-1}( \sigma u^{{{\mathcal {N}}}\!N}) \\&\quad + \Pi _{E,q-1}f-f \,, \\ \end{aligned} \end{aligned}$$

we obtain

$$\begin{aligned} \begin{aligned} \int _E \mathrm{bulk}_E(u^{{{\mathcal {N}}}\!N}) \, w_E&= a(u-u^{{{\mathcal {N}}}\!N}, w_E) - \int _E (\Pi _{E,q} (\mu \nabla u^{{{\mathcal {N}}}\!N})-\mu \nabla u^{{{\mathcal {N}}}\!N})\cdot \nabla w_E \\&\quad + \int _E (\varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N}- \Pi _{E,q-1}( \varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N}))(w_E - m(w_E)) \\&\quad + \int _E (\sigma u^{{{\mathcal {N}}}\!N}- \Pi _{E,q-1}( \sigma u^{{{\mathcal {N}}}\!N})) (w_E - m(w_E)) \\&\quad + \int _E (\Pi _{E,q-1}f-f)(w_E - m(w_E)) \,, \\ \end{aligned} \end{aligned}$$

whence

$$\begin{aligned} \Vert \mathrm{bulk}_E(u^{{{\mathcal {N}}}\!N}) \Vert _{0,E}^2 \lesssim \Big ( \vert u - u^{{{\mathcal {N}}}\!N}\vert _{1,E} + \sum _{k=1}^3 \eta _{\mathrm{coef},k}(E) + \eta _{\mathrm{rhs},1}(E) \Big ) \vert w_E \vert _{1,E} \,. \end{aligned}$$

Using \( \vert w_E \vert _{1,E} \lesssim h_E^{-1} \Vert \mathrm{bulk}_E(u^{{{\mathcal {N}}}\!N}) \Vert _{0,E}\), we arrive at

$$\begin{aligned} h_E \Vert \mathrm{bulk}_E(u^{{{\mathcal {N}}}\!N}) \Vert _{0,E} \lesssim \vert u - u^{{{\mathcal {N}}}\!N}\vert _{1,E} + \sum _{k=1}^3 \eta _{\mathrm{coef},k}(E) + \eta _{\mathrm{rhs},1}(E) \,. \end{aligned}$$
(51)

Let us now turn to the jump contribution to the estimator. Given an edge \(e \subset \partial E\) shared with the element \(E'\), we introduce a non-negative bubble function \(b_e \in V\), with support in \(E \cup E'\) and such that \(\Vert \phi \Vert _{0,e} \simeq \Vert b_e^{1/2} \phi \Vert _{0,e} \) and \( (h_E^{-1/2} \Vert b_e \phi \Vert _{0,E} + h_E^{1/2}\vert b_e \phi \vert _{1,E}) \lesssim \Vert \phi \Vert _{0,e}\) for all \(\phi \in {\mathbb {P}}_q(E)\).

Let us extend the function \(\mathrm{jump}_e(u^{{{\mathcal {N}}}\!N})\) onto \(E \cup E'\) to be constant in the normal direction to e, obtaining a polynomial of degree q in each element. Let us set \(w_e = \mathrm{jump}_e(u^{{{\mathcal {N}}}\!N}) \, b_e \in V\). Then, writing \(E_1=E\) and \(E_2=E'\), one has

$$\begin{aligned} \Vert \mathrm{jump}_e(u^{{{\mathcal {N}}}\!N}) \Vert _{0,e}^2&\lesssim \int _e \mathrm{jump}_e(u^{{{\mathcal {N}}}\!N})^2 b_e = \int _e \mathrm{jump}_e(u^{{{\mathcal {N}}}\!N}) \, w_e \\&\ = \ \int _e \mathrm{jump}_e(u^{{{\mathcal {N}}}\!N}- u) \, w_e \, \\&\ = \ \sum _{i=1}^2 \int _{E_i} \nabla \cdot [ (\Pi _{E_i,q}(\mu \nabla u^{{{\mathcal {N}}}\!N}) - \mu \nabla u) \, w_e ] \\&\ = \ \sum _{i=1}^2 \int _{E_i} [ \nabla \cdot \Pi _{E_i,q}(\mu \nabla u^{{{\mathcal {N}}}\!N}) - \nabla \cdot (\mu \nabla u) ] w_e \\&\ \quad + \ \sum _{i=1}^2 \int _{E_i} [ \Pi _{E_i,q}(\mu \nabla u^{{{\mathcal {N}}}\!N}) - \mu \nabla u ] \cdot \nabla w_e \,. \end{aligned}$$

We now recall that

$$\begin{aligned} \nabla \cdot \Pi _{E_i,q} (\mu \nabla u^{{{\mathcal {N}}}\!N}) = \mathrm{bulk}_{E_i}(u^{{{\mathcal {N}}}\!N}) - \Pi _{E_i,q-1}f + \Pi _{E_i,q-1}( \varvec{\beta }\cdot \nabla u^{{{\mathcal {N}}}\!N}+ \sigma u^{{{\mathcal {N}}}\!N}) \,, \end{aligned}$$

as well as \(\nabla \cdot (\mu \nabla u) = - f + \varvec{\beta }\cdot \nabla u + \sigma u\). We write \(u= u^{{{\mathcal {N}}}\!N}+ (u-u^{{{\mathcal {N}}}\!N})\) and we proceed as in the proof of (51), using now the bounds \( \Vert w_e \Vert _{0,E_i} \lesssim h_{E_i}^{1/2} \Vert \mathrm{jump}_e (u^{{{\mathcal {N}}}\!N}) \Vert _{0,e}\) and \( \vert w_e \vert _{1,E_i} \lesssim h_{E_i}^{-1/2} \Vert \mathrm{jump}_e (u^{{{\mathcal {N}}}\!N}) \Vert _{0,e}\), arriving at the bound

$$\begin{aligned} h_E^{1/2} \sum _{e \subset \partial E} \Vert \,\mathrm{jump}_e(u^{{{\mathcal {N}}}\!N}) \, \Vert _{0,e}&\lesssim \vert u - u^{{{\mathcal {N}}}\!N}\vert _{1,D_E} + \sum _{E' \subset D_E} h_{E'} \Vert \mathrm{bulk}_{E'}(u^{{{\mathcal {N}}}\!N}) \Vert _{0,E'}\nonumber \\&\quad + \sum _{E' \subset D_E} \left( \sum _{k=1}^3 \eta _{\mathrm{coef},k}(E') + \eta _{\mathrm{rhs},1}(E') \right) \,. \end{aligned}$$
(52)

Together with (51), this gives the bound (49). In order to derive (50), we write (46) as

$$\begin{aligned} C_h^{-1} \eta _\mathrm{loss}(E) = \Big ( \sum _{i \in I_h^E} r_{h,i}^2(u^{{{\mathcal {N}}}\!N}) \Big )^{1/2} = \sup _{{\varvec{v}}} \frac{1}{\Vert {\varvec{v}}\Vert _2} \sum _{i \in I_h^E} r_{h,i}(u^{{{\mathcal {N}}}\!N}) v_i \,, \end{aligned}$$

where \({\varvec{v}} = (v_i) \in {\mathbb {R}}^{\mathrm{card} I_h^E }\). Defining the function \(v_h^E = \sum _{i \in I_h^E} v_i \varphi _i \in V_h\), which is supported in \(D_E\), and recalling (9), we have

$$\begin{aligned} \sum _{i \in I_h^E} r_{h,i}(u^{{{\mathcal {N}}}\!N}) v_i = F_h(v_h^E) - a_h(u^{{{\mathcal {N}}}\!N},v_h^E)\,. \end{aligned}$$

By the left-hand inequality in (31), we obtain

$$\begin{aligned} \frac{c_h}{C_h} \, \eta _{\mathrm{loss}}(E) \ \le \ \sup _{v_h^E} \frac{F_h(v_h^E) - a_h(u^{{{\mathcal {N}}}\!N},v_h^E)}{\vert v_h^E \vert _{1,D_E}}\,. \end{aligned}$$

Now we write

$$\begin{aligned} \begin{aligned} F_h(v_h^E) - a_h(u^{{{\mathcal {N}}}\!N},v_h^E)&= F_h(v_h^E) - F(v_h^E) \\&\quad + F(v_h^E) - a(u^{{{\mathcal {N}}}\!N},v_h^E) \\&\quad + a(u^{{{\mathcal {N}}}\!N},v_h^E) - a_h(u^{{{\mathcal {N}}}\!N},v_h^E) \,. \end{aligned} \end{aligned}$$

The term \( F_h(v_h^E) - F(v_h^E) =[F_h(v_h^E) - F_\pi (v_h^E)] + [F_\pi (v_h^E) - F(v_h^E)]\) can be bounded as done for the terms (I) and (VII) above, yielding

$$\begin{aligned} \vert F_h(v_h^E) - F(v_h^E) \vert \lesssim \sum _{E' \subset D_E} \left( \eta _{\mathrm{rhs},1}(E')+ \eta _{\mathrm{rhs},2}(E') \right) \vert v_h^E \vert _{1,E'} \,. \end{aligned}$$

Similarly, the term \(a(u^{{{\mathcal {N}}}\!N},v_h^E) - a_h(u^{{{\mathcal {N}}}\!N},v_h^E)\) can be handled as done for the terms (II) and (VIII) above, obtaining

$$\begin{aligned} \vert a(u^{{{\mathcal {N}}}\!N},v_h^E) - a_h(u^{{{\mathcal {N}}}\!N},v_h^E) \vert \lesssim \sum _{E' \subset D_E} \left( \sum _{k=1}^6\eta _{\mathrm{coef},k}(E') \right) \vert v_h^E \vert _{1,E'} \,. \end{aligned}$$

Finally, one has \(\vert F(v_h^E) - a(u^{{{\mathcal {N}}}\!N},v_h^E) \vert \lesssim \vert u-u^{{{\mathcal {N}}}\!N}\vert _{1,D_E} \vert v_h^E \vert _{1,D_E}\), thereby concluding the proof of (50). \(\square \)

Remark 3

(Comparison with different error analyses) In the last few years, a PINN error analysis involving approximation, optimization and generalization errors has emerged [14]. The approximation error is associated with the expressivity and the best approximation error of a neural network with a fixed architecture, the optimization error with the local minima found during the optimization phase, and the generalization error with the difference between the function represented by the trained neural network and the exact solution. In this perspective, the present paper provides a connection between the optimization and generalization errors. Indeed, Theorems 1 and 3 contain upper and lower bounds for the generalization error that are based on the value of the loss function and other suitable terms. We highlight that different results relating these two errors for PINNs are available in the literature (see [19, 20]). The analysis proposed here does not require any assumption on the training set, provides computable information during training even when the current iterate is far from any good local minimum, and yields both global and local control on the \(H^1\) error (see Corollary 2).

4 Numerical results

Let us consider the two-dimensional domain \(\Omega =(0,1)^2\) and the Poisson problem:

$$\begin{aligned} {\left\{ \begin{array}{ll} -\Delta u = f &{} \text {in } \Omega \,, \\ \ \ \, u=g &{} \text {on } \Gamma \,, \end{array}\right. } \end{aligned}$$
(53)

with the functions f and g such that the exact solution, represented in Fig. 1, is

$$\begin{aligned} u(x,y) = \tanh \left[ 2\left( x^3 - y^4\right) \right] . \end{aligned}$$
(54)
Fig. 1 Graphical representation of the exact solution \(u(x,y)\) in (54)

Problem (53) is numerically solved by the VPINN discretization described in Sect. 2.1, extended to handle non-homogeneous Dirichlet conditions as mentioned in Remark 1. The VPINN used is a feed-forward fully connected neural network composed of an input layer with input dimension \(n=2\), three hidden layers with 50 neurons each, and an output layer with a single output variable; it thus contains 7851 trainable weights. Furthermore, in all the layers except the output one, the activation function is the hyperbolic tangent. The VPINN output is modified as described in [13] to exactly impose the Dirichlet boundary conditions. Gaussian quadrature rules of order \(q=3\) are used in the definition of the loss function.
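A sketch of this architecture, written here in PyTorch for illustration, is reported below; the generic cutoff \(\Phi \) and lifting only hint at the boundary-imposing modification of the output described in [13], and all names are illustrative assumptions.

```python
import torch

# fully connected network: 2 inputs, three hidden layers of 50 tanh neurons, 1 output
model = torch.nn.Sequential(
    torch.nn.Linear(2, 50), torch.nn.Tanh(),
    torch.nn.Linear(50, 50), torch.nn.Tanh(),
    torch.nn.Linear(50, 50), torch.nn.Tanh(),
    torch.nn.Linear(50, 1),
)

def u_nn(x, Phi, lift):
    """Output modified so that the Dirichlet data are satisfied exactly:
    Phi vanishes on the boundary and lift matches g there (illustrative form)."""
    return Phi(x) * model(x).squeeze(-1) + lift(x)
```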

For ease of implementation, the orthogonal projection operators \(\Pi _{E,k}\), defined in Sect. 3, are mimicked by interpolation operators as follows. Let us first consider the elemental Lagrange interpolation operator \({{{\mathcal {I}}}}_{E,k}:C^0(E)\rightarrow \mathbb P_k(E)\); then, to guarantee orthogonality to constants, the projection operator \({\tilde{\Pi }}_{E,k}:C^0(E)\rightarrow \mathbb P_k(E)\) is defined by setting

$$\begin{aligned} {\tilde{\Pi }}_{E,k}\varphi := {{{\mathcal {I}}}}_{E,k}\varphi + \dfrac{\int _E \left( \varphi - {{{\mathcal {I}}}}_{E,k}\varphi \right) }{\vert E\vert }, \quad \forall \varphi \in C^0(E), \end{aligned}$$

where, in practice, the integral \(\int _E \left( \varphi - {{{\mathcal {I}}}}_{E,k}\varphi \right) \) can be computed with quadrature rules that are more accurate than the ones used in the other operations. In this work we use quadrature rules of order 7 in each element.
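A minimal sketch of this correction, assuming that the values of \(\varphi \) and of its Lagrange interpolant at the nodes of the high-order elemental quadrature rule are available, could read as follows (all names are illustrative assumptions):

```python
import numpy as np

def tilde_Pi_values(phi_q, I_phi_q, wq):
    """Values of tilde_Pi_{E,k} phi at the quadrature nodes of E:
    phi_q, I_phi_q hold the nodal values of phi and of I_{E,k} phi,
    wq the quadrature weights (which sum to |E|)."""
    shift = np.dot(wq, phi_q - I_phi_q) / wq.sum()   # (1/|E|) int_E (phi - I_{E,k} phi)
    return I_phi_q + shift
```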

The neural networks involved are initially trained using the first-order optimizer ADAM [21] for 3000 epochs with an exponentially decaying learning rate ranging between \(10^{-2}\) and \(10^{-3}\). The corresponding solutions are then improved using the second-order BFGS optimizer [22] for 2000 epochs, or until two subsequent iterates coincide (up to machine precision). Since the VPINN output is suitably modified to exactly satisfy the Dirichlet boundary conditions, the training set contains only the quadrature points required to compute the quantity (6).
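A possible realization of this two-phase training schedule is sketched below; PyTorch's L-BFGS is used here merely as a stand-in for the BFGS optimizer of [22], and the stopping criterion on coinciding iterates is omitted, so the whole block is an illustrative assumption rather than the actual implementation.

```python
import torch

def train(model, loss_fn, adam_epochs=3000, bfgs_epochs=2000):
    # phase 1: Adam with learning rate decaying exponentially from 1e-2 to 1e-3
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.1 ** (1 / adam_epochs))
    for _ in range(adam_epochs):
        opt.zero_grad()
        loss_fn(model).backward()
        opt.step()
        sched.step()
    # phase 2: quasi-Newton refinement of the Adam solution
    opt = torch.optim.LBFGS(model.parameters(), max_iter=bfgs_epochs)
    def closure():
        opt.zero_grad()
        loss = loss_fn(model)
        loss.backward()
        return loss
    opt.step(closure)
    return model
```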

The VPINN is trained on different meshes and the corresponding error estimators \(\big (\sum _{E \in {{{\mathcal {T}}}}_h} \eta ^2(E) \big )^{1/2}\) are computed. In Table 1 we indicate the number \(N_{\text {qp}}\) of quadrature points and the number \(\text {dim}(V_h)\) of test functions required to construct the loss function using a mesh with meshsize h. We highlight that \(N_{\text {qp}}\) coincides with the dimension of the training set.

Table 1 Meshsizes h of the meshes used in Fig. 2, corresponding dimension \(N_{\text {qp}}\) of the training set and number \(\text {dim}(V_h)\) of test functions

Once more, when exact integrals are involved, they are approximated with higher-order quadrature rules. The obtained results are shown in Fig. 2, where the values of the \(H^1\) error and of the a posteriori estimator are displayed for several meshes of stepsize h. Remarkably, the error estimators (red dots) behave very similarly to the corresponding energy errors (blue dots). Moreover, consistently with the results discussed in [13], after an initial preasymptotic phase all dots are aligned on straight lines with slopes very close to 4 (the slope of the red line is 3.81, the slope of the blue line is 3.92). We remark that, since the triangulations used are quasi-uniform, both \(N_{\text {qp}}\) and \(\text {dim}(V_h)\) approximately scale as \(O(h^{-2})\); therefore the qualitative behaviour shown in Fig. 2 does not change if either of these two quantities is represented on the x-axis.

Fig. 2 \(H^1\) errors (blue dots) obtained by training the same VPINN on different meshes, and corresponding error estimators (red dots) (colour figure online)

It is also interesting to note that the terms appearing in the a posteriori estimator (recall (44)) exhibit different behaviors during the training of a single VPINN. This phenomenon is highlighted in Fig. 3, where one can observe the evolution of the quantities \(\eta _\mathrm{rhs}\), \(\eta _\mathrm{coef}\), \(\eta _\mathrm{res}\), \(\eta _\mathrm{loss}\), \(\eta \) and \(\vert u - u^{{{\mathcal {N}}}\!N}\vert _{1,\Omega }\), where each \(\eta _*\) stands for \(\eta _* = \big (\sum _{E \in {{{\mathcal {T}}}}_h} \eta _*^2(E) \big )^{1/2}\) with \(*\in \{\text {rhs},\text {coef},\text {res},\text {loss}\}\), during the 3000 epochs performed with the ADAM optimizer. It can be observed that, during this training, while the value of the loss function decreases, the accuracy remains almost constant because other sources of error, independent of the neural network, prevail.

Fig. 3 Evolution of the addends of the error estimator \(\eta \) during training (colour figure online)

5 Conclusions

We considered the discretization of a model elliptic boundary-value problem by variational physics-informed neural networks (VPINNs), in which the test functions are continuous, piecewise linear functions on a triangulation of the domain. The scheme can be viewed as an instance of a least-squares/Petrov–Galerkin method.

We introduced an a posteriori error estimator, which sums up four contributions: the equation residual (measuring the elemental bulk residuals and the edge jump terms, computed with approximated coefficients and right-hand side), the coefficients' oscillation, the right-hand side's oscillation, and a scaled value of the loss function. The latter term corresponds to an inexact solve of the algebraic system arising from the discretization of the variational equations.

The main result of the paper is the proof that the estimator provides a global upper bound and a local lower bound for the energy norm of the error between the exact and VPINN solutions. In other words, the a posteriori estimator is both reliable and efficient. Numerical results show an excellent agreement with the theoretical predictions.

In a forthcoming paper, we will investigate the use of the proposed estimator to design an adaptive strategy of discretization.