1 Introduction

In recent years, the Bayesian paradigm has become a popular framework for uncertainty quantification. It has found application in global optimization (Mockus 1989), inverse modeling (Stuart 2010) and data assimilation (Law et al. 2015), among other contexts. Commonly, given some numerical model, a prior distribution is assumed over its parameters, and the Bayesian paradigm provides a consistent framework to estimate these parameters and to quantify and propagate their associated uncertainty. It should be noted, however, that even if complete certainty could be obtained over the model parameters, uncertainty would still remain in the solution due to approximations made in the numerical model. This key observation is what underpins the current trend towards probabilistic numerics.

At the core of probabilistic numerics, the estimation of an unknown field is recast as a statistical inference problem, which allows the field to be estimated together with some measure of uncertainty (Larkin 1972; Diaconis 1988). Early examples of the application of Bayesian probabilistic numerics include computing integrals (O’Hagan 1991) and solving ordinary differential equations (Skilling 1992). More recently, following a “call to arms” from Hennig et al. (2015), a large push has been made to apply this framework to a wide range of problems, from solving linear systems (Hennig 2015; Cockayne et al. 2019; Wenger et al. 2020) to quadrature (Karvonen and Särkkä 2017; Briol et al. 2017) to solving ordinary differential equations (Schober et al. 2014; Hennig et al. 2014; Teymur et al. 2016). For a general overview of the current state of the art of probabilistic numerics, the reader is referred to Hennig et al. (2022). Most relevant for the work presented in this paper are the probabilistic numerical methods that have been developed for solving partial differential equations, which can be roughly divided into two categories: meshfree probabilistic solvers, and solver-perturbing error estimators.

The first category (Chkrebtii et al. 2016; Cockayne et al. 2017; Raissi et al. 2018; Wang et al. 2021) can be seen as a way to find solutions to partial differential equations directly from the strong form in a Bayesian manner. A prior is assumed over the solution field, which is updated by evaluating its derivatives on a grid of collocation points, allowing for a solution to be obtained without needing to apply a finite element discretization over the domain. This approach to solving partial differential equations shares some similarities with Bayesian physics-informed neural networks (Raissi et al. 2019; Yang et al. 2021), the main difference lying in the function that is being fitted at the collocation points. The way in which these meshfree solvers relate to traditional collocation methods is similar to the way in which Bayesian physics-informed neural networks relate to their deterministic counterparts.

The second category (Conrad et al. 2017; Kersting and Hennig 2018; Lie et al. 2019) is focused on estimating the discretization error of traditional solvers for differential equations. For ordinary differential equations, the usual time integration step is taken, after which the solution is perturbed by adding Gaussian noise, representing the uncertainty in the time integration result. Similarly, for partial differential equations, the traditional spatial discretization is perturbed using small support Gaussian random fields, which reflect the uncertainty introduced by the mesh. In Abdulle and Garegnani (2020, 2021), a similar approach is taken, but rather than adding noise to the solution, an uncertainty is introduced by perturbing the time step size or finite element discretization. A more formal mathematical basis for probabilistic numerical methods can be found in Cockayne et al. (2019), where a more rigorous definition of the term is outlined and a common framework underpinning these two seemingly separate categories is established.

It is worth noting that these probabilistic numerical methods are a deviation from traditional error estimators (Babuška and Rheinboldt 1979; Babuška and Miller 1987; Zienkiewicz and Zhu 1987), as they embed the model error into the method itself, rather than estimate it a posteriori. This inherently affects the model output, which depending on the context can be a desirable or undesirable property. In Rouse et al. (2021), a method is presented to obtain full-field error estimates by assuming a Gaussian process prior over the discretization error, and updating it based on a set of traditional estimators of error in quantities of interest. This way, a distribution representing the finite element discretization error can be obtained in a non-intrusive manner.

The shared goal of these methods is to accurately describe the errors made due to limitations of our numerical models, though their ways of modeling error differ. At their core, the meshfree probabilistic solvers model error as the result of using a finite number of observations to obtain a solution to an infinite-dimensional problem. The solver-perturbing error estimators, on the other hand, take an existing discretization, like the one used in the finite element method, and assign some uncertainty measure to the existing solver. This raises the question: what happens if the methodology from the meshfree probabilistic solvers is applied to existing mesh-based solvers of partial differential equations? Little research has thus far been conducted to answer this question, though two particular works are worth pointing out.

A brief remark is made in Bilionis (2016) describing a Bayesian probabilistic numerical method whose posterior mean is equivalent to the finite element solution. However, this idea is then discarded due to infinite variances arising in the posterior distribution. In Pförtner et al. (2023), the probabilistic meshfree solvers from Cockayne et al. (2017) are generalized to methods of weighted residuals, which includes the finite element method. Of particular relevance to our work is their construction of prior distributions whose posterior mean is guaranteed to be equivalent to the usual finite element solution. Doing this would allow one to replace the traditional finite element solver with the probabilistic one, in order to quantify the finite element discretization error. However, their experimental results are limited to one-dimensional test cases, possibly because the application of their formulation to unstructured triangular or quadrilateral meshes would result in integrals in the information operator that are computationally too expensive.

In this work, we propose a probabilistic numerical method for the modeling of finite element discretization error. The solution is endowed with a Gaussian process prior, which is then updated based on observations of the right-hand side from a finite element discretization. This allows for the approximation of the true solution while including the uncertainty resulting from the finite discretization that is applied. Rather than work directly with the Gaussian process distribution over the exact solution space, we introduce a second discretization over the domain that is fine enough to represent the exact solution. This second discretization helps to avoid the infinite variances brought up in Bilionis (2016) as well as the computationally expensive integrals from Pförtner et al. (2023). We present a class of priors that naturally accounts for the smoothness of the partial differential equation at hand, and show how the assembly of large full covariance matrices can be avoided. A particular focus of this work is on the relationship between the posterior covariance of our formulation and the finite element discretization error. The relationship between these two quantities is often left to intuition, reasoning along the lines that since the posterior covariance contains remaining model uncertainty, it must reflect the discretization error. We challenge this assumption and investigate more thoroughly which conditions need to be met before the posterior covariance can reasonably be said to capture the finite element discretization error.

The underlying goal of the development of a Bayesian model for the finite element discretization error is to enable the propagation of discretization error to quantities of interest through the computational pipelines that arise in multiscale modeling, inverse modeling and data assimilation settings. This consistent treatment of discretization error in turn allows for more informed decisions to be made about its impact on the model output. To give a concrete example, in Girolami et al. (2021), a Bayesian framework for the assimilation of measurement data and finite element models is presented. Within this framework, a model misspecification component is defined, which is endowed with a squared-exponential Gaussian process prior. The Bayesian formulation of the finite element method that we derive in this work would allow for a more informative choice of prior distribution over the model misspecification component, for example by separating out the discretization error from the error associated with other modeling assumptions.

In the context of Bayesian inverse modeling, our proposed method could prove particularly useful. For the Metropolis-Hastings sampling strategies that are commonly employed, a finite element solve is necessary for each sample that is drawn, which typically needs to be done tens or hundreds of thousands of times. The goals of having a negligible discretization error and a computational cost that is not prohibitive can therefore be in conflict. Rather than attempt to fully resolve the discretization error, it can be more practical to use a coarse mesh and account for the associated error in the likelihood of the Bayesian inverse model. To do this, a probability density that is reflective of the discretization error of the coarse solve is needed, which is what our Bayesian formulation of the finite element method aims to provide.

The outline of this paper is as follows: in Sect. 2, we derive our Bayesian formulation of the finite element method. This is followed by a discussion on the choice of prior covariance in Sect. 3, where two different choices of prior distribution are investigated. Two examples, a one-dimensional tapered bar and a two-dimensional perforated plate, are showcased throughout this section to validate the conclusions drawn from theory. Finally, in Sect. 4, the conclusions of this paper are drawn and discussed.

2 Bayesian finite element method

In this section, the proposed Bayesian version of the finite element method is derived. Although the method is applicable to a broad range of linear elliptic partial differential equations, for the purposes of demonstration, we will consider Poisson’s equation:

$$\begin{aligned} -\Delta u(\textbf{x})&= f(\textbf{x})&&\text {in } \Omega \\ u(\textbf{x})&= 0&&\text {on } \partial \Omega \end{aligned}$$
(1)

Here, \(\Omega \) and \(\partial \Omega \) are the domain and its boundary, respectively. \(u(\textbf{x})\) and \(f(\textbf{x})\) are the solution and forcing term, which are linked through the Laplace operator \(\Delta \).

2.1 Continuous formulation

We will start with the derivation of a continuous posterior distribution over the solution space conditioned on the finite element force vector, largely following Bilionis (2016). As usual, the problem is restated in its weak formulation:

$$\begin{aligned} \int _{\Omega } \nabla u(\textbf{x}) \cdot \nabla v(\textbf{x}) \, \textrm{d}\textbf{x} = \int _{\Omega } f(\textbf{x}) v(\textbf{x}) \, \textrm{d}\textbf{x} \qquad \forall v(\textbf{x}) \in \mathcal {V} \end{aligned}$$
(2)

We seek \(u(\textbf{x}) \in \mathcal {V}\), where \(\mathcal {V}=H^1_0\) is a Sobolev space of functions over \(\Omega \) that are weakly once-differentiable and vanish at the boundary \(\partial \Omega \). This space is equipped with an inner product and thus also forms a Hilbert space. Now, a discretization is defined over the domain using a set of locally supported shape functions \(\{\psi _i(\textbf{x})\}_{i=1}^m\), which span a finite-dimensional space \(\mathcal {W}^h\subset \mathcal {V}\). The test function \(v^{h}(\textbf{x})\) can be defined in terms of these shape functions:

$$\begin{aligned} v^{h}(\textbf{x}) = \sum _{i=1}^m v_i \psi _i(\textbf{x}) \qquad \text {with } \psi _i(\textbf{x}) \in \mathcal {W}^h \end{aligned}$$
(3)

Since Eq. (2) has to hold for all \(v^{h}(\textbf{x}) \in \mathcal {W}^h\), the weights \(v_i\) can be chosen at will. Substituting Eq. (3) into Eq. (2), a finite set of m equations is constructed by choosing \(v_i = \delta _{ij}\) for the jth equation, where \(\delta _{ij}\) is the Kronecker delta. This yields the entries of the finite element force vector \(\textbf{g}\):

$$\begin{aligned} g_i = \int _{\Omega } f(\textbf{x}) \psi _i(\textbf{x}) \, \textrm{d}\textbf{x} \end{aligned}$$
(4)

We can relate the solution \(u(\textbf{x})\) to the force vector \(\textbf{g}\) via the linear operator \(\varvec{\mathcal {L}}\):

$$\begin{aligned} \varvec{\mathcal {L}}\left[ u(\textbf{x})\right] = \textbf{g} \end{aligned}$$
(5)

where \(\varvec{\mathcal {L}}\left[ u(\textbf{x})\right] =\left[ \begin{array}{llll} \mathcal {L}_{1}\left[ u(\textbf{x})\right]&\mathcal {L}_{2}\left[ u(\textbf{x})\right]&\dots&\mathcal {L}_{m}\left[ u(\textbf{x})\right] \end{array}\right] ^{T}\) is given by:

$$\begin{aligned} \mathcal {L}_{i} [u(\textbf{x})] = \int _{\Omega } \nabla u(\textbf{x}) \cdot \nabla \psi _i(\textbf{x}) \, \textrm{d}\textbf{x} \end{aligned}$$
(6)

A centered Gaussian process with a positive definite covariance function \(k(\textbf{x}, \textbf{x}')\) is now assumed over the solution \(u(\textbf{x})\):

$$\begin{aligned} u(\textbf{x}) \sim \mathcal{G}\mathcal{P}\left( 0, k(\textbf{x}, \textbf{x}')\right) \end{aligned}$$
(7)

Because we have a linear map \(\varvec{\mathcal {L}}\) from \(u(\textbf{x})\) to \(\textbf{g}\), conditioning \(u(\textbf{x})\) on \(\textbf{g}\) yields another Gaussian process distribution (Pförtner et al. 2023):

$$\begin{aligned} u(\textbf{x}) \,|\, \textbf{g}\sim \mathcal{G}\mathcal{P}\left( m^*(\textbf{x}), k^*(\textbf{x},\textbf{x}')\right) \end{aligned}$$
(8)

Here, the posterior mean function \(m^*(\textbf{x})\) and covariance function \(k^*(\textbf{x},\textbf{x}')\) are given by:

$$\begin{aligned} m^*(\textbf{x})&= \varvec{\mathcal {L}}' \left[ k(\textbf{x}, \textbf{z}')\right] \textbf{L}^{-1} \textbf{g}\\ k^*(\textbf{x},\textbf{x}')&= k(\textbf{x}, \textbf{x}') - \varvec{\mathcal {L}}'\left[ k(\textbf{x}, \textbf{z}')\right] \textbf{L}^{-1} \varvec{\mathcal {L}}\left[ k(\textbf{z}, \textbf{x}')\right] \end{aligned}$$
(9)

where \({\textbf {L}} = {\mathcal {L}}\left[ {\mathcal {L}}' \left[ k({\textbf {z}}, {\textbf {z}}') \right] \right] \) is the Gram matrix. The posterior mean function \(m^*(\textbf{x})\) provides a full-field estimate of the solution \(u(\textbf{x})\). The posterior covariance function \(k^*(\textbf{x}, \textbf{x}')\) indicates the uncertainty associated with this estimate due to the fact that it was obtained using only a finite set of shape functions. Since the finite discretization is the only source of uncertainty in our model, we can intuit some association between this posterior covariance and the finite element discretization error.

The formulation presented thus far can be contextualized in the method of weighted residuals framework presented in Pförtner et al. (2023). Specifically, our continuous formulation is equivalent to choosing the information operator \(\varvec{\mathcal {I}}\left[ u(\textbf{x})\right] = \left[ \begin{array}{llll} \mathcal {I}_{1}\left[ u(\textbf{x})\right]&\mathcal {I}_{2}\left[ u(\textbf{x})\right]&\dots&\mathcal {I}_{m}\left[ u(\textbf{x})\right] \end{array}\right] ^{T}\) in their framework to be given by:

$$\begin{aligned} \mathcal {I}_i\left[ u(\textbf{x})\right] = \int _\Omega \nabla u(\textbf{x}) \cdot \nabla \psi _i(\textbf{x}) \, \textrm{d}\textbf{x} - \int _\Omega f(\textbf{x}) \psi _i(\textbf{x}) \, \textrm{d}\textbf{x} \end{aligned}$$
(10)

Unfortunately, the integrals that arise in the expressions for the posterior mean and covariance functions in Eq. (9) are generally intractable. For some arbitrary covariance function \(k(\textbf{x}, \textbf{x}')\), the integration over the shape functions \(\psi _i(\textbf{x})\) and \(\psi _j(\textbf{x}')\) cannot be performed without putting severe restrictions on which shape functions are permitted. This in turn puts severe constraints on the domain shape, which undercuts the core strength of the finite element method, namely its ability to solve partial differential equations on complicated domains. On the other hand, we can design the covariance function such that these integrals do become tractable, for example by following Bilionis (2016) and setting \(k(\textbf{x}, \textbf{x}') = G(\textbf{x}, \textbf{x}')\), or following Owhadi (2015) and setting \(k(\textbf{x}, \textbf{x}') = \int _\Omega \int _\Omega G(\textbf{x}, \textbf{z}) G(\textbf{x}', \textbf{z}') \delta (\textbf{z}- \textbf{z}') \, \textrm{d}\textbf{z}\, \textrm{d}\textbf{z}'\), where \(\delta (\textbf{x})\) is a Dirac delta function. However, in both of these expressions, the Green’s function \(G(\textbf{x}, \textbf{x}')\) associated with the operator \(-\Delta \) is required, which is generally not available for a given partial differential equation. Since our aim is to develop a general Bayesian framework for modeling finite element discretization error, a new approach is needed that does not impose restrictions on the choice of shape functions or require access to the Green’s function.

2.2 Discretized formulation

This motivates us to approximate \(u(\textbf{x})\) in the finite-dimensional space \(\mathcal {V}^h\) spanned by a second set of locally supported shape functions \(\{\phi _j(\textbf{x})\}_{j=1}^n\). This defines the trial function \(u^{h}(\textbf{x})\):

$$\begin{aligned} u(\textbf{x}) \approx u^{h}(\textbf{x}) = \sum _{j=1}^n u_j \phi _j(\textbf{x}) \qquad \text {with } \phi _j(\textbf{x}) \in \mathcal {V}^h \end{aligned}$$
(11)

Note that this is not the same set of shape functions as the one used to define the force vector in Eq. (4). In fact, since our aim is to model the discretization error that arises by choosing \(v(\textbf{x}) \in \mathcal {W}^h\) rather than \(v(\textbf{x}) \in \mathcal {V}\), it is important that the error associated with the projection of an arbitrary function \(w(\textbf{x}) \in \mathcal {V}\) onto \(\mathcal {V}^h\) is small compared to the error associated with its projection onto \(\mathcal {W}^h\). Loosely speaking, we assume that \(\mathcal {V}^h\) is sufficiently expressive to serve as a stand-in for \(\mathcal {V}\).

Substituting Eqs. (3) and (11) into Eq. (2) yields the matrix formulation of the problem:

$$\begin{aligned} \textbf{H}\textbf{u} = \textbf{g} \end{aligned}$$
(12)

The elements of the stiffness matrix \(\textbf{H}\) are given by:

$$\begin{aligned} H_{ij} = \int _{\Omega } \nabla \psi _i(\textbf{x}) \cdot \nabla \phi _j(\textbf{x}) \, \textrm{d}\textbf{x} \end{aligned}$$
(13)

The assumption that \(\mathcal {V}^h\) is more expressive than \(\mathcal {W}^h\) implies that \(\textbf{u}\) will have a larger dimensionality than \(\textbf{g}\) and thus that \(\textbf{H}\) is a rectangular matrix and that Eq. (12) describes an underdetermined system. However, the fact that this system of equations has an infinite set of solutions need not pose a problem, due to the regularizing effect of the prior assumed over \(u(\textbf{x})\).

Since the solution field \(u(\textbf{x})\) has been reduced from the infinite-dimensional space \(\mathcal {V}\) to the finite-dimensional \(\mathcal {V}^h\), the distribution assumed over the solution in Eq. (7) needs to be reduced accordingly. Instead of an infinite-dimensional Gaussian process, we obtain a finite-dimensional zero-mean normal distribution with a positive definite covariance matrix \(\varvec{\Sigma }\):

$$\begin{aligned} \textbf{u}\sim \mathcal {N}\left( \textbf{0}, \varvec{\Sigma }\right) \end{aligned}$$
(14)

The joint distribution of \(\textbf{u}\) and \(\textbf{g}\) is now given by:

$$\begin{aligned} \begin{bmatrix} \textbf{g}\\ \textbf{u}\end{bmatrix} = \begin{bmatrix} \textbf{H}\textbf{u}\\ \textbf{u}\end{bmatrix} \sim \mathcal {N}\left( \textbf{0}, \begin{bmatrix} \textbf{H}\varvec{\Sigma }\textbf{H}^T & \textbf{H}\varvec{\Sigma }\\ \varvec{\Sigma }\textbf{H}^T & \varvec{\Sigma }\end{bmatrix} \right) \end{aligned}$$
(15)

Conditioning \(\textbf{u}\) on \(\textbf{g}\) yields the following posterior distribution:

$$\begin{aligned} \textbf{u}\,|\,\textbf{g}\sim \mathcal {N}\left( \textbf{m}^*, \varvec{\Sigma }^*\right) \end{aligned}$$
(16)

Here, the posterior mean vector \(\textbf{m}^*\) and covariance matrix \(\varvec{\Sigma }^*\) are given by:

$$\begin{aligned} \textbf{m}^*&= \varvec{\Sigma }\textbf{H}^T \left( \textbf{H}\varvec{\Sigma }\textbf{H}^T\right) ^{-1} \textbf{g}\\ \varvec{\Sigma }^*&= \varvec{\Sigma } - \varvec{\Sigma }\textbf{H}^T \left( \textbf{H}\varvec{\Sigma }\textbf{H}^T \right) ^{-1} \textbf{H}\varvec{\Sigma } \end{aligned}$$
(17)

Similar to the continuous formulation presented in Sect. 2.1, \(\textbf{m}^*\) can be interpreted as providing an estimate of the solution \(u(\textbf{x})\) in the fine space \(\mathcal {V}^h\), while observing the right-hand side \(f(\textbf{x})\) only in the coarse space \(\mathcal {W}^h\). The posterior covariance matrix \(\varvec{\Sigma }^*\) then provides an indication of the uncertainty associated with this estimate due to the fact that only observations from the coarse mesh are used to obtain this estimate. Note that if the test and trial spaces are chosen to be the same (i.e. \(\mathcal {W}^h= \mathcal {V}^h\)), \(\varvec{\Sigma }^*\) reduces to a null matrix, reflecting the fact that there no longer exists a discretization error between \(\mathcal {V}^h\) and \(\mathcal {W}^h\).
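For concreteness, the conditioning step of Eqs. (16) and (17) can be written out in a few lines of dense linear algebra. The following minimal sketch (the function name and the use of NumPy are our own illustration; the paper prescribes no implementation) computes the posterior moments for a small system:

```python
import numpy as np

def condition_on_force_vector(H, Sigma, g):
    """Posterior N(m*, Sigma*) of u | g for the linear observation model
    g = H u with prior u ~ N(0, Sigma), following Eqs. (16)-(17)."""
    S = H @ Sigma @ H.T                          # Gram matrix H Sigma H^T (m x m)
    m_post = Sigma @ H.T @ np.linalg.solve(S, g)
    Sigma_post = Sigma - Sigma @ H.T @ np.linalg.solve(S, H @ Sigma)
    return m_post, Sigma_post
```

When \(\mathcal {W}^h= \mathcal {V}^h\), `H` is square and invertible, and `Sigma_post` evaluates to the null matrix, matching the observation above.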

2.3 Hierarchical shape functions

Thus far, the only requirement that has been put on the choice of \(\mathcal {V}^h\) and \(\mathcal {W}^h\) is that the error between \(\mathcal {V}\) and \(\mathcal {V}^h\) is small compared to the error between \(\mathcal {V}\) and \(\mathcal {W}^h\). We now add a second restriction, namely that \(\mathcal {W}^h\subset \mathcal {V}^h\). This defines a hierarchy between these two spaces, and implies that any function defined in \(\mathcal {W}^h\) can be expressed in \(\mathcal {V}^h\). One way to ensure this hierarchy in practice is to first define a coarse mesh corresponding to \(\mathcal {W}^h\), and then refine it hierarchically to obtain a fine mesh corresponding to \(\mathcal {V}^h\). Alternatively, it is possible to use only a single mesh, and use linear and quadratic shape functions over the same finite elements to define \(\mathcal {W}^h\) and \(\mathcal {V}^h\), respectively.

From the hierarchy between \(\mathcal {V}^h\) and \(\mathcal {W}^h\), it follows that the basis functions that span the coarse space \(\mathcal {W}^h\) can be written as linear combinations of the basis functions that span the fine space \(\mathcal {V}^h\). In other words, there exists a matrix \(\varvec{\Phi }^T\) that maps the vector of fine shape functions \(\varvec{\phi }(\textbf{x}) =\left[ \begin{array}{llll} \phi _1(\textbf{x})&\phi _2(\textbf{x})&\dots&\phi _n(\textbf{x}) \end{array}\right] ^T\) to the vector of coarse shape functions \(\varvec{\psi }(\textbf{x}) =\left[ \begin{array}{llll} \psi _1(\textbf{x})&\psi _2(\textbf{x})&\dots&\psi _m(\textbf{x}) \end{array}\right] ^T\):

$$\begin{aligned} \varvec{\psi }(\textbf{x}) = \varvec{\Phi }^T \varvec{\phi }(\textbf{x}) \end{aligned}$$
(18)
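For nested uniform 1D meshes with linear elements, \(\varvec{\Phi }\) can be tabulated by evaluating each coarse hat function at the fine-scale nodes, since the fine-mesh interpolant reproduces any member of \(\mathcal {W}^h\subset \mathcal {V}^h\) exactly. A minimal sketch of this construction (our own illustration, with boundary nodes included):

```python
import numpy as np

def hierarchical_phi_1d(m, n):
    """Phi with entries Phi[k, i] = psi_i(x_k), so that psi = Phi^T phi
    (Eq. 18) for linear hat functions on nested uniform meshes."""
    assert n % m == 0, "fine mesh must be a refinement of the coarse mesh"
    x_fine = np.linspace(0.0, 1.0, n + 1)        # fine-scale node coordinates
    x_coarse = np.linspace(0.0, 1.0, m + 1)      # coarse-scale node coordinates
    h = 1.0 / m                                  # coarse element size
    Phi = np.zeros((n + 1, m + 1))
    for i, xc in enumerate(x_coarse):
        # coarse hat function centered at xc, evaluated at all fine nodes
        Phi[:, i] = np.maximum(0.0, 1.0 - np.abs(x_fine - xc) / h)
    return Phi
```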

This allows Eq. (13) to be rewritten as:

$$\begin{aligned} H_{ij}&= \int _{\Omega } \nabla \sum _{k=1}^n \Phi _{ki} \phi _k(\textbf{x}) \cdot \nabla \phi _j(\textbf{x}) \, \textrm{d}\textbf{x}\\&= \sum _{k=1}^n \Phi _{ki} \int _{\Omega } \nabla \phi _k(\textbf{x}) \cdot \nabla \phi _j(\textbf{x}) \, \textrm{d}\textbf{x} \end{aligned}$$
(19)

As a result, \(\textbf{H}\) can be expressed as:

$$\begin{aligned} \textbf{H}= \varvec{\Phi }^T \textbf{K} \end{aligned}$$
(20)

where \(\textbf{K}\) is the fine-scale (square and symmetric) stiffness matrix that would follow if both trial and test functions came from the fine space \(\mathcal {V}^h\):

$$\begin{aligned} K_{ij} = \int _{\Omega } \nabla \phi _i(\textbf{x}) \cdot \nabla \phi _j(\textbf{x}) \, \textrm{d}\textbf{x} \end{aligned}$$
(21)

Following a similar line of reasoning, the coarse stiffness matrix \(\mathbf {K_c}\), that would be found if both trial and test functions came from the coarse space \(\mathcal {W}^h\), can be written in terms of \(\varvec{\Phi }\) and \(\textbf{K}\):

$$\begin{aligned} \mathbf {K_c} = \varvec{\Phi }^T \textbf{K}\varvec{\Phi } \end{aligned}$$
(22)

Similarly to Eq. (19), we can rewrite Eq. (4) as:

$$\begin{aligned} g_{i}&= \int _{\Omega } f(\textbf{x}) \sum _{k=1}^n \Phi _{ki} \phi _k(\textbf{x}) \, \textrm{d}\textbf{x}\\&= \sum _{k=1}^n \Phi _{ki} \int _{\Omega } f(\textbf{x}) \phi _k(\textbf{x}) \, \textrm{d}\textbf{x} \end{aligned}$$
(23)

And so, \(\textbf{g}\) can be expressed as:

$$\begin{aligned} \textbf{g}= \varvec{\Phi }^T \textbf{f} \end{aligned}$$
(24)

where \(\textbf{f}\) is the fine-scale force vector that arises by integrating the forcing term over the fine-scale test functions:

$$\begin{aligned} f_i = \int _{\Omega } f(\textbf{x}) \phi _i(\textbf{x}) \, \textrm{d}\textbf{x} \end{aligned}$$
(25)

Finally, we define the reference solution \(\varvec{\hat{\textbf{u}}}\) as the solution to the fine-scale system of equations that is obtained by choosing both the test and trial spaces to be the fine space \(\mathcal {V}^h\):

$$\begin{aligned} \textbf{K}\varvec{\hat{\textbf{u}}} = \textbf{f} \end{aligned}$$
(26)

In the remainder of this work, discretization error is defined with respect to \(\varvec{\hat{\textbf{u}}}\). Specifically, the finite element discretization error \(\textbf{e}\) is defined as the difference between the fine-scale reference solution \(\varvec{\hat{\textbf{u}}}\) and the coarse-scale solution projected to the fine space:

$$\begin{aligned} \textbf{e}= \textbf{K}^{-1} \textbf{f} - \varvec{\Phi }\mathbf {K_c}^{-1} \textbf{g} \end{aligned}$$
(27)
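Under these definitions, evaluating \(\textbf{e}\) requires one fine-scale and one coarse-scale solve. A small dense sketch (the helper name is our own):

```python
import numpy as np

def discretization_error(K, f, Phi):
    """Discretization error e = K^{-1} f - Phi Kc^{-1} g of Eq. (27)."""
    u_hat = np.linalg.solve(K, f)        # fine-scale reference solution (Eq. 26)
    Kc = Phi.T @ K @ Phi                 # coarse stiffness matrix (Eq. 22)
    g = Phi.T @ f                        # coarse force vector (Eq. 24)
    u_c = np.linalg.solve(Kc, g)         # coarse-scale solution
    return u_hat - Phi @ u_c             # error expressed on the fine mesh
```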

2.4 Boundary conditions

It is worth considering how the application of boundary conditions in the fine space translates to the shape functions in the coarse space. To do this, \(\varvec{\phi }(\textbf{x})\) is split into \({{\varvec{\phi }}_{\textbf{i}}}(\textbf{x})\) and \({\varvec{\phi }}_{\textbf{d}}(\textbf{x})\), where the subscript \({}_{\textbf{d}}\) refers to the nodes on the part of the boundary where Dirichlet conditions are applied, and the subscript \({{}_{\textbf{i}}}\) refers to all other nodes (i.e. both internal nodes and non-Dirichlet boundary nodes). This could be considered abuse of notation, since \(\mathcal {V}^h\subset \mathcal {V}\), which is already constrained by the Dirichlet boundary conditions, so from this point of view, \({\varvec{\phi }}_{\textbf{d}}(\textbf{x})\) should not exist. However, in most practical finite element implementations, shape functions are assigned to the boundary nodes as well in order to facilitate the inclusion of inhomogeneous boundary conditions in the model.

The boundary conditions in the coarse space follow from \({\varvec{\phi }}_{\textbf{d}}(\textbf{x})\) and \(\varvec{\Phi }\), since \({\varvec{\psi }}_{\textbf{d}}(\textbf{x})\) is defined as the elements of \(\varvec{\psi }(\textbf{x})\) where the rows of \(\varvec{\Phi }\) belonging to \({\varvec{\phi }}_{\textbf{d}}(\textbf{x})\) have non-zero entries. As a result, Eq. (18) can be split as follows:

$$\begin{aligned} \begin{bmatrix} {{\varvec{\psi }}_{\textbf{i}}}(\textbf{x}) \\ {\varvec{\psi }}_{\textbf{d}}(\textbf{x}) \end{bmatrix} = \begin{bmatrix} {\varvec{\Phi }}_{\textbf{ii}}^T & \textbf{0}\\ \varvec{\Phi }_\textbf{id}^T & \varvec{\Phi }_\textbf{dd}^T \end{bmatrix} \begin{bmatrix} {{\varvec{\phi }}_{\textbf{i}}}(\textbf{x}) \\ {\varvec{\phi }}_{\textbf{d}}(\textbf{x}) \end{bmatrix} \end{aligned}$$
(28)

Note that the fact that \(\varvec{\Phi }_\textbf{di} = \textbf{0}\) does not introduce any loss of generality: any non-zero element of \(\varvec{\Phi }_\textbf{di}\) would by definition of \({\varvec{\psi }}_{\textbf{d}}(\textbf{x})\) be an element of \(\varvec{\Phi }_\textbf{dd}\), not \(\varvec{\Phi }_\textbf{di}\). From Eqs. (20) and (28), it follows that:

$$\begin{aligned} {\textbf{H}}_{\textbf{ii}} = {\varvec{\Phi }}_{\textbf{ii}}^T {\textbf{K}}_{\textbf{ii}} \end{aligned}$$
(29)

Similarly, from Eqs. (24) and (28), it follows that:

$$\begin{aligned} {{\textbf{g}}_{\textbf{i}}} = {\varvec{\Phi }}_{\textbf{ii}}^T {{\textbf{f}}_{\textbf{i}}} \end{aligned}$$
(30)

Commonly, Dirichlet boundary conditions are enforced by eliminating the corresponding degrees of freedom, and solving the system that remains. Due to the simple relation that \({\varvec{\Phi }}_{\textbf{ii}}\) provides between \({\textbf{H}}_{\textbf{ii}}\) and \({\textbf{K}}_{\textbf{ii}}\) (Eq. 29) as well as \({{\textbf{g}}_{\textbf{i}}}\) and \({{\textbf{f}}_{\textbf{i}}}\) (Eq. 30), all relationships described in Sects. 2.2 and 2.3 still hold when applied only to the internal nodes of the system. From this point onward, we will therefore only consider the internal nodes of the system. This also means that only the part of the covariance matrix related to the internal nodes \({\varvec{\Sigma }}_{\textbf{ii}}\) needs to be considered, and so the requirement of positive definiteness of \(\varvec{\Sigma }\) can be relaxed to a requirement of positive definiteness of only \({\varvec{\Sigma }}_{\textbf{ii}}\). The subscripts \({{}_{\textbf{i}}}\) (for vectors) and \({}_{\textbf{ii}}\) (for matrices) will be left implied in order to declutter the notation.
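In code, this elimination amounts to selecting the interior blocks of the assembled operators, after which Eqs. (29) and (30) hold for the retained blocks. A minimal sketch, assuming boolean masks marking the Dirichlet nodes on each mesh (the mask names are our own):

```python
import numpy as np

def eliminate_dirichlet(K, f, Phi, fine_dirichlet, coarse_dirichlet):
    """Restrict K, f and Phi to the interior (non-Dirichlet) degrees of
    freedom, yielding the K_ii, f_i and Phi_ii blocks of Sect. 2.4."""
    i_f = ~fine_dirichlet                 # interior fine-scale nodes
    i_c = ~coarse_dirichlet               # interior coarse-scale nodes
    return K[np.ix_(i_f, i_f)], f[i_f], Phi[np.ix_(i_f, i_c)]
```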

In the remainder of this paper, we will limit ourselves to partial differential equations with homogeneous boundary conditions. However, the method can easily be extended to inhomogeneous Dirichlet and Neumann boundary conditions. Details on how inhomogeneous boundary conditions can be enforced are given in Appendix A.

3 Choice of prior covariance

Thus far, the prior covariance matrix \(\varvec{\Sigma }\) has not been specified. The choice of \(\varvec{\Sigma }\) is subject to two main requirements. The first requirement is that \(\varvec{\Sigma }\) needs to have a sparse representation. Since \(\varvec{\Sigma }\) is an \(n \times n\) matrix, where n is the number of degrees of freedom of the fine discretization, explicitly computing, storing and operating on the full matrix would quickly become prohibitively expensive. As a result, the traditional approach of using a kernel to directly compute all entries of \(\varvec{\Sigma }\) would be infeasible. Instead, the prior is defined implicitly by assigning a sparse covariance matrix to the fine-scale force vector \(\textbf{f}\), which defines the covariance matrix \(\varvec{\Sigma }\) of the solution vector without requiring us to compute it explicitly. For certain kernel-based priors, an equivalent stochastic partial differential equation can be shown to exist, which allows for a similar sparse representation (see for example Roininen et al. (2014)).

The second requirement is that the choice of prior distribution needs to be appropriate for the partial differential equation at hand. For instance, if the infinitely differentiable squared exponential prior were assumed for the solution field \(u(\textbf{x})\), this would imply \(C^\infty \) continuity of the right-hand side field \(f(\textbf{x})\). From a modeling point of view, this would be an undesirable assumption to make, since it is very restrictive concerning which forcing terms are permitted. On the other hand, if the prior is not smooth enough, samples from the prior would exhibit unphysical discontinuities in \(u(\textbf{x})\) or its gradient fields. In short, the prior needs to respect the smoothness of the partial differential equation to which it is applied.

In this section, a particular class of priors that meets both of these requirements is presented by means of two test cases. The first test case, presented in Fig. 1, concerns a one-dimensional mechanics problem described by the following ordinary differential equation with homogeneous boundary conditions:

$$\begin{aligned} -\frac{\textrm{d}}{\textrm{d}x}\left( EA(x) \frac{\textrm{d}u}{\textrm{d}x} \right)&= f(x)&&\text {in } \Omega = \left( 0, 1\right) \\ u(x)&= 0&&\text {on } \partial \Omega = \{0,1\} \end{aligned}$$
(31)

Here, the distributed load \(f(x) = 1\), Young’s modulus \(E = 1\) and the cross-sectional area \(A(x) = 0.1 - 0.099 x\). This setup describes a tapered bar with a constant load, where both the left and right end are clamped, as shown in Fig. 1. The fine-scale discretization consists of a uniform mesh with 64 elements (\(n = 64\)) and linear shape functions. Three different levels of uniform coarse discretization are used: \(m = 4\), \(m = 16\) and \(m = 64\). Note that in all cases, since n is a multiple of m, the shape functions are defined hierarchically in accordance with Sect. 2.3.
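To make this setup concrete, the fine-scale operators of the tapered bar can be assembled with standard linear finite elements. The sketch below is our own reconstruction; in particular, evaluating the tapered stiffness \(EA(x)\) at the element midpoint is an assumption made for this sketch, not a detail taken from the paper:

```python
import numpy as np

def assemble_tapered_bar(n, E=1.0):
    """Fine-scale stiffness K, mass M and force f for Eq. (31) on a uniform
    mesh of n linear elements (boundary nodes included, f(x) = 1)."""
    h = 1.0 / n
    K = np.zeros((n + 1, n + 1))
    M = np.zeros((n + 1, n + 1))
    f = np.zeros(n + 1)
    for e in range(n):
        EA = E * (0.1 - 0.099 * (e + 0.5) * h)   # cross-section at element midpoint
        ke = EA / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
        me = h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
        idx = [e, e + 1]
        K[np.ix_(idx, idx)] += ke
        M[np.ix_(idx, idx)] += me
        f[idx] += h / 2.0                        # constant unit load
    return K, M, f
```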

The second case is shown in Fig. 2 and concerns a two-dimensional mechanics problem. A plate (\(L=4\), \(H=2\)) with a hole (\(R = 0.8\)) is clamped on its left edge and loaded by a constant horizontal body load \(f_x = 1\). The plate has unit thickness, Young’s modulus \(E = 3\) and Poisson’s ratio \(\nu = 0.2\). The problem is meshed non-uniformly, as shown in Fig. 2a. For the coarse mesh, a characteristic length \(h=0.5\) is used at the left and right edges, but a refinement is applied around the hole. The refinements below and above the hole have characteristic lengths of \(h=0.2\) and \(h=0.05\), respectively. The fine mesh is generated by dividing each coarse element into 4 smaller triangular elements. In Fig. 2b, it can be seen how this difference in mesh density on different sides of the hole results in a larger discretization error below the hole than above it. For reference, the fine-scale and coarse-scale solutions are shown in Figs. 2c and d, respectively.

Fig. 1 Schematic overview of the tapered bar test case

Fig. 2 Overview of the perforated plate test case

3.1 A sparse right-hand side prior

Following the approach taken in Cockayne et al. (2017), rather than assuming a prior measure directly on the displacement field \(u(\textbf{x})\), we assume a centered Gaussian process prior with covariance function \(k_\text {f}(\textbf{x}, \textbf{x}')\) over the forcing term \(f(\textbf{x})\):

$$\begin{aligned} f(\textbf{x}) \sim \mathcal{G}\mathcal{P}\left( 0, k_\text {f}(\textbf{x}, \textbf{x}')\right) \end{aligned}$$
(32)

This implicitly defines an equivalent prior on \(u(\textbf{x})\):

$$\begin{aligned} u(\textbf{x}) \sim \mathcal{G}\mathcal{P}\left( 0, k_\text {nat}(\textbf{x}, \textbf{x}')\right) \end{aligned}$$
(33)

Here, the covariance function \(k_\text {nat}\) can be expressed in terms of \(k_\text {f}(\textbf{x}, \textbf{x}')\) and the Green’s function \(G(\textbf{x}, \textbf{x}')\) associated with the operator of the partial differential equation:

$$\begin{aligned} k_\text {nat}(\textbf{x}, \textbf{x}') = \int _{\Omega } \int _{\Omega } G(\textbf{x}, \textbf{z}) G(\textbf{x}', \textbf{z}') k_\text {f}(\textbf{z}, \textbf{z}') \, \textrm{d}\textbf{z}\, \textrm{d}\textbf{z}' \end{aligned}$$
(34)

In Cockayne et al. (2017), this kernel is described as “natural” in the sense that the operator \(-\Delta \) (see Eq. 1) uniquely maps from the Hilbert space associated with the forcing term covariance function \(k_\text {f}(\textbf{x}, \textbf{x}')\) to the one associated with \(k_\text {nat}(\textbf{x}, \textbf{x}')\). Each sample of \(u(\textbf{x})\) drawn from this natural kernel has an equivalent sample of \(f(\textbf{x})\) and vice versa. Unfortunately, since the Green’s function is generally not available for a given partial differential equation, Cockayne et al. (2017) discard this natural kernel in favor of a Matérn or Wendland kernel with the appropriate level of smoothness.

However, because we avoid this problem by introducing the fine-scale discretization, there is no need here to step away from the natural prior approach. Instead, it can be approximated by applying the fine-scale finite element discretization first, and only then finding the natural covariance matrix for the solution vector \(\textbf{u}\). Given the prior distribution over \(f(\textbf{x})\) in Eq. (32) and the definition of the force vector in Eq. (25), it follows that:

$$\begin{aligned} \textbf{f}\sim \mathcal {N}\left( \textbf{0}, \varvec{\Sigma }_\textbf{f}\right) \end{aligned}$$
(35)

where the force vector covariance matrix \(\varvec{\Sigma }_\textbf{f}\) is given by:

$$\begin{aligned} \varvec{\Sigma }_\textbf{f}= \int _{\Omega } \int _{\Omega } k_\text {f}(\textbf{x}, \textbf{x}') \varvec{\phi }(\textbf{x}) \varvec{\phi }(\textbf{x}')^T \, \textrm{d}\textbf{x}' \, \textrm{d}\textbf{x} \end{aligned}$$
(36)

The resulting prior distribution over \(\textbf{u}\) then becomes:

$$\begin{aligned} \textbf{u}\sim \mathcal {N}\left( \textbf{0}, \textbf{K}^{-1} \varvec{\Sigma }_\textbf{f}\textbf{K}^{-1}\right) \end{aligned}$$
(37)

Note the similarity to the natural kernel in Eq. (34), with \(\textbf{K}^{-1}\) and \(\varvec{\Sigma }_\textbf{f}\) taking roles similar to those of \(G(\textbf{x}, \textbf{x}')\) and \(k_\text {f}(\textbf{x}, \textbf{x}')\), respectively (Peker 2023). Also similarly, each sample of \(\textbf{u}\) has an equivalent sample of \(\textbf{f}\) and vice versa. Conceptually, our choice of prior is the same as that of Cockayne et al. (2017), except that we are working in the finite-dimensional space of the discretized system, rather than the infinite-dimensional space of the original partial differential equation. The advantage of working in the finite-dimensional space is that \(\textbf{K}^{-1}\) is computable, and as a result the natural prior can still be used.
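In practice, this means that prior samples of \(\textbf{u}\) can be drawn by sampling force vectors and solving one fine-scale system per sample, without ever forming \(\textbf{K}^{-1}\). A minimal sketch (names our own):

```python
import numpy as np

def sample_natural_prior(K, Sigma_f, n_samples, seed=0):
    """Draw samples of u ~ N(0, K^{-1} Sigma_f K^{-1}) (Eq. 37) by sampling
    f ~ N(0, Sigma_f) and solving K u = f for each sample."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Sigma_f)                          # Sigma_f = L L^T
    f_samples = L @ rng.standard_normal((K.shape[0], n_samples))
    return np.linalg.solve(K, f_samples)                     # one fine solve per sample
```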

Given this choice of prior and using Eq. (20), the posterior distribution of the displacement field is given by:

$$\begin{aligned} \textbf{u}\,|\,\textbf{g}\sim \mathcal {N}\left( \textbf{m}^*, \varvec{\Sigma }^*\right) \end{aligned}$$
(38)

with the following posterior mean \(\textbf{m}^*\) and posterior covariance \(\varvec{\Sigma }^*\):

$$\begin{aligned} \textbf{m}^*&= \textbf{K}^{-1} \varvec{\Sigma }_\textbf{f}\varvec{\Phi }\left( \varvec{\Phi }^T \varvec{\Sigma }_\textbf{f}\varvec{\Phi }\right) ^{-1} \varvec{\Phi }^T \textbf{f}\\ \varvec{\Sigma }^*&= \textbf{K}^{-1} \left( \textbf{I} - \varvec{\Sigma }_\textbf{f}\varvec{\Phi }\left( \varvec{\Phi }^T \varvec{\Sigma }_\textbf{f}\varvec{\Phi }\right) ^{-1} \varvec{\Phi }^T \right) \varvec{\Sigma }_\textbf{f}\textbf{K}^{-1} \end{aligned}$$
(39)

The presence of \(\textbf{K}^{-1}\) in Eq. (39) might appear to be in conflict with our previously stated requirement of sparsity in the covariance matrices, since the inverse of \(\textbf{K}\) is typically dense. However, using an ensemble, the prior and posterior distributions can be approximated and sampled without explicitly computing this matrix inverse; only fine-scale linear solves are necessary. The details of this ensemble approximation can be found in Appendix B.
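Appendix B is not reproduced here, but one standard way to realize such an ensemble approximation is Matheron's rule, in which each prior sample is corrected using the observed coarse-scale data. The sketch below is one possible realization under that assumption, not necessarily the exact scheme of Appendix B; note that it requires only fine-scale solves with \(\textbf{K}\):

```python
import numpy as np

def posterior_ensemble(K, Sigma_f, Phi, f, n_samples, seed=0):
    """Ensemble approximation of Eq. (39) via Matheron's rule: each prior
    sample u_s = K^{-1} f_s is shifted by Sigma H^T (H Sigma H^T)^{-1}
    (g - H u_s), which for the natural prior reduces to fine-scale solves."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Sigma_f)
    f_s = L @ rng.standard_normal((K.shape[0], n_samples))   # f_s ~ N(0, Sigma_f)
    S = Phi.T @ Sigma_f @ Phi                                # coarse Gram matrix
    resid = Phi.T @ (f[:, None] - f_s)                       # g - Phi^T f_s
    correction = Sigma_f @ Phi @ np.linalg.solve(S, resid)
    return np.linalg.solve(K, f_s + correction)              # posterior samples of u
```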

Naturally, the need for fine-scale linear solves puts the computational cost of the proposed method on par with obtaining the fine-scale finite element solution, rather than that of the coarse-scale solve, as one might hope. Although acceleration of the method falls beyond the scope of this paper, we do want to highlight two potential strategies to alleviate the computational cost. The first is to employ Langevin dynamics–based sampling schemes similar to Akyildiz et al. (2021), which rely on \(\varvec{\Sigma }^{-1}\) rather than \(\varvec{\Sigma }\) to sample the posterior. A second potential approach is to approximate the posterior using a finite number of conjugate gradient iterations. In Wenger et al. (2023), an approach is presented to account for the error this introduces in a consistent Bayesian manner at little additional computational cost.

3.2 White noise prior

Within the natural prior framework, the main choice that remains is what right-hand side covariance function \(k_\text {f}(\textbf{x}, \textbf{x}')\) to assume. For now, we will follow the choice of Cockayne et al. (2017) to use the prior from Owhadi (2015), and assume \(k_\text {f}(\textbf{x}, \textbf{x}')\) to be a Dirac delta function \(\delta (\textbf{x})\), scaled by a single hyperparameter \(\alpha \):

$$\begin{aligned} k_\text {f}(\textbf{x}, \textbf{x}') = \alpha ^2 \delta (\textbf{x}- \textbf{x}') \end{aligned}$$
(40)

This defines a white noise field over \(f(\textbf{x})\) with a standard deviation that is equal to \(\alpha \). The covariance matrices \(\varvec{\Sigma }_\textbf{f}\) and \(\varvec{\Sigma }\) then follow directly from Eqs. (36) and (37):

$$\begin{aligned} \varvec{\Sigma }_\textbf{f}= \alpha ^2 \textbf{M}, \qquad \varvec{\Sigma }= \alpha ^2 \textbf{K}^{-1} \textbf{M}\textbf{K}^{-1} \end{aligned}$$
(41)

where \(\textbf{M}\) is the fine-scale (square and symmetric) mass matrix, given by:

$$\begin{aligned} M_{ij} = \int _{\Omega } \phi _i(\textbf{x}) \phi _j(\textbf{x}) \, \textrm{d}\textbf{x} \end{aligned}$$
(42)

Note that under this choice of prior covariance, the sparsity requirement that was put on \(\varvec{\Sigma }\) has been met. The resulting posterior mean vector and covariance matrix are then given by:

$$\begin{aligned} \textbf{m}^*&= \textbf{K}^{-1} \textbf{M}\varvec{\Phi }\left( \varvec{\Phi }^T \textbf{M}\varvec{\Phi }\right) ^{-1} \varvec{\Phi }^T \textbf{f}\\ \varvec{\Sigma }^*&= \alpha ^2 \textbf{K}^{-1} \left( \textbf{I} - \textbf{M}\varvec{\Phi }\left( \varvec{\Phi }^T \textbf{M}\varvec{\Phi }\right) ^{-1} \varvec{\Phi }^T \right) \textbf{M}\textbf{K}^{-1} \end{aligned}$$
(43)

It can be seen that for this choice of prior, the hyperparameter \(\alpha \) does not affect the posterior mean, and only serves as a scaling factor of the posterior covariance. Given this hyperparameter-independence, we choose to simply set \(\alpha = 1\) for the remainder of this work. A small observation noise (\(\sigma _e^2 = 10^{-12}\)) is added to the term \(\varvec{\Phi }^T \textbf{M}\varvec{\Phi }\) in Eq. (43), to ensure that this matrix is invertible.
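Putting the pieces together, the posterior moments of Eq. (43), including the jitter term, can be evaluated as follows (dense linear algebra meant only for small demonstration problems; the function name is our own):

```python
import numpy as np

def white_noise_posterior(K, M, Phi, f, alpha=1.0, jitter=1e-12):
    """Posterior moments of Eq. (43) for Sigma_f = alpha^2 M, with a small
    observation noise added to Phi^T M Phi to ensure invertibility."""
    n, m = Phi.shape
    S = Phi.T @ M @ Phi + jitter * np.eye(m)
    P = M @ Phi @ np.linalg.solve(S, Phi.T)        # weighted projection (Eq. 44)
    K_inv = np.linalg.inv(K)                       # acceptable at demonstration scale
    m_post = K_inv @ P @ f                         # note: independent of alpha
    Sigma_post = alpha**2 * K_inv @ (np.eye(n) - P) @ M @ K_inv
    return m_post, Sigma_post
```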

In Fig. 3a, the resulting prior and posterior distributions for the tapered bar problem are shown for the number of coarse elements m equal to 4. Several pieces of information about the problem, in the absence of knowledge of the right-hand side term, can be found encoded in the prior. We can see how the enforcement of boundary conditions described in Sect. 2.4 indeed results in a distribution whose samples respect the boundary conditions imposed at \(x=0\) and \(x=1\). Furthermore, a larger prior standard deviation is found in the region where the bar is thinner, reflecting the fact that a small perturbation of the right-hand side in this region would have a more pronounced effect on the displacement field. Considering the posterior distribution, we see that its mean falls between the coarse- and fine-scale reference solutions. Lastly, it can be seen that the region where the posterior standard deviation is largest corresponds to the region where the discretization error is largest.

In Figs. 3b and c, we have increased the number of coarse elements m to 16 and 64, respectively, to study its effect on the posterior distribution. As the coarse-scale solution approaches the fine-scale solution, the posterior mean approaches the fine-scale solution accordingly. Additionally, the posterior standard deviation shrinks along with the discretization error until the coarse mesh density meets the fine one at \(m = n = 64\). At this point, only a small posterior standard deviation remains, due to the small observation noise that was included in the model.

Fig. 3 Prior and posterior distributions of the 1D tapered bar problem. For comparison, the fine-scale and coarse-scale reference solutions have been included. From each distribution, 30 samples have been plotted. The shaded regions correspond to the 95% credible intervals of the distributions

In Fig. 4, the posterior moments are plotted when the same prior distribution is applied to the two-dimensional perforated plate problem. We find that the results for this two-dimensional test case are quite different from those for the previous one-dimensional case. It can be observed in Fig. 4a that the posterior mean almost exactly matches the fine-scale solution shown in Fig. 2c. An explanation for this can be found by considering Eq. (39) and noting that the posterior mean \(\textbf{m}^*\) is equivalent to the reference solution \(\varvec{\hat{\textbf{u}}}\), except that the force vector \(\textbf{f}\) has been replaced by \(\varvec{\hat{\textbf{f}}}\), a weighted projection of \(\textbf{f}\) onto the column space of \(\varvec{\Phi }\):

$$\begin{aligned} \varvec{\hat{\textbf{f}}} = \textbf{P}\textbf{f}= \varvec{\Sigma }_\textbf{f}\varvec{\Phi }\left( \varvec{\Phi }^T \varvec{\Sigma }_\textbf{f}\varvec{\Phi }\right) ^{-1} \varvec{\Phi }^T \textbf{f} \end{aligned}$$
(44)

In other words, \(\textbf{f}\) is mapped to the coarse space, scaled, mapped back to the fine space and rescaled to obtain \(\varvec{\hat{\textbf{f}}}\). The quality of this projection depends on the weights given by \(\varvec{\Sigma }_\textbf{f}\), and there is a sense in which the choice \(\varvec{\Sigma }_\textbf{f}= \textbf{M}\) is optimal: it minimizes the \(L^2\)-norm of the projection error of the forcing term \(f(\textbf{x})\) onto the coarse space \(\mathcal {W}^h\) (Larson and Bengzon 2013, Theorem 1.1):

$$\begin{aligned} \mathop {\mathrm {arg\,min}}\limits _{f^{\text {h}}(\textbf{x}) \in \mathcal {W}^h} \Vert f(\textbf{x}) - f^{\text {h}}(\textbf{x}) \Vert _{L^2(\Omega )}^2 = \varvec{\psi }(\textbf{x})^T \left( \varvec{\Phi }^T \textbf{M}\varvec{\Phi }\right) ^{-1} \varvec{\Phi }^T \textbf{f}\end{aligned}$$
(45)

This optimality helps explain the close correspondence between the fine-scale solution and posterior mean given in Figs. 2c and 4a, respectively.

Fig. 4 Posterior moments of the perforated plate test case with \(\varvec{\Sigma }_\textbf{f}= \textbf{M}\)

Though this might appear to be a desirable property, for the purposes of modeling discretization error, it is actually detrimental. To understand this, let us consider the following equality:

$$\begin{aligned} \varvec{\Sigma }^* \varvec{\Sigma }^{-1} \varvec{\hat{\textbf{u}}} = \varvec{\hat{\textbf{u}}} - \textbf{m}^* \end{aligned}$$
(46)

This expression can be verified by substituting the expressions for \(\varvec{\hat{\textbf{u}}}\), \(\varvec{\Sigma }\), \(\textbf{m}^*\) and \(\varvec{\Sigma }^*\) found in Eqs. (26) to (39). The left-hand side of Eq. (46) can be understood as a quantifier of the amount of “contraction” of the prior distribution due to the observed data. In the extreme case where there is no contraction of the covariance, we find that the posterior covariance matrix \(\varvec{\Sigma }^*\) is equal to the prior covariance matrix \(\varvec{\Sigma }\), and consequently \(\varvec{\Sigma }^* \varvec{\Sigma }^{-1} = \textbf{I}\) and \(\textbf{m}^* = \textbf{0}\). At the other extreme, where the posterior covariance is given by \(\varvec{\Sigma }^* = \epsilon \textbf{I}\) and we let \(\epsilon \rightarrow 0\), we find that the left-hand side approaches the null vector, and as a result \(\textbf{m}^* \rightarrow \varvec{\hat{\textbf{u}}}\). As more observations are included, the posterior distribution moves from the former extreme to the latter.

It becomes clear that the posterior mean vector \(\textbf{m}^*\) and posterior covariance matrix \(\varvec{\Sigma }^*\) are inextricably linked. This property is not necessarily a problematic one. In fact, from the typical probabilistic numerics point of view, where the solving procedure is interpreted as an inherently probabilistic process (Hennig et al. 2022), the fact that the posterior covariance tends to zero as the posterior mean approaches the true solution is the desired kind of behavior. However, if our goal is to have the discretization error reflected in the posterior covariance, then this connection to the posterior mean does pose a problem: it is not possible to simultaneously obtain a posterior mean that approaches the true solution and a posterior covariance that is indicative of the coarse-scale discretization error. And indeed, when comparing the posterior standard deviation \(\varvec{\sigma }^*\) in Fig. 4b to the discretization error \(\textbf{e}\) in Fig. 2b, we see that the regions of largest discretization error are not reflected in the posterior standard deviation.

3.3 Green’s function prior

This crucial observation motivates us to reevaluate our initial choice of prior. Given how Eq. (46) relates the posterior covariance matrix \(\varvec{\Sigma }^*\) to the difference between the reference solution \(\varvec{\hat{\textbf{u}}}\) and the posterior mean vector \(\textbf{m}^*\), it makes sense to choose a prior that will yield a posterior mean equal to the coarse-scale solution \(\mathbf {u_c}\). Additionally, from a discretization error modeling point of view, it is more sensible for the posterior mean to equal the coarse-scale solution than to improve on it. After all, the aim from the outset has been to interpret the finite element discretization error as a source of uncertainty surrounding the coarse-scale finite element solve.

In Pförtner et al. (2023), a method is presented to construct a prior whose posterior mean matches the coarse-scale finite element solution \(\mathbf {u_c}\) exactly, starting from an initial prior with an arbitrary mean function \(m(\textbf{x})\) and covariance function \(k(\textbf{x}, \textbf{x}')\). However, we will opt instead for the method presented in Bilionis (2016), which is to set the prior covariance function equal to the Green’s function \(G(\textbf{x}, \textbf{x}')\) of the partial differential equation at hand. For Poisson’s equation, this choice of prior yields the following right-hand side covariance function \(k_\text {f}(\textbf{x}, \textbf{x}')\):

$$\begin{aligned} k_\text {f}(\textbf{x}, \textbf{x}') = - \Delta \delta (\textbf{x}- \textbf{x}') \end{aligned}$$
(47)

Substitution of this expression into Eq. (36), applying integration by parts and subsequent substitution into Eq. (37) yields the following expressions for \(\varvec{\Sigma }_\textbf{f}\) and \(\varvec{\Sigma }\):

$$\begin{aligned} \varvec{\Sigma }_\textbf{f}= \textbf{K}, \qquad \varvec{\Sigma }= \textbf{K}^{-1} \end{aligned}$$
(48)

Intuitively, we again find \(\textbf{K}^{-1}\) as the finite-dimensional counterpart of \(G(\textbf{x}, \textbf{x}')\). The advantages of introducing the fine-scale discretization as a stand-in for the infinite-dimensional partial differential equation once again become apparent: the fact that the Green’s function is generally unavailable no longer poses a problem. Furthermore, the objection raised in Bilionis (2016), namely that for Poisson’s equation in two or three dimensions the Green’s function \(G(\textbf{x}, \textbf{x}')\) is infinite at \(\textbf{x}= \textbf{x}'\) and can therefore not be a useful indicator of model uncertainty, does not apply in our case: for any valid finite element discretization, the inverse stiffness matrix \(\textbf{K}^{-1}\) has only finite-valued entries. In the phrasing of Alberts and Bilionis (2023), the introduction of the fine-scale discretization offers a way to truncate the integration over functions at the smallest scales.

This choice of prior in turn results in the following posterior mean vector and covariance matrix:

$$\begin{aligned} \textbf{m}^*&= \varvec{\Phi }\left( \varvec{\Phi }^T \textbf{K}\varvec{\Phi }\right) ^{-1} \varvec{\Phi }^T \textbf{f}\\ \varvec{\Sigma }^*&= \textbf{K}^{-1} - \varvec{\Phi }\left( \varvec{\Phi }^T \textbf{K}\varvec{\Phi }\right) ^{-1} \varvec{\Phi }^T \end{aligned}$$
(49)

Note that according to Eqs. (22) and (24), \(\varvec{\Phi }^T \textbf{K}\varvec{\Phi }\) and \(\varvec{\Phi }^T \textbf{f}\) are equal to the coarse stiffness matrix \(\mathbf {K_c}\) and the coarse force vector \(\textbf{g}\), respectively. As a result, we find that the posterior mean vector \(\textbf{m}^*\) is indeed exactly equal to the solution of the coarse system \(\mathbf {u_c}\), projected to the fine space. Returning now to Eq. (46), we find that for this choice of prior, the expression reduces to a strikingly simple relationship between the posterior covariance matrix \(\varvec{\Sigma }^*\) and the discretization error \(\textbf{e}\) as defined in Eq. (27):

$$\begin{aligned} \varvec{\Sigma }^* \textbf{f}= \textbf{e} \end{aligned}$$
(50)

Since this relation holds for any fine-scale force vector \(\textbf{f}\), and \(\varvec{\Sigma }^*\) is independent of \(\textbf{f}\), this posterior covariance matrix \(\varvec{\Sigma }^*\) can be used to determine the discretization error \(\textbf{e}\) for an arbitrary forcing term. In this sense, \(\varvec{\Sigma }^*\) can be said to fully encode the discretization error associated with the geometry and discretization of the problem at hand.
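Because Eq. (50) is a purely algebraic identity, it can be checked numerically with stand-in matrices: any symmetric positive definite \(\textbf{K}\) and full-rank \(\varvec{\Phi }\) will do, so the self-contained sketch below uses random matrices rather than an actual discretization:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 12, 4
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)                   # SPD stand-in for the stiffness matrix
Phi = rng.standard_normal((n, m))             # full-rank stand-in for Phi
f = rng.standard_normal(n)                    # arbitrary fine-scale force vector

Kc = Phi.T @ K @ Phi                          # coarse stiffness matrix (Eq. 22)
Sigma_post = np.linalg.inv(K) - Phi @ np.linalg.solve(Kc, Phi.T)   # Eq. (49)
e = np.linalg.solve(K, f) - Phi @ np.linalg.solve(Kc, Phi.T @ f)   # Eq. (27)
assert np.allclose(Sigma_post @ f, e)         # Eq. (50) holds for any f
```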

In Fig. 3d–f, the prior and posterior distributions that follow when applying this prior to the tapered bar problem are shown. Again, the number of coarse elements m is equal to 4, 16 and 64, respectively. As expected, the posterior mean \(\textbf{m}^*\) can be seen to equal the coarse-scale solution \(\mathbf {u_c}\) in these figures. Similar to Fig. 3a–c, the largest posterior standard deviation \(\varvec{\sigma }^*\) is found in the region where the bar is thinnest. However, for this prior there is a notable reduction of the posterior standard deviation around the coarse-scale nodes. This reduction reflects the fact that at these nodes, the coarse solution is more accurate than in the regions between the coarse-scale nodes, where the solution is interpolated via the coarse-scale shape functions.

Another notable difference between the two priors in Fig. 3 is the smoothness of the samples. We see in Fig. 3a–c that the samples from the white noise prior presented in Sect. 3.2 have a visible smoothness to them. In contrast, the samples from the Green’s function prior shown in Fig. 3d–f appear jagged and rough. In fact, for this one-dimensional Poisson problem, each sample \(\tilde{u}(x)\) drawn from the Green’s function prior \(k(x, x') = G(x, x')\) can be shown to be continuous, but nowhere differentiable. This is a result of the fact that at \(x = x'\), the Green’s function is continuous (i.e. \(\lim _{\delta \rightarrow 0} G(x-\delta , x) = \lim _{\delta \rightarrow 0} G(x+\delta , x)\)), but its derivative at that same point is discontinuous (i.e. \(\lim _{\delta \rightarrow 0} G'(x - \delta , x) \ne \lim _{\delta \rightarrow 0} G'(x + \delta , x)\)) (Bayin 2006). The samples of a Gaussian process are mean-square continuous if \(k(\textbf{x}, \textbf{x}')\) is continuous at \(\textbf{x}= \textbf{x}'\), and are k times mean-square differentiable if \(k(\textbf{x}, \textbf{x}')\) is 2k times differentiable at \(\textbf{x}= \textbf{x}'\) (Rasmussen and Williams 2005). From the fact that the Green’s function is not differentiable at \(x = x'\), it thus follows that the samples drawn from this process are everywhere continuous but nowhere differentiable. Note that this only applies to the infinite-dimensional solution space \(\mathcal {V}\). The finite-dimensional space \(\mathcal {V}^h\) spanned by the fine-scale shape functions \(\varvec{\phi }(x)\) is still weakly once-differentiable for both priors.

We now turn to the perforated plate example, for which the results are shown in Fig. 5. The posterior mean \(\textbf{m}^*\) in Fig. 5a can be seen to exactly match the coarse-scale finite element solution \(\mathbf {u_c}\) in Fig. 2d for this problem as well. Unfortunately, the posterior standard deviation \(\varvec{\sigma }^*\) shown in Fig. 5b again bears little resemblance to the discretization error \(\textbf{e}\) from Fig. 2d. This might seem surprising, given the direct relationship between the posterior covariance \(\varvec{\Sigma }^*\) and the discretization error \(\textbf{e}\) given in Eq. (50). Indeed, we can multiply the posterior covariance \(\varvec{\Sigma }^*\) by the fine-scale force vector \(\textbf{f}\) to recover the discretization error exactly (see Fig. 5c), but this does not translate to a posterior standard deviation \(\varvec{\sigma }^*\) that can be interpreted directly. This is a consequence of the fact that the posterior covariance \(\varvec{\Sigma }^*\) depends only on the material stiffness (via \(\textbf{K}\)) and node locations (via \(\varvec{\Phi }\)), but not on the magnitude of the force vector \(\textbf{f}\) at those locations. One benefit of this independence is that given the posterior covariance matrix \(\varvec{\Sigma }^*\) from one load case, it is possible to compute the discretization error for any other load case virtually for free. However, the drawback of this independence is that, since the discretization error \(\textbf{e}\) does depend on the load applied to the structure, a load-independent posterior standard deviation \(\varvec{\sigma }^*\) cannot adequately represent the discretization error for any specific load case. Paradoxically, because the posterior covariance matrix \(\varvec{\Sigma }^*\) encodes the discretization error \(\textbf{e}\) for all load cases simultaneously, it fails to represent the discretization error for any one load case in particular. This paradox is not unique to our Bayesian formulation of the finite element method, and arises in many Gaussian process–based probabilistic solvers of differential equations, including meshfree probabilistic solvers (Bilionis 2016; Cockayne et al. 2017) and probabilistic methods of weighted residuals (Pförtner et al. 2023). In all these cases, the error between the posterior mean function \(m^*(\textbf{x})\) and the exact solution \(u(\textbf{x})\) depends on the right-hand side term, but the posterior covariance function \(k^*(\textbf{x}, \textbf{x}')\) meant to represent this error does not.
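Continuing the illustrative sketch from above, this reuse of \(\varvec{\Sigma }^*\) across load cases amounts to a single matrix–vector product per new forcing term:

```python
# Reusing Sigma_post from the sketch above: the error for a new load case
# follows from a single matrix-vector product, without any new solves
rng = np.random.default_rng(0)
f_new = rng.normal(size=f.shape)    # an arbitrary new fine-scale force vector
e_new = Sigma_post @ f_new          # discretization error for the new load case

# Cross-check against explicit fine- and coarse-scale solves
e_ref = np.linalg.solve(K, f_new) - Phi @ np.linalg.solve(K_c, Phi.T @ f_new)
print(np.allclose(e_new, e_ref))    # True
```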

Fig. 5: Posterior moments of the perforated plate test case with \(\varvec{\Sigma }_\textbf{f}= \textbf{K}\)

3.4 Incorporating force term information

This raises the question whether it is possible to break this independence of the posterior covariance matrix \(\varvec{\Sigma }^*\) and the fine-scale force vector \(\textbf{f}\). Doing so appears to be necessary to capture the load-dependent discretization error \(\textbf{e}\) in the posterior standard deviation \(\varvec{\sigma }^*\). Returning to Eq. (50), we can understand the multiplication of \(\varvec{\Sigma }^*\) by \(\textbf{f}\) through the eigendecomposition of \(\varvec{\Sigma }^*\):

$$\begin{aligned} \varvec{\Sigma }^* = \textbf{Q}\varvec{\Lambda }\textbf{Q}^{-1} \end{aligned}$$
(51)

Here the columns of \(\textbf{Q}\) are the eigenvectors of \(\varvec{\Sigma }^*\) and \(\varvec{\Lambda }\) is a diagonal matrix whose entries are its eigenvalues in descending order. Since \(\varvec{\Sigma }^*\) is a real, symmetric, positive definite matrix, its eigenvalues are all positive real numbers, and \(\textbf{Q}\) can be chosen orthogonal, which implies that \(\textbf{Q}^{-1} = \textbf{Q}^T\).

The decomposition in Eq. (51) allows for a straightforward interpretation of the multiplication of \(\varvec{\Sigma }^*\) by \(\textbf{f}\). First, \(\textbf{Q}^{-1}\) performs a change of basis \(\varvec{\tilde{\textbf{f}}} = \textbf{Q}^{-1} \textbf{f}\), expressing \(\textbf{f}\) in terms of the basis spanned by the eigenvectors instead of the standard basis. In this basis, \(\varvec{\tilde{\textbf{f}}}\) is rescaled by the eigenvalues \(\varvec{\Lambda }\) to obtain the discretization error \(\varvec{\tilde{\textbf{e}}}\) expressed in terms of the eigenbasis. Finally, \(\textbf{Q}\) performs a change of basis on \(\varvec{\tilde{\textbf{e}}}\) back to the standard basis \(\textbf{e}= \textbf{Q}\varvec{\tilde{\textbf{e}}}\). Since \(\varvec{\Lambda }\) is a diagonal matrix, the operation \(\varvec{\tilde{\textbf{e}}} = \varvec{\Lambda }\varvec{\tilde{\textbf{f}}}\) comes down to a simple element-wise multiplication:

$$\begin{aligned} \tilde{e}_i = \lambda _i \tilde{f}_i \end{aligned}$$
(52)
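Continuing the same illustrative sketch, this change-of-basis reading can be checked numerically. Note that np.linalg.eigh returns the eigenvalues in ascending rather than descending order, which only permutes the eigenbasis and leaves the products unchanged:

```python
# Eq. (51): eigendecomposition of the symmetric matrix Sigma_post
lam, Q = np.linalg.eigh(Sigma_post)
f_tilde = Q.T @ f                   # change of basis to the eigenbasis
e_tilde = lam * f_tilde             # Eq. (52): elementwise rescaling
print(np.allclose(Q @ e_tilde, e))  # back to the standard basis recovers e -> True
```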

Rather than interpreting Eq. (52) as a rescaling of each element of the force vector \(\tilde{f}_i\) by its corresponding eigenvalue \(\lambda _i\), one could equally well argue that it is the eigenvalue \(\lambda _i\) that is rescaled by \(\tilde{f}_i\). In order to break the independence of the posterior covariance matrix \(\varvec{\Sigma }^*\) and the force vector \(\textbf{f}\), we replace the original eigenvalues \(\lambda _i\) with ones that are rescaled by \(\tilde{f}_i\). Thus, \(\varvec{\Lambda }\) is replaced by a diagonal matrix \(\textbf{E}\), whose diagonal entries are given by \(|\tilde{e}_i|\), yielding a new covariance matrix \(\varvec{\hat{\varvec{\Sigma }}}^*\):

$$\begin{aligned} \varvec{\hat{\varvec{\Sigma }}}^* = \textbf{Q}\textbf{E}\textbf{Q}^{-1} \end{aligned}$$
(53)

Since all entries of \(\textbf{E}\) are nonnegative, this rescaled covariance matrix \(\varvec{\hat{\varvec{\Sigma }}}^*\) is positive semi-definite, and thus a valid covariance matrix. In Fig. 5d, the standard deviation \(\varvec{\hat{\varvec{\sigma }}}^*\) of this rescaled covariance matrix is shown. Comparing to the discretization error \(\textbf{e}\) in Fig. 2b, we see a clear similarity between these two fields. At last, we appear to have arrived at a distribution with a covariance matrix that can meaningfully capture the discretization error.
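A minimal sketch of this rescaling, reusing the eigendecomposition computed in the illustrative example above, is:

```python
# Eq. (53): replace the eigenvalues of Sigma_post by |e_tilde| = |lam * f_tilde|
Sigma_hat = Q @ np.diag(np.abs(e_tilde)) @ Q.T   # positive semi-definite by construction
sigma_hat = np.sqrt(np.diag(Sigma_hat))          # load-dependent standard deviation
```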

One shortcoming of this ad hoc approach to incorporating forcing term information in our posterior distribution is that it deviates from the Bayesian paradigm used thus far: there is no guarantee that there exists an equivalent prior distribution that would yield this rescaled posterior covariance matrix. Additionally, if such an equivalent prior does exist, it is unclear what posterior mean it would produce. Our motivation for presenting this approach nonetheless is to demonstrate not only that it is impossible to obtain an interpretable posterior standard deviation \(\varvec{\sigma }^*\) if the posterior covariance matrix \(\varvec{\Sigma }^*\) is independent of the forcing term \(\textbf{f}\), but also that an interpretable standard deviation can be obtained by incorporating forcing term information.

4 Conclusions

In this work, we presented a Bayesian approach to the modeling of finite element discretization error. A Gaussian process prior is assumed over the solution space, which is conditioned on the force vector from a finite element discretization. To avoid the computation of intractable integrals, a second, finer mesh is introduced, which is assumed to be sufficiently fine to represent the true solution. The two meshes are constructed in a hierarchical manner, such that the coarse-scale shape functions can be fully expressed in terms of fine-scale shape functions. The Gaussian process prior on the solution space yields a normal distribution prior on the fine-scale solution vector. For linear partial differential equations, conditioning this prior on the coarse-scale force vector produces a normally distributed posterior on the solution vector.

Two different prior covariance functions have been investigated: a white noise prior covariance on the forcing term, and a Green’s function prior covariance on the solution term. The white noise prior covariance is shown to produce a posterior mean vector that is close to the fine-scale reference solution. However, an undesirable consequence of this property is that the corresponding posterior covariance matrix becomes less informative of the discretization error between the coarse-scale and fine-scale solutions. The Green’s function prior, on the other hand, can be shown to produce exactly the coarse-scale solution as its posterior mean. Additionally, the discretization error can be recovered exactly from the posterior covariance matrix by multiplying it by the fine-scale force vector. Because the posterior covariance matrix does not depend on the values of the forcing term, it can be multiplied by an arbitrary forcing term to reproduce exactly the discretization error for that forcing term. The drawback of this independence, however, is that by itself, a force-independent posterior covariance matrix cannot be informative of the force-dependent discretization error. We have shown how, by rescaling the eigenvalues of the posterior covariance matrix based on the fine-scale force vector, a distribution can be obtained whose standard deviation corresponds to the discretization error.

One major drawback of the proposed method, as is the case for many probabilistic numerical methods, is its computational cost, since it relies on fine-scale solves to sample from the posterior. Although several potential approaches to approximate or circumvent these fine-scale solves have been identified, these ideas still need to be put into practice in future work. Furthermore, the formulation in this work has assumed linearity of the partial differential equations and Gaussianity of the prior distribution; extensions of the method beyond these assumptions are not trivial. Finally, the underlying reason for developing a Bayesian model of finite element discretization error is to allow for the consistent treatment of discretization error throughout computational pipelines. In this work, the focus has been on the forward problem and on the fundamentals of our Bayesian formulation of the finite element method. The demonstration of the method in an inverse modeling or data assimilation context is left for future work.