1 Introduction

The problem of estimating model parameters of static and dynamical systems is encountered in many applications, from earth sciences to engineering. In this work we focus on the parameter estimation of dynamical systems described by parameterized parabolic partial differential equations (pPDEs). Here, we assume that only limited knowledge of the solution is available, in the form of noisy local measurements taken at multiple time instances.

For solving this kind of inverse problem, numerous deterministic and stochastic methods have been proposed. Among them, a widely used technique is the so-called ensemble Kalman filter (EnKF) (Evensen 2003), a recursive filter employing a series of measurements to obtain improved estimates of the variables involved in the process. The idea of using the EnKF for reconstructing the parameters of dynamical systems traces back to Anderson (2001) and Lorentzen et al. (2001), in which a trivial artificial dynamics for the parameters was assumed to make the estimation possible. This was naturally accompanied by efforts to improve the performance of the method in terms of stability, by introducing covariance inflation (Hamill et al. 2001; Anderson and Anderson 1999) and localization (Hamill et al. 2001; Houtekamer and Mitchell 2001), and in terms of computational cost. Relevant to the latter have been the development of multi-level (Hoel et al. 2016) and multi-fidelity (Popov et al. 2021; Donoghue and Yano 2022) methods, the use of model order reduction (MOR) techniques with offline (Pagani et al. 2017; da Silva and Colonius 2018) and with on-the-fly (Donoghue and Yano 2022) training, as well as the introduction of further surrogate modeling techniques (Popov and Sandu 2022). The use of approximated models inevitably led to the study of the impact of model error on the EnKF (Mitchell et al. 2002; Mitchell and Carrassi 2015) and on other data assimilation methods (Calvetti et al. 2018; Huttunen and Kaipio 2007).

Although ensemble Kalman methods were originally meant for sequential data assimilation, i.e., for real-time applications, they have also proven reliable for asynchronous data assimilation (Sakov et al. 2010). The first paper proposing to adapt the EnKF to retrospective data analysis was Skjervheim et al. (2007). In this setting, the data are assimilated all at once at the end of an assimilation window, a feature shared with a number of methods, e.g., variational methods (Li and Navon 2001) such as 4D-VAR (Thepaut and Courtier 1991) and other smoothers (Anderson and Moore 1979). Compared to those approaches, the EnKF is particularly appealing since it does not require the computation of Fréchet derivatives, a major complication for data assimilation algorithms.

Iglesias et al. (2013) introduced what they called the ensemble Kalman method (EnKM), an EnKF-based asynchronous data assimilation algorithm. Depending on the design of the algorithm, this method has connections to Bayesian data assimilation (Schillings and Stuart 2018) and to maximum likelihood estimation (Chen and Oliver 2012). In particular, in the latter case, the method constitutes an ensemble-based implementation of so-called iterative regularization methods (Kaltenbacher et al. 2008). In the case of perfect models, the EnKM has already been analyzed in depth in Schillings and Stuart (2018) and Evensen (2018), and convergence and identifiability enhancements have been proposed in Wu et al. (2019) and Iglesias (2016). Due to the iterative nature of the EnKM, dealing with high-dimensional parametric problems is often computationally challenging. In Gao and Wang (2021), a multi-level strategy was proposed to improve the computational performance of the method.

In this work we propose an algorithm, called reduced basis ensemble Kalman method (RB-EnKM), that leverages the computational efficiency of surrogate models obtained with MOR techniques to solve asynchronous data assimilation problems via ensemble Kalman methods. The use of the EnKM allows us to avoid adjoint problems that are often difficult to reduce and intrinsically depend on the choice of measurement positions. Model order reduction, already employed in other data assimilation problems (Gong et al. 2019; Nadal et al. 2015), is used as a key tool for accelerating the method. However, the use of approximate models within the EnKM introduces a model error that could hinder the convergence of the method. In this work, we propose to deal with this error by including a prior estimation of the bias in the data. Specifically, we incorporate empirical estimates of the mean and covariance of the bias in the Kalman gain. In some instances, those quantities can be computed at a negligible cost by employing the same training set used for the construction of the reduced model.

The paper is structured as follows: in Sect. 2 we introduce the asynchronous data assimilation problem together with the standard ensemble Kalman method (Algorithm 1). Subsequently, in Sect. 3, we present an overview of reduced basis (RB) methods (Sect. 3.1) and describe how to use them in combination with the ensemble Kalman method to derive the RB-EnKM (Sect. 3.2, Algorithm 2). In Sect. 4, we test the new method on two numerical examples. In the first example, we estimate the diffusivity in a linear advection-dispersion problem in 2D (Sect. 4.1), while in the second, we estimate the hydraulic log-conductivity in a non-linear hydrological problem (Sect. 4.2). In both cases, we compare the behavior of the full order and reduced order models in different conditions. Section 5 provides conclusions and considerations on the proposed method and on its numerical performance.

2 Problem formulation

Let \((\mathcal {U},\mathcal {H})\) be a suitable pair of function spaces and let \(\mathcal {P}\subset {\mathbb {R}^{N_p}}\), with \({N_p}\in {\mathbb {N}^+}\), be a set of model parameters. We consider the pPDE: for any parameter \(\varvec{\mu }\in \mathcal {P}\), find \(u(\varvec{\mu }) \in \mathcal {U}\) such that \(\partial _t u(\varvec{\mu }) = \mathcal {F}_{\varvec{\mu }} u(\varvec{\mu })\), \(u(0,\varvec{\mu })=u_0(\varvec{\mu })\). Here \(\mathcal {F}_{\varvec{\mu }}\) is a generic parameterized differential operator, \(\partial _t\) is the first order partial time derivative and \(u_0(\varvec{\mu }) \in \mathcal {H}\) is a parameterized initial condition. This pPDE provides the constraint to the inverse problem of estimating the unknown parameter \({\varvec{\mu }^\star }\in \mathcal {P}\) from data or observations given by

$$\begin{aligned} \begin{aligned}&{\textbf{y}({\varvec{\mu }^\star }, \varvec{\eta }) = \mathcal {L}u({\varvec{\mu }^\star }) + \varvec{\eta }\quad s.t.}\\&{\partial _t u({\varvec{\mu }^\star }) = \mathcal {F}_{{\varvec{\mu }^\star }} u({\varvec{\mu }^\star }), \,u(0,{\varvec{\mu }^\star })=u_0({\varvec{\mu }^\star }). } \end{aligned} \end{aligned}$$
(1)

Here, \(\mathcal {L}: \mathcal {U}\rightarrow {\mathbb {R}^{N_m}}\), with \({N_m}\in {\mathbb {N}^+}\), maps the space of the solutions to the space of the measurements, simulating the observation process, and \(\varvec{\eta }\) is an unknown realization of a Gaussian random variable with zero mean and given covariance, \(\varvec{\Sigma }\in {\mathbb {R}^{{N_m}\times {N_m}}}\). Note that both the observed data \(\textbf{y}\) and the additive experimental noise \(\varvec{\eta }\) are \({N_m}\)-dimensional vector-valued quantities and that \(\varvec{\Sigma }\) is a symmetric positive-definite matrix defining the weighted norm \({\Vert \cdot \Vert _{{\varvec{\Sigma }}^{-1}}} :={\Vert \varvec{\Sigma }^{-1/2} \cdot \Vert _2}\) on \(\mathbb {R}^{N_m}\), where \({\Vert \cdot \Vert _2}\) is the Euclidean norm.

To solve this inverse problem, we must explicitly solve the pPDE (1). This is done using a suitable discretization, in space and time, of the differential operators \(\mathcal {F}_{\varvec{\mu }}\) and \(\partial _t\). To this end, we introduce the approximation spaces \( \mathcal {V}_h \subset \mathcal {U}\) and \( \mathcal {H}_h \subset \mathcal {H}\) and the discretized initial condition \(u_{h,0}(\varvec{\mu })\in \mathcal {H}_h\) so that the approximate problem reads:

$$\begin{aligned} {\text {find} \,\, u_h(\varvec{\mu }) \in \mathcal {V}_h \quad \text {s.t.} \quad \partial _t u_h({\varvec{\mu }}) = \mathcal {F}^h_{\varvec{\mu }}u_h({\varvec{\mu }}), \, u_h({0, \varvec{\mu }})=u_{h,0}(\varvec{\mu }).} \end{aligned}$$
(2)

The discretization of the pPDE can be chosen according to the specific problem of interest. In all numerical examples proposed in this work, we employ a space-time Petrov–Galerkin discretization of (1) with piecewise polynomial trial and test spaces, as described in Sect. 4, and we assume (2) to be sufficiently accurate such that we can take \(\textbf{y}({\varvec{\mu }^\star }, \varvec{\eta }) ={\mathcal {L}u_h({{\varvec{\mu }^\star }})} + \varvec{\eta }\).

To characterize the observation of the solution, we introduce the forward response map \(\mathcal {G}: \mathcal {P}\rightarrow {\mathbb {R}^{N_m}}\) defined as \({\mathcal {G}(\varvec{\mu })} :={\mathcal {L}u_h({\varvec{\mu }})} \) for any solution of the pPDE (2). Although the use of the map \(\mathcal {G}\) results in a more compact notation, omitting its dependence on the solution of the pPDE conceals a key aspect of the method, i.e., the mapping from the parameter vector to the corresponding space-time pPDE solution. For this reason, and because it makes it harder to introduce the problem discretization, it will be used with caution.

2.1 The ensemble Kalman method

The data assimilation problem presented above can be recast as a minimization problem for the cost functional, \({\Phi (\varvec{\mu } \,\vert \, \textbf{y}) } :={\Vert \textbf{y}({\varvec{\mu }^\star }, \varvec{\eta }) - \mathcal {L} u_h({\varvec{\mu }}) \Vert ^2_{{\varvec{\Sigma }}^{-1}}}\), representing the misfit between the experimental data, \(\textbf{y}({\varvec{\mu }^\star }, \varvec{\eta })\), and the forward response. The optimal parameter estimate \({\varvec{\mu }_{\text {opt}}}(\textbf{y})\) is thus given by

$$\begin{aligned} \begin{aligned} {\varvec{\mu }_{\text {opt}}}(\textbf{y})&= {\text {arg}\min _{\varvec{\mu }\in \mathcal {P}}\,} {\Phi (\varvec{\mu } \,\vert \, \textbf{y}) } \quad s.t.\\ {\quad \partial _t u_h({\varvec{\mu }})}&= {\mathcal {F}^h_{\varvec{\mu }} u_h({\varvec{\mu }}), \, u_h({0, \varvec{\mu }})=u_{h,0}(\varvec{\mu }).} \end{aligned} \end{aligned}$$
(3)

This is equivalent to a maximum likelihood estimation, given the likelihood function, \(l(\varvec{\mu }\,\vert \, \textbf{y}) = \exp \{-\frac{1}{2} {\Phi (\varvec{\mu } \,\vert \, \textbf{y}) } \}\), associated with the probability density function of the data, \(\textbf{y}\vert \varvec{\mu }\), i.e., the probability of observing \(\textbf{y}\) if \(\varvec{\mu }\) is the parametric state. The shape of the function follows from the probability density function of the Gaussian noise realization.
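Explicitly, since \(\varvec{\eta }\sim \mathcal {N}(\textbf{0},\varvec{\Sigma })\), the density of the data conditioned on the parameter reads

$$\begin{aligned} p(\textbf{y}\,\vert \,\varvec{\mu }) = \frac{1}{(2\pi )^{N_m/2}\,\det (\varvec{\Sigma })^{1/2}} \exp \Big \{ -\tfrac{1}{2} {\Vert \textbf{y}- \mathcal {L} u_h({\varvec{\mu }}) \Vert ^2_{{\varvec{\Sigma }}^{-1}}} \Big \} \propto l(\varvec{\mu }\,\vert \, \textbf{y}), \end{aligned}$$

so that maximizing the likelihood is indeed equivalent to minimizing \({\Phi (\varvec{\mu } \,\vert \, \textbf{y}) }\).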

Among the various methods proposed to solve this optimization problem, the EnKM relies on a sequence of parameter ensembles \({\mathcal {E}_{n}}\), with \(n \in {\mathbb {N}^+}\), to estimate the minimum of the cost functional. Each ensemble consists of a collection \({\{{\varvec{\mu }_{n}^{{}_{\left( j\right) }}}\}_{{}^{{j}=1}}^{{}_{J}}}\) of \({J}\in {\mathbb {N}^+}\) parameter vectors \({\varvec{\mu }_{n}^{{}_{\left( j\right) }}}\), hereafter named ensemble members or particles, whose interaction, guided by the experimental measurements, causes them to cluster around the solution of the problem as iterations proceed. At the beginning of each iteration, the solution of the pPDE and its observations are computed for each \({j}\in \{1,\ldots ,{J}\}\). Subsequently, the ensemble is updated based on the empirical correlation among parameters and between parameters and measurements, as well as on the misfits between the experimental measurements \(\textbf{y}({\varvec{\mu }^\star }, \varvec{\eta })\) and the particle measurements \(\mathcal {L} u_h({{\varvec{\mu }_{n}^{{}_{\left( j\right) }}}})\). A single iteration, equivalent to the one in Iglesias et al. (2013), is formalized in the following pseudo-algorithm:

Algorithm 1

Iterative ensemble method for inverse problems.

Input. Let \({\mathcal {E}_{0}}\) be the initial ensemble with elements \(\{{\varvec{\mu }_{0}^{{}_{\left( j\right) }}}\}_{{}^{j=1}}^{{}_J}\) sampled from a given distribution \({\Pi _{0}} ( \varvec{\mu })\). Let \(\varvec{\Sigma }\) be the a priori known noise covariance and \(\textbf{y}\) the vector of noisy measurements collected from the physical system. Let \(\tau \ll 1\) be the termination tolerance.

For \(n = 0,1,\ldots \)

  1. (i)

    Prediction step. Compute the synthetic measurements of the solution over a time interval \(\mathcal {I}\) for each particle in the last updated ensemble:

    $$\begin{aligned} \begin{aligned} {{\mathcal {G}({\varvec{\mu }_{n}^{{}_{\left( j\right) }}})}}&= {\mathcal {L} u_h({{\varvec{\mu }_{n}^{{}_{\left( j\right) }}}}) \quad \text {for all } {j}\in \{1,\ldots ,{J}\} \quad s.t.}\\ {\partial _t u_h({{\varvec{\mu }_{n}^{{}_{\left( j\right) }}}})}&= {\mathcal {F}^h_{{\varvec{\mu }_{n}^{{}_{\left( j\right) }}}} u_h({{\varvec{\mu }_{n}^{{}_{\left( j\right) }}}}), \,\, u_h({0, {\varvec{\mu }_{n}^{{}_{\left( j\right) }}}})=u_{h,0}({\varvec{\mu }_{n}^{{}_{\left( j\right) }}}).} \end{aligned} \end{aligned}$$
    (4)
  2. (ii)

    Intermediate step. From the last updated ensemble measurements and parameters, compute the sample means and covariances:

    $$\begin{aligned} \textbf{P}_n&= \frac{1}{{J}} \sum _{{j}=1}^{J}{\mathcal {G}({\varvec{\mu }_{n}^{{}_{\left( j\right) }}})} {\mathcal {G}({\varvec{\mu }_{n}^{{}_{\left( j\right) }}})}^\top - \,\overline{\mathcal {G}}_{n} \overline{\mathcal {G}}_{n}^\top{} & {} \quad \text {with} \quad \overline{\mathcal {G}}_{n} = \frac{1}{{J}} \sum _{{j}=1}^{J}{\mathcal {G}( {\varvec{\mu }_{n}^{{}_{\left( j\right) }}} )} \end{aligned}$$
    (5)
    $$\begin{aligned} \textbf{Q}_n&= \frac{1}{{J}} \sum _{{j}=1}^{J}{\varvec{\mu }_{n}^{{}_{\left( j\right) }}} {\mathcal {G}({\varvec{\mu }_{n}^{{}_{\left( j\right) }}})}^\top - \,{\overline{\varvec{\mu }}_{n}} \overline{\mathcal {G}}_{n}^\top{} & {} \quad \text {with} \quad {\overline{\varvec{\mu }}_{n}} = \frac{1}{{J}} \sum _{{j}=1}^{J}{\varvec{\mu }_{n}^{{}_{\left( j\right) }}}. \end{aligned}$$
    (6)
  3. (iii)

    Analysis step. Update each particle in the ensemble: \(\text {for all } {j}\in \{1,\ldots ,{J}\}\)

    $$\begin{aligned} { \varvec{\gamma }_n^{{}_{(j)}}}&{\sim \mathcal {N} (0, \varvec{\Sigma }),} \end{aligned}$$
    (7)
    $$\begin{aligned} {{\varvec{\mu }_{n+1}^{{}_{\left( j\right) }}}}&{= {\varvec{\mu }_{n}^{{}_{\left( j\right) }}} + \textbf{Q}_n ( \textbf{P}_n + \varvec{\Sigma })^{-1} \left( \textbf{y}- {\mathcal {G}( {\varvec{\mu }_{n}^{{}_{\left( j\right) }}} )} - \varvec{\gamma }_n^{{}_{(j)}} \right) }. \end{aligned}$$
    (8)
  4. (iv)

    Termination step. Stop the algorithm when the termination criterion is satisfied. Here, we terminate when the relative change in the mean parameter is less than the tolerance:

    $$\begin{aligned} { {\Vert {\overline{\varvec{\mu }}_{n+1}}-{\overline{\varvec{\mu }}_{n}} \Vert _2} \le \tau {\Vert {\overline{\varvec{\mu }}_{n+1}} \Vert _2} \quad \text {with} \quad {\overline{\varvec{\mu }}_{n+1}} = \frac{1}{{J}} \sum _{{j}=1}^{J}{\varvec{\mu }_{n+1}^{{}_{\left( j\right) }}}. } \end{aligned}$$
    (9)

In the analysis step of the algorithm, the sample covariance matrices \({\textbf{P}_{n}}\) and \({\textbf{Q}_{n}}\) are used to compute the Kalman gain \({\textbf{K}_{n}} :={\textbf{Q}_{n}} {{ ({\textbf{P}_{n}} + \varvec{\Sigma }) }^{-1}}\). This modulates the extent of the correction: a low gain corresponds to a conservative behavior, i.e., small changes in the particle positions, while a high gain yields a larger correction. Note that the experimental data are perturbed with artificial noise sampled from the same distribution assumed for the experimental noise \(\varvec{\eta }\). This leads to an improved estimate over the unperturbed case.
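To make the prediction, intermediate, and analysis steps concrete, the following minimal Python sketch implements one iteration of Algorithm 1 for a generic forward map. The function forward_map, standing in for \(\varvec{\mu }\mapsto \mathcal {G}(\varvec{\mu })=\mathcal {L}u_h(\varvec{\mu })\), and all variable names are illustrative placeholders, not part of the reference implementation.

```python
import numpy as np

def enkm_iteration(M, Y, y, Sigma, rng):
    """One EnKM iteration: M is (J, N_p) with one parameter vector per row,
    Y is (J, N_m) with the corresponding synthetic measurements, y is (N_m,)."""
    J = M.shape[0]
    G_bar = Y.mean(axis=0)                       # ensemble mean of measurements
    mu_bar = M.mean(axis=0)                      # ensemble mean of parameters
    P = Y.T @ Y / J - np.outer(G_bar, G_bar)     # sample covariance, eq. (5)
    Q = M.T @ Y / J - np.outer(mu_bar, G_bar)    # cross covariance, eq. (6)
    K = Q @ np.linalg.inv(P + Sigma)             # Kalman gain
    gamma = rng.multivariate_normal(np.zeros(y.size), Sigma, size=J)  # eq. (7)
    return M + (y - Y - gamma) @ K.T             # analysis step, eq. (8)

def enkm(forward_map, M0, y, Sigma, tau=1e-3, max_iter=50, seed=0):
    """Run Algorithm 1 until the relative change of the ensemble mean, eq. (9),
    drops below tau or max_iter iterations are reached."""
    rng = np.random.default_rng(seed)
    M = M0
    for _ in range(max_iter):
        Y = np.array([forward_map(mu) for mu in M])   # prediction step, eq. (4)
        M_new = enkm_iteration(M, Y, y, Sigma, rng)
        if np.linalg.norm(M_new.mean(0) - M.mean(0)) <= tau * np.linalg.norm(M_new.mean(0)):
            return M_new
        M = M_new
    return M
```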

A termination criterion for the algorithm is essential for the proper implementation of the method. The one presented in Iglesias et al. (2013) is based on the discrepancy principle and consists of stopping the algorithm when the misfit between the experimental data and the synthetic measurements is comparable to the experimental noise, that is, when \({\Vert \textbf{y}-{\mathcal {G}({\overline{\varvec{\mu }}_{n}})} \Vert ^2_{{\varvec{\Sigma }}^{-1}}} \le \sigma {\Vert \varvec{\eta } \Vert ^2_{{\varvec{\Sigma }}^{-1}}}\) for some \(\sigma \ge 1\). An alternative approach is to set a threshold for the norm of the parameter update, i.e., to terminate the algorithm when \({\Vert {\overline{\varvec{\mu }}_{n+1}}-{\overline{\varvec{\mu }}_{n}} \Vert _2} \le \tau {\Vert {\overline{\varvec{\mu }}_{n+1}} \Vert _2}\) for some \(\tau \ll 1\). The latter criterion is more robust to model errors and is therefore used in our numerical experiments.

Equally important for the method is the choice of the distribution \({\Pi _{0}}\) from which the initial ensemble (or first guess) \({\mathcal {E}_{0}}\) is sampled. In most cases, including those considered in our numerical experiments, the distribution \(\Pi _0\) encodes a priori knowledge of the range of admissible parameters. In other scenarios, e.g., when the parameters live in an infinite-dimensional space, it may be necessary to define additional criteria on how to treat the parameter space. The initial ensemble plays a fundamental role in stabilizing the inverse problem. Indeed, it has been shown in Iglesias et al. (2013) that all the ensembles generated by Algorithm 1 are contained in the space spanned by the initial ensemble, that is

$$\begin{aligned} {\mathcal {E}_{n}} \subset \mathcal {A} :=\text {span}\, {\{{\varvec{\mu }_{0}^{{}_{\left( j\right) }}}\}_{{}^{{j}=1}}^{{}_{J}}} \quad \text {for all } \, n \in {\mathbb {N}^+}. \end{aligned}$$
(10)

Furthermore, in the mean-field limit, i.e., in the case of infinitely many particles, and assuming an affine relationship between parameters and synthetic measurements, the distribution \({\Pi _{0}}\) plays the same role as Tikhonov regularization in variational data assimilation, see Asch et al. (2016). In particular, the stabilization term is given by \(-\log {\Pi _{0}}(\varvec{\mu })\).

The main sources of error of the EnKM are associated with the ensemble size and with the evaluation of \(\mathcal {G}({\varvec{\mu }_{n}^{{}_{\left( j\right) }}}) = \mathcal {L}u_h({{\varvec{\mu }_{n}^{{}_{\left( j\right) }}}})\). Indeed, while the observation of the solution is accurate and computationally cheap to evaluate, owing to the linearity of the observation operator, the accuracy of the computed pPDE solution intrinsically depends on the quality of the numerical discretization. Highly accurate numerical discretizations might require prohibitively large computational costs, especially if the pPDE (2) is solved for many values of the parameter and over long temporal intervals.

The other steps of Algorithm 1 involve the following operations: (i) the assembly of \(\textbf{P}_n\) and \(\textbf{Q}_n\) in (5)-(6), with computational complexity of order \(\mathcal {O}(J N_m^2)\), and (ii) the inversion of the matrix \(\textbf{P}_n+\varvec{\Sigma }\) in the analysis step (8) with complexity \(\mathcal {O}(N_m^3)\). The solution of the pPDE (2), \(\text {for all } {j}\in \{1,\ldots ,{J}\}\), in the prediction step of Algorithm 1 is thus the computational bottleneck of the EnKM algorithm.

3 Surrogate models

3.1 Reduced basis methods

Given the need to solve the pPDE (2) for several instances of the parameter, the use of MOR techniques appears to be an ideal choice. Model order reduction has enabled substantial computational speed-ups in settings that require repeated model evaluations, such as multi-query simulations. In MOR, the high-dimensional problem is replaced with a surrogate model of reduced dimensionality that still possesses optimal or near-optimal approximation properties but can be solved at a considerably reduced computational cost. In this work, we focus on a particular class of MOR techniques, known as reduced basis methods (Prud’homme et al. 2002).

The reduced basis method typically consists of two phases: an offline phase and an online phase. In the computationally expensive offline phase a low-dimensional approximation of the solution space, namely the reduced space, is constructed and a surrogate model is derived via projection of the full order model onto the reduced space. Then, the resulting low-dimensional reduced model can be solved in the online phase for many instances of the parameter at a computational cost independent of the size of the full order model.

To be more precise, let \(\mathcal {M}:= \{ {u_h(t, \varvec{\mu }) \in \mathcal {V}_h \,\vert \, \partial _t u_h({\varvec{\mu }}) = \mathcal {F}^h_{\varvec{\mu }} u_h({\varvec{\mu }}),} \, u_h({0, \varvec{\mu }})=u_{h,0}(\varvec{\mu }) \, \text{ for } \text{ all }\, \varvec{\mu }\in \mathcal {P},\,t\in \mathcal {I} \}\) be the solution set which collects the solution of the discretized pPDE (2) evaluated at times \(t\in \mathcal {I}:=(0,T]\), with \(T\in {\mathbb {R}^+}\), for a set of parameters \(\varvec{\mu }\in \mathcal {P}\). The parametric problem (2) is said to be reducible if the solution set \(\mathcal {M}\) can be well approximated by a low-dimensional linear subspace. In this case, such a subspace is obtained as the span of a problem-dependent basis derived from a collection of full order solutions or snapshots, \(\{ u_h(t_n, \varvec{\mu }_s) \}_{s,n=1}^{S,R}\), with \(S,R \in {\mathbb {N}^+}\), at sampled values, \(\{ \varvec{\mu }_s \}_{s=1}^{S}\), of the parameter, and at discrete times, \(\{t_n\}_{n=1}^R\). The set \(\mathcal {P}_\text {TRAIN}:=\{ \varvec{\mu }_s \}_{s=1}^{S}\subset \mathcal {P}\) of training parameters is a sufficiently rich subset of the parameter space that can be obtained by drawing random samples from a uniform distribution in \(\mathcal {P}\) or with other sampling techniques, such as statistical methods and sparse grids, see (Quarteroni et al. 2015, Chapter 6) and references therein. The extraction of the basis functions from the snapshots is usually performed using SVD-type algorithms such as the proper orthogonal decomposition (POD) (Berkooz et al. 1993) or greedy algorithms. For problems that depend on both time and parameters, the so-called POD-Greedy method (Grepl and Patera 2005; Haasdonk and Ohlberger 2008) combines a greedy algorithm in parameter space with the proper orthogonal decomposition in time at a given parameter. In the numerical tests of this work, we rely on the Weak-POD-Greedy algorithm, which is the preferred method whenever a rigorous error bound can be derived, while the POD and the Strong-POD-Greedy are often used when a bound is unavailable, i.e., for most non-linear problems. Note that, under the same choice of training parameters, the latter are more accurate but computationally less efficient.
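As a minimal illustration of the compression step, the sketch below extracts a reduced basis from a snapshot matrix via an SVD-based POD with respect to the Euclidean inner product. This is only the simplest representative of the family of methods mentioned above; the numerical examples in this work use a Weak-POD-Greedy strategy and problem-dependent inner products, and the snapshot assembly and energy tolerance here are placeholders.

```python
import numpy as np

def pod_basis(snapshots, tol=1e-4):
    """POD of a snapshot matrix of shape (N_h, S*R), whose columns are the
    full order solutions u_h(t_n, mu_s); Euclidean inner product assumed."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    # keep the smallest number of modes such that the retained singular values
    # carry at least a fraction (1 - tol) of the total squared energy
    energy = np.cumsum(s**2) / np.sum(s**2)
    N_eps = int(np.searchsorted(energy, 1.0 - tol)) + 1
    return U[:, :N_eps]   # columns span the reduced (spatial) space
```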

Once an \(N_\varepsilon \)-dimensional set of spatial reduced basis functions \(\{ \psi _{i}\}_{i=1}^{N_\varepsilon }\) is obtained and a set of time basis functions \(\{ \upsilon _n \}_{n=1}^{N_t}\) is selected, the reduced spaces \(\mathcal {H}_{\varepsilon } = \text {span} \{ \psi _{i}\}_{i=1}^{N_\varepsilon } \subset \mathcal {H}_h\) and \(\mathcal {V}_{\varepsilon } = \text {span} \{ \upsilon _n {\otimes } \psi _{i}\}_{i,n=1}^{N_\varepsilon , N_t} \subset \mathcal {V}_h\) are constructed. The full model solution \(u_h(\varvec{\mu })\) and the initial condition \(u_{h,0}(\varvec{\mu })\), for a given \(\varvec{\mu }\), are approximated by the functions \(u_{\varepsilon }(\varvec{\mu })\) in \(\mathcal {V}_{\varepsilon }\) and \(u_{\varepsilon ,0}(\varvec{\mu })\) in \(\mathcal {H}_{\varepsilon }\),

$$\begin{aligned} { u_{\varepsilon }(\varvec{\mu }) = \sum _{i,n=1}^{N_\varepsilon , N_t} u_{i,n}(\varvec{\mu })\,\upsilon _n \, \psi _i, \quad u_{\varepsilon ,0}(\varvec{\mu }) = \sum _{i=1}^{N_\varepsilon } u_{i,0}(\varvec{\mu })\,\psi _i, \quad \varvec{\mu }\in \mathcal {P},} \end{aligned}$$

where \((u_{1,1}(\varvec{\mu }),\ldots ,u_{N_\varepsilon ,N_t}(\varvec{\mu }))^\top \in \mathbb {R}^{N_\varepsilon N_t}\) and \((u_{1,0}(\varvec{\mu }),\ldots ,u_{N_\varepsilon ,0}(\varvec{\mu }))^\top \in \mathbb {R}^{N_\varepsilon }\) denote the vectors of expansion coefficients in the reduced basis. The reduced model thus reads:

$$\begin{aligned} { \text {find} \,\, u_{\varepsilon }(\varvec{\mu }) \in \mathcal {V}_{\varepsilon } \quad \text {s.t.} \quad \partial _t u_{\varepsilon }({\varvec{\mu }}) = \mathcal {F}^{\varepsilon }_{\varvec{\mu }}u_{\varepsilon }({\varvec{\mu }}), \quad u_{\varepsilon }({0, \varvec{\mu }})=u_{\varepsilon ,0}(\varvec{\mu }),} \end{aligned}$$
(11)

where the operator \(\mathcal {F}^{\varepsilon }_{\varvec{\mu }}\) is obtained by projecting the full order operator \(\mathcal {F}^{h}_{\varvec{\mu }}\) onto the reduced space \(\mathcal {V}_{\varepsilon }\). Note that we set \(N_t = R\) in the sequel since we do not consider a temporal compression. However, choosing \(R < N_t\) is also possible.

The computational gain derived from solving problem (11) instead of the full order model (2) hinges on the feasibility of a complete decoupling of the offline and online phases. A computational complexity of the online phase independent of the size of the full order problem can be achieved under the assumption of linearity and parameter-separability of the operator \(\mathcal {F}^{h}_{\varvec{\mu }}\). To deal with general non-linear operators, hyper-reduction techniques are required. These include methods for approximating the high-dimensional non-linear term \(\mathcal {F}^{h}_{\varvec{\mu }}\) with an empirical affine decomposition, such as the empirical interpolation method (EIM) (Barrault et al. 2004), and methods for reducing the cost of evaluating the non-linear term, such as linear program empirical quadrature (Yano and Patera 2019) and empirical cubature (Hernández et al. 2017).
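For intuition only, the sketch below shows the greedy selection of interpolation indices that lies at the core of EIM and its discrete variant DEIM; it assumes that a basis of the non-linear term (e.g., POD modes of non-linear snapshots) is already available, and it is not the specific hyper-reduction procedure used in this work.

```python
import numpy as np

def greedy_interpolation_indices(W):
    """DEIM-style index selection for a basis W of shape (N_h, m) whose
    columns approximate the non-linear term."""
    m = W.shape[1]
    idx = [int(np.argmax(np.abs(W[:, 0])))]
    for l in range(1, m):
        # interpolate the l-th mode at the current indices and pick the
        # location where the interpolation residual is largest
        c = np.linalg.solve(W[np.ix_(idx, range(l))], W[idx, l])
        r = W[:, l] - W[:, :l] @ c
        idx.append(int(np.argmax(np.abs(r))))
    return np.array(idx)
```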

3.2 A reduced basis ensemble Kalman method

In this section, we discuss the implications of replacing the high-fidelity model in the prediction step of the EnKM by a surrogate model derived via model order reduction, as described in Sect. 3.1. The use of MOR for particle-based methods is particularly desirable in multi-query contexts since it allows us to significantly reduce the computational cost of solving the inverse problem. However, the approximation introduced by the model order reduction inevitably produces (small) deviations of the reduced solution from the full order one. This constitutes a problem for data assimilation algorithms, as already documented and investigated in Calvetti et al. (2018) and in other works. Indeed, the error in the solution results in discrepancies between approximated and exact measurements. Although we can expect the mismatch \(\varvec{\delta }_\varepsilon (\varvec{\mu }) :=\mathcal {L}u_h(\varvec{\mu }) - \mathcal {L}u_\varepsilon (\varvec{\mu })\) to decrease with the approximation error of \(u_\varepsilon (\varvec{\mu })\), this bias will inevitably entail a distortion of the loss functional obtained by simple model substitution, i.e.,

$$\begin{aligned} { \widetilde{\Phi } (\varvec{\mu }\vert \,\textbf{y}) :=\Vert \textbf{y}({\varvec{\mu }^\star }, \varvec{\eta }) - \mathcal {L}u_{\varepsilon }({\varvec{\mu }})\Vert ^2_{\varvec{\Sigma }^{-1}}.} \end{aligned}$$
(12)

Note that this cost function does not vanish at the parameter \({\varvec{\mu }^\star }\) we are trying to estimate, not even in noise-free conditions. This systematic error, independent of the magnitude of the experimental noise, can be mitigated by modifying the cost function and consequently the EnKM. A modified algorithm, which we refer to as the adjusted RB-EnKM, is presented in the following sections. This algorithm is in contrast to what we refer to as the biased RB-EnKM, i.e., the algorithm obtained by the simple substitution of the full order model with the reduced order model in Algorithm 1, corresponding to the cost function (12).

The modification of the algorithm can proceed in two ways. One possibility is to rewrite the exact cost function in terms of the surrogate model and the measurement bias, namely substituting \(\mathcal {L}u_h({\varvec{\mu }}) = \mathcal {L}u_{\varepsilon }({\varvec{\mu }}) + \varvec{\delta }_\varepsilon (\varvec{\mu })\) in the minimization problem (3) to obtain

$$\begin{aligned} {\Phi _1 (\varvec{\mu }\vert \,\textbf{y})}:= & {} {\, \Vert \textbf{y}({\varvec{\mu }^\star }, \varvec{\eta }) - \mathcal {L}u_{\varepsilon }({\varvec{\mu }}) - \varvec{\delta }_\varepsilon (\varvec{\mu }) \Vert ^2_{\varvec{\Sigma }^{-1}}} \nonumber \\&{=}&{\, \Vert \mathcal {L}u_h({{\varvec{\mu }^\star }}) - \mathcal {L}u_h({\varvec{\mu }}) + \varvec{\eta }\, \Vert ^2_{\varvec{\Sigma }^{-1}} \equiv \Phi (\varvec{\mu }\vert \,\textbf{y}).} \end{aligned}$$
(13)

A second option is to correct the experimental data involved in the biased cost function (12) so that, at least in noise free conditions, its minimum coincides with the minimum of the exact cost function. This means subtracting \(\varvec{\delta }_\varepsilon ({\varvec{\mu }^\star })\) instead of \(\varvec{\delta }_\varepsilon (\varvec{\mu })\), and results in the new cost function

$$\begin{aligned} {\Phi _2 (\varvec{\mu }\vert \,\textbf{y})}:= & {} {\, \Vert \textbf{y}({\varvec{\mu }^\star }, \varvec{\eta }) - \mathcal {L}u_{\varepsilon }({\varvec{\mu }}) - \varvec{\delta }_\varepsilon ({\varvec{\mu }^\star }) \Vert ^2_{\varvec{\Sigma }^{-1}}} \nonumber \\&{=}&{\, \Vert \mathcal {L}u_{\varepsilon }({{\varvec{\mu }^\star }}) - \mathcal {L}u_{\varepsilon }({\varvec{\mu }}) + \varvec{\eta }\, \Vert ^2_{\varvec{\Sigma }^{-1}} \not \equiv \Phi (\varvec{\mu }\vert \,\textbf{y}).} \end{aligned}$$
(14)

In noise-free conditions, i.e., if \(\varvec{\eta }= \textbf{0}\), both cost functions vanish at the exact value \({\varvec{\mu }^\star }\). Since the cost functions \(\Phi _1\) and \(\Phi _2\) are non-negative, the minimum attained at \({\varvec{\mu }^\star }\) is necessarily also a global minimum.

In the following, we focus on the second approach. The reason is that the first approach requires the evaluation of the bias at all parameter values \(\varvec{\mu }\in \mathcal {P}\), which is too expensive to perform. Furthermore, at the algorithmic level, the substitution of the true model with the sum of the surrogate model and its bias would significantly change the computation of \(\textbf{P}_n\) and \(\textbf{Q}_n\), and thus the algorithm structure. By contrast, the second approach corrects the experimental data so that they become consistent with the surrogate model. This implies that \(\varvec{\delta }_\varepsilon ({\varvec{\mu }^\star })\) is the only bias involved, and its evaluation would require just a single full order solve. However, since the argument \({\varvec{\mu }^\star }\) is unknown, this is clearly not possible, and we must instead exploit the prior epistemic uncertainty on \({\varvec{\mu }^\star }\), encoded in \(\Pi _0(\varvec{\mu })\), to modify the cost function.

If \({\varvec{\mu }^\star }\) is treated as a random variable with probability measure \(\Pi _0\), then the data bias \(\varvec{\delta }^\star _\varepsilon = \varvec{\delta }_\varepsilon ({\varvec{\mu }^\star })\) is in turn a random variable with probability measure \(\Pi _0 \circ \varvec{\delta }_\varepsilon ^{-1}\). The first two moments of this distribution, henceforth denoted by \(\overline{\varvec{\delta }}_\varepsilon \) and \(\varvec{\Gamma }_\varepsilon \), can be empirically estimated via pointwise evaluations of the bias without further assumptions on the nature of the distribution itself. However, the assumption of Gaussianity, although it implicitly presumes the linearity of \(\varvec{\delta }_\varepsilon : \mathcal {P} \rightarrow \mathbb {R}^{N_m}\), is consistent with the other Gaussianity and linearity assumptions required for the derivation of the EnKF (Evensen 2003). Furthermore, it allows us to obtain closed-form results, as shown in the next paragraphs.

Since \(\varvec{\delta }^\star _\varepsilon \) is now treated as a random variable, we rewrite (14) to make the dependence of the cost function \(\Phi _2\) on \(\varvec{\delta }^\star _\varepsilon \) explicit, i.e.,

$$\begin{aligned} { \Phi _\varepsilon (\varvec{\mu }\, \vert \, \textbf{y}, \varvec{\delta }^\star _\varepsilon ) :=\, \Vert \textbf{y}({\varvec{\mu }^\star }, \varvec{\eta }) - \mathcal {L}u_{\varepsilon }({\varvec{\mu }}) - \varvec{\delta }^\star _\varepsilon \Vert ^2_{\varvec{\Sigma }^{-1}}.} \end{aligned}$$
(15)

In order to make the estimate of \(\varvec{\mu }\) depend only on the experimental data, we must remove the conditioning on \(\varvec{\delta }^\star _\varepsilon \), i.e., marginalize out this random variable. The easiest way to do so is to employ a Bayesian argument and recover the marginal distribution of \(\textbf{y}\vert \varvec{\mu }\) mentioned at the beginning of Sect. 2.1. To this end, we consider the likelihood function \( l (\varvec{\mu }\, \vert \, \textbf{y}, \varvec{\delta }^\star _\varepsilon ) :=\exp \{ - \frac{1}{2} \Phi _\varepsilon (\varvec{\mu }\, \vert \, \textbf{y}, \varvec{\delta }^\star _\varepsilon ) \}\), proportional to the density of \((\textbf{y}\,\vert \, \varvec{\mu }, \varvec{\delta }^\star _\varepsilon ) \sim \mathcal {N}( \varvec{\delta }^\star _\varepsilon + \mathcal {L}u_{\varepsilon }({\varvec{\mu }}), \varvec{\Sigma })\). Employing (Särkkä 2013, Lemma 1.A), concerning the mean and covariance of the joint distribution of Gaussian variables, it can easily be shown that, if \(\varvec{\delta }^\star _\varepsilon \sim \mathcal {N}(\overline{\varvec{\delta }}_\varepsilon , \Gamma _\varepsilon )\), then \(\textbf{y}\,\vert \, \varvec{\mu }\sim \mathcal {N}( \overline{\varvec{\delta }}_\varepsilon + \mathcal {L}u_{\varepsilon }({\varvec{\mu }}), \varvec{\Sigma }+ \Gamma _\varepsilon )\), and consequently we obtain the marginalized cost functional

$$\begin{aligned} { \Phi _\varepsilon (\varvec{\mu }\, \vert \, \textbf{y}) :=\, \Vert \textbf{y}- \mathcal {L}u_{\varepsilon }({\varvec{\mu }}) - \overline{\varvec{\delta }}_\varepsilon \Vert ^2_{(\varvec{\Sigma }+\Gamma _\varepsilon )^{-1}}.} \end{aligned}$$
(16)
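The marginalization underlying (16) can be made explicit: since the noise and, by assumption, the bias are independent Gaussian variables, integrating the bias out of the joint density amounts to a convolution of two Gaussians,

$$\begin{aligned} p(\textbf{y}\,\vert \,\varvec{\mu }) = \int _{\mathbb {R}^{N_m}} \mathcal {N}\big (\textbf{y};\, \varvec{\delta }+ \mathcal {L}u_{\varepsilon }({\varvec{\mu }}), \varvec{\Sigma }\big ) \, \mathcal {N}\big (\varvec{\delta };\, \overline{\varvec{\delta }}_\varepsilon , \Gamma _\varepsilon \big ) \, d\varvec{\delta } = \mathcal {N}\big (\textbf{y};\, \overline{\varvec{\delta }}_\varepsilon + \mathcal {L}u_{\varepsilon }({\varvec{\mu }}), \varvec{\Sigma }+ \Gamma _\varepsilon \big ), \end{aligned}$$

which is exactly the density underlying the cost functional (16).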

Hence, by analogy with Sect. 2.1, we can adapt the EnKM to optimize the new cost function under the surrogate model constraint (11). The resulting adjusted RB-EnKM is summarized in Algorithm 2. Unlike the reference EnKM, we distinguish between an offline and an online phase. In the offline phase, the training set of full order solutions is generated and used both to construct the surrogate model and to estimate the moments of \(\varvec{\delta }^\star _\varepsilon \). In the online phase, the actual optimization is performed.

Algorithm 2

Iterative ensemble method with reduced basis surrogate models, accounting for the associated measurement bias.

Offline:

Input. Let \(\mathcal {P}_{\text {TRAIN}}\) be a set of S parameters \(\{\varvec{\mu }_{0}^{{}_{(s)}}\}_{{}^{s=1}}^{{}_S}\) sampled from a given probability distribution \({\Pi _{0}} ( \varvec{\mu })\) and let \(\{ u_h(\varvec{\mu }_s) \}_{s=1}^{S}\) be the associated training set of full order solutions. Let \(\varepsilon \in {\mathbb {R}^+}\) be a prescribed tolerance.

  1. (i)

    Model order reduction. Relying on the training set \(\{ u_h(\varvec{\mu }_s) \}_{s=1}^{S}\), construct a surrogate model of accuracy \(\varepsilon \) as explained in Sect. 3.1 and compute the set of reduced basis solutions \(\{ u_\varepsilon (\varvec{\mu }_s) \}_{s=1}^{S}\).

  2. (ii)

    Data bias estimation. Define the training biases as

    $$\begin{aligned} \varvec{\delta }_{\varepsilon }(\varvec{\mu }^{{}_{(s)}}) = \mathcal {L} u_h({\varvec{\mu }}^{{}_{(s)}}) - \mathcal {L} u_{\varepsilon }({\varvec{\mu }}^{{}_{(s)}}) \quad \text {for all } s \in \{1,...,S\} \end{aligned}$$
    (17)

    and the associated empirical moments

    $$\begin{aligned} \varvec{\Gamma }_{\varepsilon } = \frac{1}{S} \sum _{s=1}^S \varvec{\delta }_{\varepsilon }(\varvec{\mu }^{{}_{(s)}}) \varvec{\delta }_{\varepsilon }(\varvec{\mu }^{{}_{(s)}})^\top - \,\overline{\varvec{\delta }}_\varepsilon \overline{\varvec{\delta }}_\varepsilon ^\top \quad \text {with} \quad \,\,\overline{\varvec{\delta }}_\varepsilon = \frac{1}{S} \sum _{s=1}^S \varvec{\delta }_{\varepsilon }(\varvec{\mu }^{{}_{(s)}}). \end{aligned}$$
    (18)

Online:

Input. Let \({\mathcal {E}_{0}}\) be the initial ensemble with elements \(\{{\varvec{\mu }_{0}^{{}_{\left( j\right) }}}\}_{{}^{j=1}}^{{}_J}\) sampled from a given distribution \({\Pi _{0}} ( \varvec{\mu })\). Let \(\varvec{\Sigma }\) be the a priori known noise covariance and \(\textbf{y}\) the vector of noisy measurements collected from the physical system. Let \(\tau \ll 1\) be the termination parameter.

For \(n = 0,1,\ldots \)

  1. (i)

    Prediction step. Compute the biased measurements of the approximated solution over a time interval \(\mathcal {I}\) for each particle in the last updated ensemble:

    $$\begin{aligned} \begin{aligned} {{\mathcal {G}_\varepsilon ({\varvec{\mu }_{n}^{{}_{\left( j\right) }}})}}&\,{= {\mathcal {L}u_{\varepsilon }({{\varvec{\mu }_{n}^{{}_{\left( j\right) }}}})} \quad \text {for all } {j}\in \{1,\ldots ,{J}\} \quad s.t.} \\ {\partial _t u_{\varepsilon }({{\varvec{\mu }_{n}^{{}_{\left( j\right) }}}})}&\,{= \mathcal {F}^\varepsilon _{{\varvec{\mu }_{n}^{{}_{\left( j\right) }}}} u_{\varepsilon }({{\varvec{\mu }_{n}^{{}_{\left( j\right) }}}}), \,\, u_{\varepsilon }({0, {\varvec{\mu }_{n}^{{}_{\left( j\right) }}}})=u_{\varepsilon ,0}({\varvec{\mu }_{n}^{{}_{\left( j\right) }}}).} \end{aligned} \end{aligned}$$
    (19)
  2. (ii)

    Intermediate step. From the last updated ensemble measurements and parameters, define the sample means and covariances:

    $$\begin{aligned} \textbf{P}_{n,\varepsilon }&= \frac{1}{{J}} \sum _{{j}=1}^{J}{\mathcal {G}_\varepsilon ({\varvec{\mu }_{n}^{{}_{\left( j\right) }}})}\, {\mathcal {G}_\varepsilon ({\varvec{\mu }_{n}^{{}_{\left( j\right) }}})}^\top - \,\overline{\mathcal {G}}_{{n},\varepsilon } \, \overline{\mathcal {G}}_{{n},\varepsilon }^\top{} & {} \, \text {with} \,\,\, \overline{\mathcal {G}}_{{n},\varepsilon } = \frac{1}{{J}} \sum _{{j}=1}^{J}{\mathcal {G}_\varepsilon ( {\varvec{\mu }_{n}^{{}_{\left( j\right) }}} )}, \end{aligned}$$
    (20)
    $$\begin{aligned} \textbf{Q}_{n,\varepsilon }&= \frac{1}{{J}} \sum _{{j}=1}^{J}{\varvec{\mu }_{n}^{{}_{\left( j\right) }}} {\mathcal {G}_\varepsilon ({\varvec{\mu }_{n}^{{}_{\left( j\right) }}})}^\top - \,{\overline{\varvec{\mu }}_{n}} \overline{\mathcal {G}}_{{n},\varepsilon }^\top{} & {} \, \text {with} \,\,\,\,\,{\overline{\varvec{\mu }}_{n}} = \frac{1}{{J}} \sum _{{j}=1}^{J}{\varvec{\mu }_{n}^{{}_{\left( j\right) }}}. \end{aligned}$$
    (21)
  3. (iii)

    Analysis step. Update each particle in the ensemble: \(\text {for all } {j}\in \{1,\ldots ,{J}\}\)

    $$\begin{aligned} {\varvec{\gamma }_n^{{}_{(j)}}}&{\sim \mathcal {N} ( \overline{\varvec{\delta }}_\varepsilon , \varvec{\Sigma }+ \varvec{\Gamma }_\varepsilon ),} \end{aligned}$$
    (22)
    $$\begin{aligned} {{\varvec{\mu }_{n+1}^{{}_{\left( j\right) }}}}&{= {\varvec{\mu }_{n}^{{}_{\left( j\right) }}} + \textbf{Q}_{n,\varepsilon } \left( \textbf{P}_{n,\varepsilon } + \varvec{\Gamma }_\varepsilon + \varvec{\Sigma } \right) ^{-1} \left( \textbf{y}- {\mathcal {G}_\varepsilon ( {\varvec{\mu }_{n}^{{}_{\left( j\right) }}} )} - \varvec{\gamma }_n^{{}_{(j)}} \right) .} \end{aligned}$$
    (23)
  4. (iv)

    Termination step. Stop the algorithm when the termination criterion is satisfied:

    $$\begin{aligned} {{\Vert {\overline{\varvec{\mu }}_{n+1}}-{\overline{\varvec{\mu }}_{n}} \Vert _2} \le \tau {\Vert {\overline{\varvec{\mu }}_{n+1}} \Vert _2} \quad \text {with} \quad {\overline{\varvec{\mu }}_{n+1}} = \frac{1}{{J}} \sum _{{j}=1}^{J}{\varvec{\mu }_{n+1}^{{}_{\left( j\right) }}}. } \end{aligned}$$
    (24)

By employing the same training set for constructing the surrogate model and for evaluating \(\overline{\varvec{\delta }}_\varepsilon \) and \(\varvec{\Gamma }_\varepsilon \), we provide the largest possible training set to the model order reduction algorithm for a fixed value of S. However, we also introduce a bias in the estimation of the moments of \(\varvec{\delta }_\varepsilon ({\varvec{\mu }^\star })\), since the values of \(\varvec{\delta }_{\varepsilon }(\varvec{\mu }^{{}_{(s)}})\) are underestimated on the training set. This bias could be removed, e.g., by partitioning the training set \(\{\varvec{\mu }_{0}^{{}_{(s)}}\}_{{}^{s=1}}^{{}_S}\) into two sub-sets (or by introducing two independent sets with cardinality S/2), one for the construction of the surrogate model and one for the independent estimation of \(\overline{\varvec{\delta }}_\varepsilon \) and \(\varvec{\Gamma }_\varepsilon \). The disadvantage of this approach (for fixed S) would be a smaller training set for the surrogate model construction, and poorer, yet unbiased, statistics for the estimation of the moments.
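As an illustration of the offline data bias estimation in (17)–(18), a minimal sketch is given below. The callables measure_full and measure_rb stand in for \(\varvec{\mu }\mapsto \mathcal {L}u_h(\varvec{\mu })\) and \(\varvec{\mu }\mapsto \mathcal {L}u_\varepsilon (\varvec{\mu })\) and, like all names here, are illustrative placeholders.

```python
import numpy as np

def bias_moments(train_params, measure_full, measure_rb):
    """Empirical mean and covariance of the measurement bias, eqs. (17)-(18),
    evaluated on the training parameters used to build the reduced basis."""
    D = np.array([measure_full(mu) - measure_rb(mu) for mu in train_params])  # (S, N_m)
    S = D.shape[0]
    delta_bar = D.mean(axis=0)
    Gamma = D.T @ D / S - np.outer(delta_bar, delta_bar)   # 1/S estimator, as in (18)
    return delta_bar, Gamma
```

In the adjusted analysis step (22)–(23), these two quantities simply enter as the mean of the artificial perturbations and as an inflation of the measurement covariance, i.e., \(\varvec{\Sigma }+\varvec{\Gamma }_\varepsilon \) replaces \(\varvec{\Sigma }\).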

We note from the data bias estimation in the offline part of Algorithm 2 that the RB approximation needs to approximate the full order model uniformly well over the entire parameter domain. We can thus not replace the classical offline-online decomposition by an on-the-fly adaptation of the reduced basis (Donoghue and Yano 2022), which would be beneficial if only local approximations, e.g., along the optimization path, were required.

In Algorithm 2, the prior probability \(\Pi _0 (\varvec{\mu })\) used for the estimation of the moments of \(\varvec{\delta }^\star _\varepsilon \) could be substituted at every iteration by an updated probability measure of \(\varvec{\mu }\). However, the computation of the updated probability measure might compromise the computational gain obtained with the use of reduced models. One possibility to address this shortcoming is to use a Gaussian process regression of the initial ensemble biases to estimate the moments of \(\varvec{\delta }^\star _\varepsilon \) with respect to the new probability measure of \(\varvec{\mu }\). The development and study of this strategy, together with its effect on the accuracy and performance of the RB-EnKM, will be investigated in future work.

4 Numerical experiments

In this section, we consider two data assimilation problems for the estimation of model parameters in pPDEs. The first problem involves a linear advection-dispersion equation with unknown Péclet number. The corresponding model is linear in the observed state \(c(\mu )\), but it is non-linear in the parameter to be estimated. The second problem concerns the transport of a contaminant in an unconfined aquifer with unknown hydraulic conductivity. It involves two coupled PDEs: a stationary non-linear equation which describes the pressure field induced by an external pumping force and a time-dependent linear equation describing the advection-dispersion of the contaminant in a medium whose properties depend non-linearly on the pressure field.

Both models describe 2D systems, and each exhibits ideal characteristics to test the proposed algorithms. The first, while leading to a non-linear inverse problem, is sufficiently simple to allow for a comparison between the adjusted and biased RB-EnKM and the reference full order EnKM. Moreover, its affine dependence on the parameter enables the use of error bounds for the efficient construction of the reduced space. The second problem, which is non-linear and non-affine in the six-dimensional parameter vector, is complex enough to serve as a non-trivial challenge for the proposed RB-EnKM algorithm, while the reference EnKM cannot even be tested due to the computational cost. From an a priori estimate, performing full order tests with the same statistical relevance as the reduced basis ones would have taken up to 20 days on our machine.

The two problems are presented in Sects. 4.1 and 4.2. We first introduce the pPDE, then present the full order discretization followed by the reduced basis approximation. The measurement operator is then introduced, and a first analysis of the inversion method is carried out. Finally, we study the impact of the ensemble size, of the experimental noise magnitude, and of the error of the reduced model on the reconstruction error of the EnKM. All the computations are performed using Python on a computer with a 2.20 GHz Intel Core i7-8750H processor and 32 GB of RAM.

4.1 Taylor–Green vortex problem

Let us consider the dispersion of a contaminant modeled by the 2D advection–diffusion equation with a Taylor–Green vortex velocity field (Kärcher et al. 2018). We introduce the spatial domain \({\Omega }= (-1, 1)^2\) with Dirichlet boundary \({ \Gamma _D :=(-1,1) \times \{ -1 \} }\) and Neumann boundary \({ \Gamma _N :={\partial {\Omega }}{\setminus } \Gamma _D }\), and the time domain \(\mathcal {I}:=\left( 0,T\right] \) with \(T=2.5\). We consider the problem of estimating the inverse of the Péclet number \(\mu = 1/\textrm{Pe}\) in the interval \(\mathcal {P}:=[1/50, 1/10]\). The governing pPDE is given by: find \( c(\mu ): {\Omega }\times \left( 0,T\right] \rightarrow \mathbb {R} \) such that

$$\begin{aligned} \left\{ \begin{aligned}&\partial _t c - \mu \Delta c + \varvec{\beta } \cdot \nabla c = 0, \qquad{} & {} \text{ in } \, {\Omega }&\times \, {\mathcal {I}},\\&\nabla c(\textbf{x},t;\mu )\cdot \textbf{n} = 0,{} & {} \text{ on } \, \Gamma _N&\times \, {\mathcal {I}},\\&c(\textbf{x},t;\mu )=0,{} & {} \text{ on } \, \Gamma _D&\times \, {\mathcal {I}},\\&c(\textbf{x},0;\mu )=c_0(\textbf{x};\mu ),{} & {} \text{ in } \, {\Omega }.&\end{aligned} \right. \end{aligned}$$
(25)

Here, the velocity field \(\varvec{\beta } :=(\sin (\pi x_1) \cos (\pi x_2), -\cos (\pi x_1) \sin (\pi x_2))^\top \), \(\textbf{x} = (x_1, x_2)\), is a solenoidal field, and the initial condition \(c_0(\mu ): {\Omega }\rightarrow \mathbb {R}\) is given by the sum of three Wendland functions \(\psi _{2,1}\) (Wendland 1995) of radius 0.4 and centers located at \((-0.6, -0.6)\), (0, 0), and (0.6, 0.6). The velocity field and the initial condition are shown in Fig. 1.

Fig. 1 Spatial domain of the Taylor–Green problem. Left: the initial condition \(c_0\) in blue, the sensor shape functions \(\eta _i\), and the Neumann and Dirichlet boundaries \(\Gamma _N\), \(\Gamma _D\). Right: the velocity field \(\varvec{\beta }\) with four Taylor–Green vortices.

The full order model is obtained by a nodal finite element discretization of (25) using continuous piecewise polynomial functions, \(\zeta _i: {\Omega }\rightarrow \mathbb {R}\), \(i = 1, \ldots , N_h\), of degree 2 over a uniform Cartesian grid of width \(h=0.04\), for a total of \(N_h = 10,100\) degrees of freedom. The resulting system of ordinary differential equations is integrated over time using a Crank–Nicolson scheme with uniform time step \(\Delta t = 0.01\). As shown in (Thomée 2006, Chapter 12), this is equivalent to performing a Petrov–Galerkin projection of (25) with trial and test spaces defined as follows: we consider the partition of the temporal interval \(\mathcal {I}\) into the union of equispaced subintervals, \(\mathcal {I}_n :=\left( t_{n-1}, t_n \right] \), of length \(\Delta t\) with \(n=1,\ldots , N_t\) and \(N_t :=T / \Delta t\). Let \(\omega _n:\mathcal {I} \rightarrow \mathbb {R}\) be a piecewise constant function with support in \(\mathcal {I}_n\), and let \(\upsilon _n:\mathcal {I} \rightarrow \mathbb {R}\) be a hat function with support in \(\mathcal {I}_{n} \cup \mathcal {I}_{n+1}\). We define the trial space \(\mathcal {V}_h :=\text {span} \{ \upsilon _n \cdot \zeta _i \}_{ {}^{i,n=1} }^{ {}_{N_h, N_t} }\) and the test space \(\mathcal {W}_h :=\text {span} \{ \omega _n \cdot \zeta _i \}_{ {}^{i,n=1} }^{ {}_{N_h, N_t} }\).

To solve the spatial problems arising at each time step, we use the sparse \(\texttt {splu}\) function implemented in the scipy.sparse.linalg package. The computational time to obtain a single full order solution is on average 0.56 s. Snapshots of the solution at times \(t \in \{0.2, 0.8, 1.4, 2.0\}\) and for the three parameter values \(\mu \in \{1/10, 1/30, 1/50 \}\) are shown in Fig. 2.

Fig. 2 Solution of the advection–diffusion equation for three increasing values of \(\textrm{Pe}\) at four time instances t. Snapshots are normalized to unit \(L^\infty ({\Omega })\) norm.

The high-fidelity model is used in combination with the time-gradient error bound \(\Delta ^{pr}_\text {R}(\mu )\) introduced in Aretz (2021) to implement a Weak-POD-Greedy algorithm for the selection of the reduced basis functions. To this end, we consider the training set \(\Xi _{\text {TRAIN}}^\mu \) with parameters \(\mu ^{{}_{(s)}}= 1/(9.5 + 0.5 s)\) for all \(s\in \mathbb {N}\cap [1,S]\) of size \(S=81\). We prescribe a target accuracy of \(10^{-2}\) for the maximum time-gradient relative error bound and we obtain an RB space of size 42. We can construct surrogate models of different accuracy by selecting \(N_\varepsilon \in \mathbb {N}\) basis functions \(\psi _i: {\Omega }\rightarrow \mathbb {R}\), for \(i=1,\ldots ,N_{\varepsilon }\), out of these 42. Each choice corresponds to a relative error for the model given by

$$\begin{aligned} \varepsilon _c :=\sup _{\mu \in \mathcal {P}} \frac{\Vert c_h(\mu ) - c_\varepsilon (\mu ) \Vert _{L^2(\mathcal {I},H^{1}({\Omega }))} }{ \Vert c_h(\mu ) \Vert _{L^2(\mathcal {I},H^{1}({\Omega }))} }. \end{aligned}$$
(26)

Once the reduced basis has been computed, we construct a reduced model via a Petrov–Galerkin projection of (25) in the same way as we did for the full order model. For this purpose, we define the trial space \(\mathcal {V}_\varepsilon :=\text {span} \{ \upsilon _n {\otimes } \psi _i \}_{ {}^{i,n=1} }^{ {}_{N_\varepsilon , N_t} }\) and the test space \(\mathcal {W}_\varepsilon :=\text {span} \{ \omega _n {\otimes } \psi _i \}_{ {}^{i,n=1} }^{ {}_{N_\varepsilon , N_t} }\).

We then look for a reduced solution of the form

$$\begin{aligned} { c_{\varepsilon }(\mu )= \sum _{i=1}^{N_{\varepsilon }}\sum _{n=1}^{N_t} c_{n,i}(\mu ) \, \upsilon _n \, \psi _i,} \end{aligned}$$
(27)

where the expansion coefficients \(c_{0,i}\), for \(i=1,\ldots ,N_\varepsilon \), result from the projection of the initial condition onto \(\mathcal {V}_\varepsilon \), while the remaining coefficients \(c_{n,i}\), with \(i=1,\ldots ,N_{\varepsilon }\) and \(n=1,\ldots ,N_t\), satisfy the equation

$$\begin{aligned} \sum _{j=1}^{ N_\varepsilon } \left( \textbf{M}_{ij} + \frac{\Delta t}{2} ( \textbf{A}_{ij} + \mu \textbf{K}_{ij}) \right) {c_{n,j}} = \sum _{j=1}^{ N_\varepsilon } \left( \textbf{M}_{ij} - \frac{\Delta t}{2} ( \textbf{A}_{ij} + \mu \textbf{K}_{ij}) \right) {c_{n-1,j}}. \end{aligned}$$
(28)

Here the matrices \(\textbf{M}, \textbf{K}, \textbf{A} \in \mathbb {R}^{\scriptscriptstyle N_\varepsilon \times N_\varepsilon }\) denote the mass, stiffness, and advection matrix, respectively, and are given by

$$\begin{aligned} \textbf{M}_{ij} :=\int _\Omega \psi _j \psi _i \, d\Omega ,\,\,\, \textbf{K}_{ij} :=\int _\Omega \nabla \psi _j \cdot \nabla \psi _i \, d\Omega ,\,\,\, \textbf{A}_{ij} :=\int _\Omega (\varvec{\beta } \cdot \nabla \psi _j) \psi _i \, d\Omega . \end{aligned}$$
(29)

The solution of the system of equations (28), equivalent to a Crank–Nicolson scheme, can be obtained by iteratively solving \(N_t\) linear systems of size \(N_\varepsilon \), for an online complexity of \(\mathcal {O}(N_\varepsilon ^3 + N_t N_\varepsilon ^2)\). This complexity can be achieved, since the system matrix is time-independent, by performing the LU factorization of the left-hand side once before entering the time integration loop. Employing all \(N_\varepsilon = 42\) basis functions, the computational time for a reduced basis solution (online cost) is on average 5.4 ms, significantly less than the approximately 0.56 s required for a full order solution. The achieved speed-up exceeds a factor of 100, which justifies the 47 s necessary for the construction of the RB model (offline cost), considering that the online phase requires computing up to 150 reduced basis solutions per iteration. Let us remark that such a cheap training phase is due to the low dimensionality of the parameter space \(\mathcal {P}\) and the availability of a tight error bound for this class of linear problems.
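To illustrate the online cost, the following minimal sketch integrates the reduced system (28) in time; the reduced matrices and the initial coefficient vector are assumed to be precomputed in the offline phase, and SciPy's dense LU factorization is used as a stand-in for the factorization discussed above. All names are illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def solve_reduced(M_r, K_r, A_r, c0, mu, dt, n_steps):
    """Crank-Nicolson integration of the reduced system (28).
    M_r, K_r, A_r: (N_eps, N_eps) reduced mass/stiffness/advection matrices;
    c0: (N_eps,) reduced coefficients of the initial condition."""
    lhs = M_r + 0.5 * dt * (A_r + mu * K_r)
    rhs = M_r - 0.5 * dt * (A_r + mu * K_r)
    lu, piv = lu_factor(lhs)                 # factor once, O(N_eps^3)
    C = np.empty((n_steps + 1, c0.size))
    C[0] = c0
    for n in range(n_steps):                 # each step costs O(N_eps^2)
        C[n + 1] = lu_solve((lu, piv), rhs @ C[n])
    return C                                 # row n holds the coefficients c_{n,i}
```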

Note that both the online computational cost and the accuracy of the solution depend on \(\Delta t\) and on \(N_\varepsilon \). The former is kept fixed, \(\Delta t = 0.01\), while the latter varies in some of the experiments. In order to keep track of the error associated with different choices of \(N_\varepsilon \), we characterize the error between the surrogate model solution \(c_\varepsilon (\mu )\) and the full model solution \(c_h (\mu )\) as a function of \(N_\varepsilon \). This analysis is provided in Fig. 3, depicting the maximum relative errors in \(L^2(\mathcal {I}, H^1(\Omega ))\), \(L^\infty (\mathcal {I}, L^\infty (\Omega ))\) and the time-gradient norm versus the reduced basis size. It shows a nearly exponential error decay as \(N_\varepsilon \) increases. The maxima are computed on an independent test set, \(\Xi _\text {TEST}^\mu :=\{ 1/(9.75 + 0.5 s)\), for all \( s \in \mathbb {N} \cap [1,80] \}\). Furthermore, the reduced solutions do not appear to deviate significantly from the projection of their full order counterparts onto the associated RB space, and the error bound employed demonstrates a good effectivity.

Fig. 3 Left: maximum relative time-gradient error and error bound of the advection–diffusion solution versus \(N_\varepsilon \). Center: maximum \(L^2(\mathcal {I}, H^1(\Omega ))\) relative error of the projection and of the solution versus \(N_\varepsilon \). Right: maximum \(L^\infty (\mathcal {I}, L^\infty (\Omega ))\) relative error of the projection and of the solution versus \(N_\varepsilon \). Projections are based on the \(L^2(\Omega )\) inner product of the gradients.

For the implementation of the EnKM as presented in Sect. 2.1, it is necessary to provide a mathematical model for the measurement process. We take 40 measurements in time at the three sensor locations, \(\eta _i\), \(i \in \{1,2,3\}\), shown in Fig. 1. For this purpose, we introduce the measurement operator \(\mathcal {L}: L^2(\mathcal {I}, L^2(\Omega )) \rightarrow \mathbb {R}^{120}\), which can be seen as a vector of linear functionals \(\ell _k: L^2(\mathcal {I}, L^2(\Omega )) \rightarrow \mathbb {R}\) for all \(k \in \mathbb {N}\cap [1, 120]\). Each of those linear functionals has a unique Riesz representer \(\rho _k: \mathcal {I} \times \Omega \rightarrow \mathbb {R}\), with respect to the \(L^2(\mathcal {I}, L^2(\Omega ))\) norm, that can be written as

$$\begin{aligned} \rho _k = \nu _j \cdot \eta _i\quad \text{ with }\quad k=3(j-1)+i\quad \text{ for } \text{ all }\, j\in \mathbb {N}\cap [1,40], \, i \in \mathbb {N}\cap [1,3], \end{aligned}$$

where the spatial fields \(\eta _i: {\Omega }\rightarrow \mathbb {R}\) are Wendland functions \(\psi _{2,1}\) of radius 0.1 and center coordinates \((x_i,y_i) \in \{(0.1,0.7), (-0.1,-0.5), (0.5,0.1)\}\) (see Fig. 1), while, for each \(j\in \mathbb {N} \cap [1, 40]\), \(\nu _j: \mathcal {I} \rightarrow \mathbb {R}\) is a piecewise linear function supported over the interval \(\mathcal {I}_j :=[t_j-2\Delta t, t_j+2\Delta t]\), where \(t_j :=\Delta t (33+5j)\); \(\nu _j\) is assumed to be symmetric with respect to \(t_j\) and constant between \(t_j-\Delta t\) and \(t_j+\Delta t\).

Given this description of the observation process and the surrogate model, we next test the data assimilation scheme. We start with the estimation of the unknown parameter \(\mu ^\star = 0.04\) given the experimental measurements \(\textbf{y}(\mu ^\star , \varvec{\eta }) \in \mathbb {R}^{120}\), with noise \(\varvec{\eta } \sim \mathcal {N}(\textbf{0},\,\Sigma )\). We compare the performance of the EnKM employing a full order model and a surrogate model of accuracy \(\varepsilon _c = 10^{-3}\) with \(N_{\varepsilon }=42\). In order to obtain reliable statistics, we consider 25 ensembles \(\mathcal {E}_0\) of size \(J=150\) with particles sampled from the uniform prior distribution, \(\Pi _0(\mu ) = U(0.02, 0.10)\). The results obtained for a fixed value of \(\sigma ^2 = 10^{-6}\), at different iterations of the algorithm, are shown in Table 1. We observe a quick stabilization of the error means \(H_h\), \(H_\varepsilon \) and \(H_\varepsilon ^*\), and of the error standard deviations, \(S_h\), \(S_\varepsilon \) and \(S_\varepsilon ^*\), after just a few steps. The full order algorithm performs significantly better than the biased reduced basis algorithm, while the adjusted version of the algorithm exhibits an excellent performance, very close to the full order one.

The comparison of the ensemble standard deviation, reported in Table 2, with the average error, reported in Table 1, shows a positive correlation between the two quantities in the reference and the adjusted case. In contrast, the two quantities clearly decorrelate in the biased case as the iteration index increases. From this observation, we infer that, at least in this case, the ensemble covariance can be used as an error indicator when the reference or the adjusted algorithm is employed.

Table 1 Comparison of reference FE \((\,\cdot _h)\)—biased RB \((\,\cdot _\varepsilon )\)—adjusted RB \((\,\cdot _\varepsilon ^*)\) EnKM in low-noise conditions \(\sigma ^2=10^{-6}\). The test was performed by averaging 25 estimations obtained employing ensembles of 150 particles and using reduced basis models of size \(N_\varepsilon = 42\) (\(\varepsilon _c \approx 0.001\)). H refers to the mean of the estimation error, while S denotes the standard deviation of the estimation error. t.c. and o.c. indicate the total and online cost of one parameter estimation, respectively
Table 2 Same experimental conditions as in Table 1. E refers to the ensemble mean, \(\Sigma \) to the ensemble standard deviation. Both quantities are computed as the average over 25 ensembles

We next investigate the sensitivity of the algorithm with respect to the accuracy of the reduced model, the ensemble size, and the noise magnitude. First, we repeat the estimation of the reference parameter \(\mu ^\star = 0.04\) for different values of the ensemble size \(J = 4 k\), with \(k \in \mathbb {N}\cap [1, 10]\). In this experiment, we employ the same surrogate model used before and consider the relative noise magnitude \(\sigma / \Vert \mathcal {G}({\varvec{\mu }^\star })\Vert _\infty = 10^{-3}\). The results, shown in Fig. 4, indicate a larger sensitivity to J for the full order algorithm than for the other two: it requires a larger number of particles before settling on a large-ensemble asymptotic (mean-field) behavior, while the reduced basis algorithms converge much faster, possibly as a consequence of a lower-dimensional state space. Among the three iterations considered, the first appears to be the most affected, while, as the algorithm converges, the ensemble size seems to become less relevant.

Fig. 4

Relative error in the parameter estimation versus ensemble size J for fixed noise magnitude, \(\sigma = 10^{-3} \Vert \mathcal {G} ({\varvec{\mu }^\star })\Vert _\infty \). The standard full order EnKM is shown on the left, the biased RB-EnKM in the center, and the adjusted RB-EnKM on the right. The solid lines represent the average error over 64 ensembles, while the dashed lines correspond to the 10th and 90th percentiles

In a second experiment, we consider the same parameter estimation, but we let the relative noise \(\sigma /\Vert \mathcal {G}({\varvec{\mu }^\star })\Vert _\infty \) take values \(10^{-i}\) for \(i\in \mathbb {N}\cap [2,6]\). Moreover, we employ \(J=40\) particles per ensemble and the same reduced basis model as before. Each estimation is replicated 64 times for different noise realizations. The results are shown in Fig. 5: for the full order EnKM we observe a linear dependence of the reconstruction error on the experimental noise, while the results for the biased RB-EnKM show that an untreated model bias introduces a systematic error independent of the noise magnitude. The most important result concerns the adjusted RB-EnKM: its error behavior is comparable to the one obtained using a full order model. This demonstrates the effectiveness of the proposed method in compensating for the bias introduced by the reduced basis model, at least in this case study.

Fig. 5

Relative error in the parameter estimation versus relative noise magnitude \(\sigma /\Vert \mathcal {G} ({\varvec{\mu }^\star })\Vert _\infty \) for fixed ensemble size \(J = 40\). The standard full order EnKM is shown on the left, the biased RB-EnKM in the center, and the adjusted RB-EnKM on the right. The solid lines represent the average error over 64 ensembles, while the dashed lines correspond to the 10th and 90th percentiles

This conclusion is further confirmed by the last experiment, in which the performance of the biased and adjusted RB-EnKM is tested for all the parameters in the test set \(\Xi _{\text {TEST}}^{\mu }\) already employed to test the reduced basis model. Each parameter in the set is estimated using surrogate models of increasing size. Each estimation is performed 64 times in very low-noise conditions, that is \(\sigma /\Vert \mathcal {G}({\varvec{\mu }^\star })\Vert _\infty = 10^{-5}\), employing \(J=40\) particles per ensemble. For each surrogate model employed, the results from the 64 ensembles are averaged and the maximum over the test set is computed. The results, shown in Fig. 6, demonstrate that the correction compensates very well for the model bias. As a consequence, the worst-case reconstruction error for the adjusted RB-EnKM barely depends on the reduced model size and remains significantly lower than that of its biased counterpart. These results confirm the good performance of the adjusted RB-EnKM and its superiority over the biased RB-EnKM.

Fig. 6

Parameter error versus reduced basis size for the biased and the adjusted RB-EnKM

4.2 Tracer transport problem

We now consider the tracer transport problem from Conrad et al. (2018), describing the non-homogeneous and non-isotropic transport of a non-reactive tracer in an unconfined aquifer. We introduce the spatial domain \({\Omega }:=(0, 1)^2\) divided into six sub-regions \({\Omega }= \bigcup _{r=1}^6 {\Omega }_r\) illustrated in Fig. 7 and defined as follows: \((x,y) \in {\Omega }\) is in \({\Omega }_r\) if the subscript r is the smallest integer for which \(x_0^r< x < x_1^r\) and \(y_0^r< y < y_1^r\) where the points \(\{(x_0^r,y_0^r)\}_{r=1}^6\) and \(\{(x_1^r,y_1^r)\}_{r=1}^6\) are defined in Table 3. We denote by \(\partial {\Omega }\) the outer boundary of the domain and define the parallel walls \(\Gamma _D :=(0,1)\times \{0,1\}\) and \(\Gamma _N :=\partial {\Omega }{\setminus } \Gamma _D\). Based on this partition, we define the conductivity field as the piecewise constant function \(k(\varvec{\mu }): {\Omega }\rightarrow \mathbb {R}\) over the six sub-regions \({\Omega }_r\). The conductivity can be affinely decomposed employing the coefficient vector \(\varvec{\mu }\in \mathbb {R}^6\), with components \(\mu _r\), and the indicator functions \(\eta _r: {\Omega }\rightarrow \mathbb {R}\)

$$\begin{aligned} \begin{aligned} k(\textbf{x};\varvec{\mu }) = \sum _{r=1}^6 e^{\mu _r} \eta _r(\textbf{x}) \qquad \text {with} \,\, \eta _r(\textbf{x}) = {\left\{ \begin{array}{ll} 1 \,\, \text{ if } \quad \textbf{x}\in \Omega _r, \\ 0 \,\, \text{ if } \quad \textbf{x}\in \Omega {\setminus }\Omega _r. \end{array}\right. } \end{aligned} \end{aligned}$$
(30)
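As an illustration, the region lookup and the evaluation of (30) can be sketched as follows; the corner coordinates are passed in as arguments and stand in for the values reported in Table 3.

```python
import numpy as np

def region(x, y, corners):
    """Return the smallest r (1-based) whose open box contains (x, y).
    `corners` is a list of six tuples (x0, x1, y0, y1) as in Table 3."""
    for r, (x0, x1, y0, y1) in enumerate(corners, start=1):
        if x0 < x < x1 and y0 < y < y1:
            return r
    raise ValueError("point outside all sub-regions")

def conductivity(x, y, mu, corners):
    """Piecewise constant conductivity k(x; mu) = sum_r exp(mu_r) eta_r(x),
    cf. (30), with eta_r the indicator function of Omega_r."""
    return float(np.exp(mu[region(x, y, corners) - 1]))
```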

We can now estimate the hydraulic log-conductivity \(\varvec{\mu }\), restricted to the orthotope \(\mathcal {D}\subset \mathbb {R}^6\) whose bounds are reported in Table 3, relying on measurements of the tracer concentration \(c(\varvec{\mu })\) collected over the time interval \(\mathcal {I}:=\left( 0,T\right] \), with \(T=0.5\). This field satisfies the pPDE: find \(c(\varvec{\mu }): {\Omega }\times \mathcal {I} \rightarrow \mathbb {R}\) such that

$$\begin{aligned} \left\{ \begin{aligned}&\partial _t c - \nabla \cdot ( ( d_m \textbf{I} + d_l \varvec{\beta } \varvec{\beta }^\top (\varvec{\mu }) ) \nabla c ) + \varvec{\beta }(\varvec{\mu }) \cdot \nabla c = f_c, \qquad{} & {} \text{ in } \, {\Omega }\times \, \mathcal {I},\\&\nabla c(\textbf{x},t;\varvec{\mu })\cdot \textbf{n} = 0,{} & {} \text{ on } \, \partial {\Omega }\times \, \mathcal {I} ,\\&c(\textbf{x},0;\varvec{\mu }) = 0,{} & {} \text{ in } \, {\Omega }. \end{aligned} \right. \end{aligned}$$
(31)

In this equation, the dispersion coefficients \(d_l=d_m=2.5\cdot 10^{-3}\) correspond to the flow-dependent component of the dispersion tensor and to its residual component, respectively. The forcing term \(f_c\) is assumed to be of the form \(f_c :=\sum _{i=1}^{4} f_{c,i}\) and models the injection of different amounts of tracer in four wells located at \((a_i, b_i)\in \{ 0.15, 0.85 \}^2\); each \(f_{c,i}\) is a Gaussian function centered at \((a_i,b_i)\), with covariance \(\Gamma _c = 0.005\) and multiplicative coefficient \(p_i\), where \((p_1,p_2,p_3,p_4)=(10, 5, 10, 5)\). The velocity field \(\varvec{\beta } (\varvec{\mu }): {\Omega }\rightarrow \mathbb {R}^2\) depends linearly on the hydraulic head \(u(\varvec{\mu }): {\Omega }\rightarrow \mathbb {R}\) through the relation \(\varvec{\beta }(\varvec{\mu }) = -k(\varvec{\mu }) \nabla u\). The latter field must satisfy the second constraint of the inverse problem, i.e., under the Dupuit–Forchheimer approximation (Delleur 2016) it solves the non-linear elliptic pPDE: find \(u(\varvec{\mu }): {\Omega }\rightarrow \mathbb {R}\) such that

$$\begin{aligned} \left\{ \begin{aligned}&\nabla \cdot (k(\varvec{\mu }) u \nabla u) + f_u = 0, \qquad{} & {} \text{ in } \, {\Omega },\\&\nabla u(\textbf{x};\varvec{\mu }) \cdot \textbf{n} = 0,{} & {} \text{ on } \, \Gamma _N ,\\&u(\textbf{x};\varvec{\mu })=0,{} & {} \text{ on } \, \Gamma _D . \end{aligned} \right. \end{aligned}$$
(32)

Here, the forcing term \(f_u :=\sum _{i=1}^{4} f_{u,i}\) models active pumping at the four wells; each \(f_{u,i}\) is a Gaussian function centered at \((a_i, b_i)\), with covariance \(\Gamma _u = 0.02\) and coefficient \(q_i\), where \((q_1,q_2,q_3,q_4)=(10, 50, 150, 50)\). Due to the combination of the quadratic dependence on u and the zero boundary conditions, the equation always admits pairs of opposite solutions \(u^{+}, u^{-}\). However, in our study, we are only interested in the positive solution \(u^{+}(\varvec{\mu }): {\Omega }\rightarrow {\mathbb {R}^+}\).
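A short sketch of the two forcing terms may help fix notation; since the text does not specify the normalization of the Gaussian bumps or the ordering of the wells, the unnormalized form \(p_i \exp (-r^2/2\Gamma )\) and the ordering of the centers below are assumptions made only for illustration.

```python
import numpy as np

WELLS = [(0.15, 0.15), (0.15, 0.85), (0.85, 0.15), (0.85, 0.85)]  # ordering assumed
P = [10.0, 5.0, 10.0, 5.0]      # tracer injection coefficients p_i
Q = [10.0, 50.0, 150.0, 50.0]   # pumping coefficients q_i

def gaussian_source(x, y, coeffs, var):
    """Sum of Gaussian bumps centered at the wells, with assumed form
    coeff * exp(-((x - a)^2 + (y - b)^2) / (2 * var))."""
    return sum(c * np.exp(-((x - a) ** 2 + (y - b) ** 2) / (2.0 * var))
               for (a, b), c in zip(WELLS, coeffs))

def f_c(x, y):
    """Tracer injection term, with Gamma_c = 0.005."""
    return gaussian_source(x, y, P, 0.005)

def f_u(x, y):
    """Pumping term of the hydraulic head equation, with Gamma_u = 0.02."""
    return gaussian_source(x, y, Q, 0.02)
```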

Fig. 7

Domain of the tracer transport problem and injection wells

Fig. 8

Reference solutions at log-conductivity \({{\varvec{\mu }^\star }}:= [-0.75, -0.25, -0.5, \,1, -0.25, \,3]\). On the left: hydraulic head \(u_h({\varvec{\mu }^\star })\) and corresponding velocity field \(\varvec{\beta }_h:=- k({\varvec{\mu }^\star }) \nabla u_h\) (in red). On the right: tracer concentration \(c_h({\varvec{\mu }^\star })\), at time \(t=0.4\), and measurement wells (in red)

Table 3 On the left: coordinates of the corners of the sub-regions \(\Omega _r\). On the right: true values of the parameters \(\mu _r\) and boundaries of the uniform prior \(\Pi _0\)

Full order solutions are obtained via a finite element approximation, employing piecewise linear functions, \(\zeta _i: {\Omega }\rightarrow \mathbb {R}\), for \(i = 1, \ldots , N_h\), with \(N_{h} = 44,972\) degrees of freedom (mesh size \(h \approx 0.01\)). The discretization of the elliptic equation (32) results in a discrete non-linear problem, which is solved iteratively with a Newton scheme with tolerance \(10^{-6}\). The approximate solution, \(u_{h}\), is used to compute the velocity field, \(\varvec{\beta }_h(\varvec{\mu }):=- k(\varvec{\mu }) \nabla u_{h}\), which is piecewise constant with \(N_{h} - 1\) degrees of freedom. This is needed for the solution of the parabolic equation (31), whose discretization leads to a system of ordinary differential equations integrated over the time interval \(\mathcal {I}\) using the Crank–Nicolson scheme with uniform time step \(\Delta t = 0.01\). This is equivalent to performing a Petrov–Galerkin projection of Equation (31), analogously to what has been shown for Equation (25) in Sect. 4.1.

Each full order simulation is obtained employing a FreeFEM++ solver (Hecht 2012) and takes roughly 2 min to compute. Figure 8 shows the hydraulic head \(u_h({\varvec{\mu }^\star })\) and the corresponding velocity field \(\varvec{\beta }_h({\varvec{\mu }^\star })\) (on the left) and the tracer concentration field \(c_h(0.4;{\varvec{\mu }^\star })\) (on the right), both associated with the reference log-conductivity

$$\begin{aligned} {\varvec{\mu }^\star }= [-0.75, -0.25, -0.50, 1.00, -0.25, 3.00]^\top . \end{aligned}$$
(33)

The same reference log-conductivity is used as the true parameter for the data assimilation problem. Pointwise observations are collected at five successive times \(t_m \in \{0.1, 0.2, 0.3, 0.4, 0.5\}\), in 25 spatial locations \(\textbf{x}_{ij} = (x_i, y_j)\) such that \(x_i=0.1+0.2 i\) and \(y_j=0.1+0.2 j\) for \(i,j \in \{0, \ldots ,4\}\). This operation is encoded in the measurement operator \(\mathcal {L}: H^1({\Omega }) \rightarrow \mathbb {R}^{125}\). Each noise-free measurement is polluted with i.i.d. Gaussian noise with mean zero and standard deviation \(\sigma \), resulting in a noise covariance matrix \(\Sigma = \sigma ^2 \textbf{I}\).

In order to solve the inverse problem with surrogate models of different accuracy, various approximations of (32) and (31) must be produced. This requires the introduction of spatial basis functions \(\psi _i, \varphi _j: {\Omega }\rightarrow \mathbb {R}\), \(i \in \mathbb {N}\cap [1, N_\varepsilon ]\), \(j \in \mathbb {N}\cap [1, M_\varepsilon ]\), selected by applying the method of snapshots (POD) to the two sets of full order solutions,

$$\begin{aligned}\Theta ^u_\text {TRAIN} :=\{ u_h (\varvec{\mu }^{(s)}) \}_{s=1}^{S}\quad \text {and}\quad \Theta _\text {TRAIN}^{c} :=\{ c_h(t^{(z)};\varvec{\mu }^{(s)}) \}_{z,s=1}^{Z,S},\end{aligned}$$

with snapshot parameters \(\varvec{\mu }^{(s)}\), for all \(s \in \mathbb {N}\cap [1, S]\), and sampling times \(t^{(z)} = 0.01 z\) for all \(z \in \mathbb {N}\cap [1, Z]\), where \(S=2,000\) and \(Z=50\). The numbers of basis functions considered, \(N_\varepsilon , M_\varepsilon \in \mathbb {N}\), are those required to approximate the hydraulic head and the tracer concentration with relative accuracy \(\varepsilon _u, \varepsilon _c \in {\mathbb {R}^+}\), where

$$\begin{aligned} \varepsilon _u&:=\sup _{\varvec{\mu }\in \mathcal {D}} \frac{\Vert u_h(\varvec{\mu }) - u_\varepsilon (\varvec{\mu }) \Vert _{H^1({\Omega })}}{\Vert u_h(\varvec{\mu })\Vert _{H^1({\Omega })}}, \end{aligned}$$
(34)
$$\begin{aligned} \varepsilon _c&:=\sup _{\varvec{\mu }\in \mathcal {D}} \frac{\Vert c_h(\varvec{\mu }) - c_\varepsilon (\varvec{\mu }) \Vert _{L^2(\mathcal {I}, H^1({\Omega }))}}{\Vert c_h(\varvec{\mu })\Vert _{L^2(\mathcal {I}, H^1({\Omega }))}}. \end{aligned}$$
(35)
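In practice, the basis sizes can be read off the singular value decay of the snapshot matrices. A minimal sketch follows, using a Euclidean energy criterion as a simple proxy for the sup-type tolerances (34)–(35); the actual construction may employ problem-specific inner products (e.g., \(H^1(\Omega )\)), so this is an assumption made for illustration.

```python
import numpy as np

def pod_basis(snapshots, tol):
    """Method of snapshots via a thin SVD.

    snapshots : (n_dof, n_snap) matrix whose columns are full order solutions
    tol       : truncation tolerance; a relative energy criterion is used here
                as a simple proxy for the sup-type tolerances (34)-(35)
    Returns the first N left singular vectors, with N the smallest basis size
    meeting the criterion."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    N = int(np.searchsorted(energy, 1.0 - tol**2)) + 1
    return U[:, :N]
```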

Based on the first set of basis functions, the approximation space for the Galerkin projection of (32) is defined as \(\mathcal {U}_\varepsilon :=\text {span} \{ \psi _i \}_{i=1}^{N_\varepsilon }\). From the second set of basis functions, instead, the RB test space \(\mathcal {W}_\varepsilon :=\text {span} \{ \omega _n {\otimes } \varphi _i \}_{i,n=1}^{M_\varepsilon , N_t}\) and the RB trial space \(\mathcal {V}_\varepsilon :=\text {span} \{ \upsilon _n {\otimes } \varphi _i \}_{i,n=1}^{M_\varepsilon , N_t}\) are defined for the Petrov–Galerkin projection of (31). We seek reduced solutions of the form

$$\begin{aligned} {u_\varepsilon (\varvec{\mu })}&{= \sum _{i=1}^{N_\varepsilon } u_{i}(\varvec{\mu })\psi _i} \end{aligned}$$
(36)
$$\begin{aligned} {c_\varepsilon (\varvec{\mu })}&{= \sum _{j=1}^{M_\varepsilon } \sum _{n=1}^{N_t} c_{n,j}(\varvec{\mu }) \, \upsilon _n \, \varphi _j,} \end{aligned}$$
(37)

where the expansion coefficients \(c_{n,j}\) and \(u_i\), with \(i\in \mathbb {N}\cap [1, N_\varepsilon ]\), \(n \in \mathbb {N}\cap [1,N_t]\) and \(j \in \mathbb {N}\cap [1,M_\varepsilon ]\), satisfy the systems of algebraic equations

$$\begin{aligned} \sum _{p,q=1}^{N_\varepsilon , N_\varepsilon } \textbf{N}_{ipq} (\varvec{\mu }) u_p u_q&= f_i, \end{aligned}$$
(38)
$$\begin{aligned} \sum _{k=1}^{ M_\varepsilon } \left( \textbf{M}_{jk} + \frac{\Delta t}{2} \textbf{D}_{jk} (\textbf{u},\varvec{\mu }) \right) c_{n+1,k}&= \left( \textbf{M}_{jk} - \frac{\Delta t}{2} \textbf{D}_{jk} (\textbf{u},\varvec{\mu }) \right) c_{n,k} + g_j, \end{aligned}$$
(39)

given the initial conditions \(c_{0,j}=0\) for all \(j \in \mathbb {N}\cap [1, M_\varepsilon ]\). The scalar forcing terms \(f_i\), \(g_j\) are obtained by integrating their full order counterparts against the basis functions \(\psi _i\) and \(\varphi _j\), for all \(i \in \mathbb {N}\cap [1, N_\varepsilon ]\), \(j \in \mathbb {N}\cap [1, M_\varepsilon ]\)

$$\begin{aligned} f_i :=\int _\Omega f_u \psi _i d \Omega , \qquad g_j :=\Delta t \int _\Omega f_c \varphi _j d \Omega . \end{aligned}$$
(40)

The mass and stiffness matrices \(\textbf{M}, \textbf{K} \in \mathbb {R}^{\scriptscriptstyle M_\varepsilon \times M_\varepsilon }\) are defined as in (29), while the parameter dependent tensors \(\textbf{D}(\textbf{u},\varvec{\mu }) \in \mathbb {R}^{\scriptscriptstyle M_\varepsilon \times M_\varepsilon }\) and \(\textbf{N}(\varvec{\mu }) \in \mathbb {R}^{\scriptscriptstyle N_\varepsilon \times N_\varepsilon \times N_\varepsilon }\) depend affinely on the multidimensional arrays \(\textbf{A}\in \mathbb {R}^{\scriptscriptstyle 6 \times N_\varepsilon ^3}\), \(\textbf{B}\in \mathbb {R}^{\scriptscriptstyle 6 \times N_\varepsilon ^2 \times M_\varepsilon ^2}\), and \(\textbf{C}\in \mathbb {R}^{\scriptscriptstyle 6 \times N_\varepsilon \times M_\varepsilon ^2}\) defined as

$$\begin{aligned} \textbf{A}_{ipqr}&:=\int _\Omega \frac{\eta _r}{2} \left( \psi _p (\nabla \psi _q \cdot \nabla \psi _i) + \psi _q (\nabla \psi _p \cdot \nabla \psi _i) \right) d\Omega , \end{aligned}$$
(41)
$$\begin{aligned} \textbf{B}_{jkpqr}&:=\int _\Omega \eta _r (\nabla \varphi _j \cdot \nabla \psi _p)(\nabla \varphi _k \cdot \nabla \psi _q) d\Omega , \end{aligned}$$
(42)
$$\begin{aligned} \textbf{C}_{jksr}&:=\int _\Omega \eta _r (\nabla \varphi _j \cdot \nabla \psi _s) \varphi _k d\Omega . \end{aligned}$$
(43)

For a fixed value of the log-conductivity, \(\varvec{\mu }\), the tensors \(\textbf{N} (\varvec{\mu }) \) and \(\textbf{D} (\textbf{u}, \varvec{\mu }) \) can be assembled. The latter, however, requires the evaluation of the discrete hydraulic head \(\textbf{u}\). They are respectively defined as

$$\begin{aligned} \textbf{N}_{ipq} (\varvec{\mu })&:=\sum _{r=1}^{6} e^{\mu _r} \textbf{A}_{ipqr} \,, \end{aligned}$$
(44)
$$\begin{aligned} \textbf{D}_{jk} (\textbf{u}, \varvec{\mu })&:=d_m \textbf{K}_{jk} + d_l \sum _{p, q, r=1}^{N_\varepsilon , N_\varepsilon , 6} e^{2\mu _r} \textbf{B}_{jkpqr} u_p u_q + \sum _{s,r=1}^{N_\varepsilon , 6} e^{\mu _r} \textbf{C}_{jksr} u_s \,. \end{aligned}$$
(45)
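To illustrate how the online phase uses these quantities, the sketch below assembles \(\textbf{N}(\varvec{\mu })\) and \(\textbf{D}(\textbf{u},\varvec{\mu })\) from precomputed tensors, solves (38) with a Newton iteration, and marches (39) with Crank–Nicolson. The array layouts (indices ordered as in the subscripts of (41)–(43)), the initial guess, and the stopping criterion are illustrative assumptions.

```python
import numpy as np

def assemble_N(A, mu):
    """Reduced tensor N(mu)_{ipq} = sum_r exp(mu_r) A_{ipqr}, cf. (44)."""
    return np.einsum('ipqr,r->ipq', A, np.exp(mu))

def solve_head(A, f, mu, tol=1e-6, max_it=50):
    """Newton iteration for the reduced nonlinear system (38),
    sum_{p,q} N_{ipq}(mu) u_p u_q = f_i.  Since A is symmetrized in (p, q),
    cf. (41), the Jacobian is J_{ip} = 2 * sum_q N_{ipq} u_q."""
    N = assemble_N(A, mu)
    u = np.ones_like(f)                       # illustrative initial guess
    for _ in range(max_it):
        res = np.einsum('ipq,p,q->i', N, u, u) - f
        if np.linalg.norm(res) < tol:
            break
        J = 2.0 * np.einsum('ipq,q->ip', N, u)
        u = u - np.linalg.solve(J, res)
    return u

def assemble_D(K, B, C, u, mu, d_m, d_l):
    """Reduced operator D(u, mu), cf. (45)."""
    e, e2 = np.exp(mu), np.exp(2.0 * mu)
    return (d_m * K
            + d_l * np.einsum('jkpqr,p,q,r->jk', B, u, u, e2)
            + np.einsum('jksr,s,r->jk', C, u, e))

def solve_tracer(M, D, g, n_steps, dt):
    """Crank-Nicolson march for (39), starting from c_0 = 0; the reduced
    forcing g already contains the factor Delta t, cf. (40)."""
    lhs = M + 0.5 * dt * D
    rhs = M - 0.5 * dt * D
    c = np.zeros(M.shape[0])
    history = [c]
    for _ in range(n_steps):
        c = np.linalg.solve(lhs, rhs @ c + g)
        history.append(c)
    return np.array(history)
```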

We emphasize that the accuracy of the solutions of (38) and (39), with the latter equivalent to a Crank–Nicolson discretization, depends on the number of basis functions and on the time step \(\Delta t\). In Fig. 9, we show, on the left and on the right, the maximum relative errors of the surrogate model (\({\varepsilon }_c\) and \({\varepsilon }_u\), respectively) as functions of \(M_\varepsilon \) and \(N_\varepsilon \). In the center, we show the \(L^\infty ( \mathcal {I}; L^\infty ( {\Omega }))\) relative error of the tracer concentration, which bounds from above the error on the synthetic measurements. We compute these maximum relative errors on a set of parameters \( \Xi _\text {TEST}^{\varvec{\mu }} :=\{ \varvec{\mu }^{(s)} \sim \Pi _0(\varvec{\mu }) \}_{s=1}^{500}\) independent of the ones used for the model training. It can be observed that, for small values of \(N_{\varepsilon }\), the error in the concentration stagnates beyond a certain value of \(M_{\varepsilon }\), suggesting that, in this regime, the error is dominated by the approximation of the hydraulic head. For \(N_\varepsilon =40\), however, this effect is no longer present, at least for the values of \(M_\varepsilon \) considered, and the tracer error depends only on \(M_{\varepsilon }\). This allows us to control the accuracy of the surrogate model by varying \(M_\varepsilon \) alone.

Fig. 9

Left and center: maximum relative error of the solution and of the projection of the tracer concentration versus \(M_\varepsilon \) for different values of \(N_\varepsilon \); projection—in space—performed with respect to the \(H^1(\Omega )\) inner product. Right: maximum relative error of the projection and of the solution of the hydraulic head versus \(N_\varepsilon \). Error norm shown above each plot

The construction of the reduced model has an offline cost of about 75 h. This includes the time required for the construction of a training set of 2,000 full order solutions \((61\text {h}\, 16'\, 40'')\), the time for the computation of the POD basis functions \((23' \, 40'')\), and the time for assembling the RB model tensors \((13\text {h}\, 32'\, 47'')\). This cost corresponds roughly to the computational cost of 2,500 finite element solutions, each of which takes approximately 110 s. The surrogate model obtained employing \(N_\varepsilon = 40\), \(M_\varepsilon = 320\) basis functions produces a solution in only 1.25 s (online cost), about 1/90 of the cost of its full order equivalent. The same training set used for the POD is employed to estimate, at negligible cost, the empirical moments of \(\varvec{\delta }^\star _\varepsilon \), i.e., \(\overline{\varvec{\delta }}_\varepsilon \) and \(\varvec{\Gamma }_\varepsilon \).
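A possible implementation of this last step, assuming the measurement operator has already been applied to both the full order and the reduced training solutions (the variable names are illustrative), is the following:

```python
import numpy as np

def bias_moments(meas_full, meas_rb):
    """Empirical mean and covariance of the RB measurement discrepancy
    delta = L(c_h) - L(c_eps) over the training set.

    meas_full, meas_rb : (n_train, n_meas) arrays of measured full order and
    reduced order outputs for the same training parameters."""
    delta = meas_full - meas_rb
    delta_bar = delta.mean(axis=0)            # empirical mean of delta
    Gamma = np.cov(delta, rowvar=False)       # empirical covariance of delta
    return delta_bar, Gamma
```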

Note that we admittedly used a “brute force” POD approach to generate the basis for this nonlinear problem. A more offline-efficient method, for example using a POD-Greedy algorithm, could have been used. However, this would have been more complex in terms of both theory and implementation of the reduced model, and is beyond the scope of this paper. Our focus here is on the EnKM and its modification in settings where surrogate models are used.

We now turn our attention to the inverse problem, as discussed in Sect. 2.1. We start by considering the estimation of the reference parameter \({\varvec{\mu }^\star }\) given the measurements \(\textbf{y}({\varvec{\mu }^\star }, \varvec{\eta }) \in \mathbb {R}^{125}\), polluted by experimental noise of magnitude \(\sigma \). To obtain reliable statistics, we consider 32 independent initial ensembles \(\mathcal {E}_{0}\) of variable size, sampled from the same distribution \(\Pi _0\).

As a first experiment, we compare the performance of the two RB-EnKM variants employing \(J=160\) particles and a surrogate model with error tolerance \(\varepsilon _c \approx 0.02\) (obtained with \(N_\varepsilon = 40\) and \(M_\varepsilon = 320\)). The first test relies on the biased version of the RB-EnKM, as presented in Sect. 2.1, while the second test corresponds to the adjusted algorithm. For both simulations, we consider low-amplitude experimental noise, i.e., negligible compared to the model error, \(\sup _{\varvec{\mu }} \Vert \mathcal {L} (c_h(\varvec{\mu }) - c_\varepsilon (\varvec{\mu })) \Vert _\infty \approx 10^{-2} > 10^{-3} = \sigma \), and we pollute the measurements independently for each ensemble. In Table 4, we report the average properties of the ensembles after 4 iterations: columns \(E_\varepsilon \) and \(E_\varepsilon ^*\) contain the mean parameter estimation, i.e., the particle mean, averaged over the 32 ensembles, while columns \(\Sigma _\varepsilon \) and \(\Sigma _\varepsilon ^*\) contain the average standard deviation of the ensembles. We observe that the correction term significantly lowers the reconstruction error, from \(\Vert E_\varepsilon - {\varvec{\mu }^\star }\Vert _\infty = 6.437\)e-3 to \(\Vert E_\varepsilon ^* - {\varvec{\mu }^\star }\Vert _\infty = 7.870\)e-4. We also notice that the variability of the estimate increases, consistent with the presence of an additional term in the Kalman gain. At least in this test, and contrary to the previous numerical experiment, the standard deviation of the ensemble shows a good correlation with the reconstruction error for both the biased and the adjusted algorithm.

We note, for this example, that the total cost of estimating a single parameter with the biased or the adjusted RB-EnKM is higher than with the FO-EnKM (approx. \(75\text {h}\, 28'\) for the former and \(19\text {h}\, 37'\) for the latter). This is due to the six-dimensional parameter space and the fact that the parameters are not highly correlated. We thus require a fairly large training set of size \({2,\!000}\) to obtain a sufficiently accurate reduced order model over the whole parameter space. In combination with the “brute force” POD approach mentioned above, the offline cost is thus considerable. However, if one is interested in multiple parameter estimations, e.g., due to new data becoming available, the RB-EnKM algorithm significantly outperforms the FO-EnKM in terms of computational runtime, since the online phase requires only \(13'\, 32''\). For example, repeating the parameter estimation 32 times in order to obtain a better statistical characterization of the method takes about \(82\text {h}\) using the reduced basis method, but it would require more than \(627\text {h}\) using the FO-EnKM.

Table 4 Comparison of biased RB \((\,\cdot _\varepsilon )\)—adjusted RB \((\,\cdot _\varepsilon ^*)\) EnKM in low-noise conditions \(\sigma = 10^{-6}\). The test was performed by averaging 32 estimations obtained employing ensembles of 160 particles and 4 iterations, and using reduced basis models of size \(N_\varepsilon =40\), \(M_\varepsilon =320\) (\(\varepsilon _c \approx 0.02\)). E refers to the average parameter estimation, \(\Sigma \) denotes the average ensemble standard deviation, and H the average estimation error. t.c. and o.c. indicate the total and online cost of one parameter estimation, respectively

As an extension of the previous experiment, we estimate the reference parameter \({\varvec{\mu }^\star }\) employing the same surrogate model, noise magnitude and number of ensembles as before, but using ensembles of variable size \(J=20k\), with \(k \in \mathbb {N}\cap [2,16]\). This allows us to study the effect of the ensemble size on the parameter estimation obtained with the biased and adjusted RB-EnKM algorithms. The results shown in Fig. 10 indicate that, for both algorithms, very small ensembles lead to large relative errors and entail a large variability among the different samples. This behavior is relevant only for ensembles with fewer than 40 particles when the biased RB-EnKM is employed, and fewer than 80 particles when the adjusted version is used. Larger ensembles do not exhibit relevant fluctuations; we can therefore assume an ensemble of size \(J=160\) to be large enough for the results of the upcoming tests to be independent of this quantity.

Fig. 10

Biased and adjusted RB-EnKM parameter estimation relative error versus ensemble size, for fixed noise magnitude \(\sigma = 10^{-3}\). The solid lines represent the average error over 32 ensembles at different algorithm iterations. The dashed lines represent the 10th and the 90th percentiles

A key quantity determining the performance of the method is the noise magnitude. Its effect on the two reduced basis algorithms is investigated by studying how the relative estimation error of the reference parameter \({\varvec{\mu }^\star }\) changes with the noise magnitude. To this end, we consider seven noise values, \(\sigma ^2 = 10^{-m}\) with \(m \in \mathbb {N}\cap [1, 7]\). We employ the same RB-EnKM used before, with a fixed ensemble size \(J=160\), and we average the results over 32 independent ensembles. The results, shown in Fig. 11, reiterate the inadequacy of the biased method in dealing with the systematic bias introduced in the measurements by the surrogate model: the corresponding plot shows error stagnation at low noise. On the contrary, the plot corresponding to the adjusted method shows a mitigation of this effect, with an estimation error that keeps decreasing in low-noise conditions, although at a lower rate than in high-noise conditions.

Fig. 11

Biased and adjusted RB-EnKM parameter estimation relative error versus absolute noise magnitude, for fixed ensemble size \(J=160\). The solid lines represent the average error over 32 ensembles at different algorithm iterations. The dashed lines represent the 10th and the 90th percentiles

In our last experiment, we test the performance of the biased and adjusted RB-EnKM by employing surrogate models of increasing accuracy. We fix the size of the reduced space \(\mathcal {U}_\varepsilon \) to a sufficiently large value, \(N_\varepsilon = 40\), and we vary the size of the approximation space associated with the concentration: \(M_\varepsilon =10k\), with \(k \in \mathbb {N}\cap [2, 32]\). Employing the resulting approximate models, we estimate the reference parameter \({\varvec{\mu }^\star }\) in low-noise conditions, \(\sigma = 10^{-3}\), averaging the results obtained over 16 ensembles of 160 particles each. In Fig. 12, we show the final relative error (after three algorithm iterations) as \(M_\varepsilon \) and \(\varepsilon _c\) change, both for the biased and the adjusted RB-EnKM. For both, we observe that the relative estimation error decreases, almost linearly, with the error of the surrogate model. Moreover, we observe that, with few exceptions, the error of the adjusted algorithm is smaller than the error of the biased algorithm. The few points where the two errors are very close can be explained by a strongly unbalanced distribution of the measurement bias in a region away from the reference parameter. Future developments that adapt the bias correction to the current parameter estimate during the execution of the algorithm should dampen this effect.

Fig. 12

Parameter error versus reduced basis size and maximum relative error of the solution. The solid lines represent the average error over 16 ensembles at different algorithm iterations. The dashed lines represent the 10th and the 90th percentiles

5 Conclusions

We proposed an efficient, gradient-free iterative solution method for inverse problems that combines model order reduction techniques, via the reduced basis method, with the ensemble Kalman method introduced in Iglesias et al. (2013). The use of surrogate models allows a significant reduction of the online computational cost, but it distorts the cost function optimized in the inverse problem. This in turn introduces a systematic error in the approximate solution of the inverse problem. To overcome this limitation, we have proposed the adjusted RB-EnKM, which corrects for this bias by systematically adjusting the cost function and thus recovers good convergence.

Using a linear Taylor–Green vortex problem, the performance of the method was compared with that of the full order model as well as with the biased RB-EnKM, in which no adjustment is made. The numerical results show that the biased method fails to achieve the same accuracy as the full order method. In contrast, the adjusted RB-EnKM attains the same accuracy as its full order counterpart for a large range of noise magnitudes at a significantly lower computational cost, and even approaches the mean-field limit faster as the ensemble size is increased. Furthermore, the dependence of the reconstruction error on the model accuracy is essentially removed over the range of model accuracy considered.

The method was then applied to a non-linear tracer transport problem. The results for this example show that, despite a decrease in the order of convergence at low noise, the stagnation of the reconstruction error observed in the biased RB-EnKM can be removed by adjusting the algorithm. Regarding the model accuracy, a substantial improvement of the adjusted EnKM with respect to the biased EnKM was observed, although less pronounced than in the linear problem.

Overall, our numerical tests show that the proposed method allows for the use of inexpensive surrogate models while empirically ensuring that the predicted result of the inversion remains accurate with respect to the full order inversion.

Although the online computational cost is significantly lower than for the reference full order method, we note that, depending on the problem at hand and the implementation, the offline cost can be considerable. As a result, the overall cost (offline plus online parameter inversion) for solving a single inversion problem may not be competitive with a plain full order inversion, as observed in the second case study. However, if we consider the solution of multiple inverse problems, either due to new data being analyzed or in order to obtain better statistics, the method becomes competitive also for the second numerical experiment considered.