A reduced basis ensemble Kalman method

In the process of reproducing the state dynamics of parameter dependent distributed systems, data from physical measurements can be incorporated into the mathematical model to reduce the parameter uncertainty and, consequently, improve the state prediction. Such a data assimilation process must deal with the data and model misfit arising from experimental noise as well as from model inaccuracies and uncertainties. In this work, we focus on the ensemble Kalman method (EnKM), a particle-based iterative regularization method designed for a posteriori analysis of time series. The method is gradient free and, like the ensemble Kalman filter (EnKF), relies on a sample of parameters or particle ensemble to identify the state that best reproduces the physical observations, while preserving the physics of the system as described by the best knowledge model. We consider systems described by parameterized parabolic partial differential equations and employ model order reduction (MOR) techniques to generate surrogate models of different accuracy with uncertain parameters. Their use in combination with the EnKM involves the introduction of a model bias, which constitutes a new source of systematic error. To mitigate its impact, an algorithm adjustment is proposed that accounts for a prior estimation of the bias in the data. The resulting reduced basis EnKM (RB-EnKM) is tested in different conditions, including different ensemble sizes and increasing levels of experimental noise. The results are compared to those obtained with the standard EnKF and with the unadjusted algorithm.


Introduction
The problem of estimating model parameters of static and dynamical systems is encountered in many applications, from earth sciences to engineering. In this work we focus on the parameter estimation of dynamical systems described by parameterized parabolic partial differential equations (pPDEs). Here, we assume that a limited and polluted knowledge of the solution is available at multiple time instances through noisy local measurements.
For solving this kind of inverse problem, countless deterministic and stochastic methods have been proposed. Among them, a widely used technique is the so-called ensemble Kalman filter [1], a recursive filter employing a series of measurements to obtain improved estimates of the variables involved in the process. The idea of using the EnKF for reconstructing the parameters of dynamical systems traces back to [2,3], in which a trivial artificial dynamics for the parameters was assumed to make the estimation possible. This was naturally accompanied by efforts to improve the performance of the method in terms of stability, by introducing covariance inflation [4,5] and localization [4,6], and in terms of computational cost. Relevant to the latter have been the development of multi-level methods [7], the use of model order reduction techniques [8], and the introduction of further surrogate modeling techniques [9]. The use of approximated models inevitably led to the study of the impact of model error on the EnKF [10,11], alongside other data assimilation methods [12,13].
Although ensemble Kalman methods were originally meant for sequential data assimilation, i.e., for real-time applications, they have proved reliable also for asynchronous data assimilation [14]. The first paper proposing to adapt the EnKF to retrospective data analysis was [15]. In this setting, the data are employed all at once at the end of an assimilation window, a feature shared with a number of other methods, e.g., variational methods [16] such as 4D-VAR [17] and other smoothers [18]. Compared to those approaches, the EnKF is particularly appealing since it does not require the computation of Fréchet derivatives, a major complication for data assimilation algorithms.
In [19], Iglesias et al. introduced what they called the ensemble Kalman method, an EnKF-based asynchronous data assimilation algorithm. Depending on the design of the algorithm, this method has connections to Bayesian data assimilation [20] and to maximum likelihood estimation [21]. In particular, in the latter case, the method constitutes an ensemble-based implementation of so-called iterative regularization methods [22]. In the case of perfect models, the EnKM has already been analyzed in depth in [20,23], and convergence and identifiability enhancements have been proposed in [24,25]. Due to the iterative nature of the EnKM, dealing with high-dimensional parametric problems is often computationally challenging. In [26], a multi-level strategy has been proposed to improve the computational performance of the method.
In this work we propose an algorithm, called the Reduced Basis Ensemble Kalman Method (RB-EnKM), that leverages the computational efficiency of surrogate models obtained with MOR techniques to solve asynchronous data assimilation problems via ensemble Kalman methods. The use of the EnKM allows us to avoid adjoint problems, which are often difficult to reduce and intrinsically depend on the choice of measurement positions. Model order reduction, already employed in other data assimilation problems [27,28], is used as a key tool for accelerating the method. However, the use of approximate models within the EnKM introduces a model error that could hinder the convergence of the method. In this work, we propose to deal with this error by including a prior estimation of the bias in the data. Specifically, we incorporate empirical estimates of the mean and covariance of the bias in the Kalman gain. In some instances, those quantities can be computed at a negligible cost by employing the same training set used for the construction of the reduced model.
The paper is structured as follows: in Section 2 we introduce the asynchronous data assimilation problem together with the standard ensemble Kalman method (Algorithm 1). Subsequently, in Section 3.1, we present an overview of reduced basis (RB) methods and describe how to use them in combination with the ensemble Kalman method to derive the RB-EnKM (Algorithm 2). In Section 4, we test the new method on two numerical examples. In the first example, we estimate the diffusivity in a linear advection-dispersion problem in 2D (Section 4.1), while in the second, we estimate the hydraulic log-conductivity in a non-linear hydrological problem (Section 4.2). In both cases, we compare the behavior of the full order and reduced order models in different conditions. Section 5 provides conclusions and considerations on the proposed method and on its numerical performance.

Problem Formulation
Let U be a given function space and let P ⊂ R^{N_p}, with N_p ∈ N_+, be a set of model parameters. We consider the pPDE: for any parameter µ ∈ P, find u(x, t; µ) ∈ U such that

∂_t u(x, t; µ) + F_µ(u(x, t; µ)) = 0.    (1)

Here F_µ is a generic parameterized differential operator and ∂_t is the first order partial time derivative. This pPDE provides the constraint to the inverse problem of estimating the unknown parameter µ† ∈ P from data or observations given by

y(µ†, η) = L u(x, t; µ†) + η.

Here, L : U → R^{N_m}, with N_m ∈ N_+, maps the space of the solutions to the space of the measurements, simulating the observation process, and η is an unknown realization of a Gaussian random variable with zero mean and given covariance, Σ ∈ R^{N_m × N_m}. Note that both the observed data y and the additive noise η are N_m-dimensional vector-valued quantities and that Σ is a symmetric positive-definite matrix defining the inner product ⟨v, w⟩_{Σ^{-1}} := v^T Σ^{-1} w and the induced norm ‖·‖_{Σ^{-1}}. To solve this inverse problem, we must explicitly solve the pPDE (1). This is done using a suitable discretization, in space and time, of the differential operator F_µ. To this end, we introduce an approximation space V_h ⊂ U so that the approximate problem reads: find u_h(x, t; µ) ∈ V_h such that

∂_t u_h(x, t; µ) + F_µ^h(u_h(x, t; µ)) = 0.    (2)

The discretization of the pPDE can be chosen according to the specific problem of interest. In all numerical examples proposed in this work, we employ a space-time Petrov-Galerkin discretization of (1) with piecewise polynomial trial and test spaces, as described in Section 4, and we assume (2) to be sufficiently accurate such that we can take y(µ†, η) = L u_h(x, t; µ†) + η.
To characterize the observation of the solution, we introduce the forward response map G : P → R^{N_m} defined as G(µ) := L u_h(x, t; µ) for any solution of the pPDE (2). Although the use of the map G results in a more compact notation, omitting its dependence on the solution of the pPDE conceals a key aspect of the method, i.e., the mapping from the parameter vector to the corresponding space-time pPDE solution. For this reason, and because it makes it harder to introduce the problem discretization, it will be used with caution.

The Ensemble Kalman Method
The data assimilation problem presented above can be recast as a minimization problem for the cost functional

Φ(µ | y) := ‖y(µ†, η) − L u_h(x, t; µ)‖²_{Σ^{-1}},

representing the misfit between the experimental data, y(µ†, η), and the forward response. The optimal parameter estimate µ_opt(y) is thus given by

µ_opt(y) = argmin_{µ ∈ P} Φ(µ | y).

This is equivalent to a maximum likelihood estimation, given the likelihood function, l(µ | y) = exp{−½ Φ(µ | y)}, associated with the probability density function of the data, y | µ, i.e., the probability of observing y if µ is the parametric state. The shape of the function follows from the probability density function of the Gaussian noise realization.
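As a minimal illustration of the weighted misfit above (function and variable names are ours, not the paper's):

```python
import numpy as np

def misfit(y, g, Sigma):
    """Weighted data misfit Phi(mu | y) = ||y - G(mu)||^2 in the Sigma^{-1} norm.

    y     : observed data, shape (N_m,)
    g     : forward response G(mu) = L u_h(mu), shape (N_m,)
    Sigma : noise covariance, shape (N_m, N_m), symmetric positive definite
    """
    r = y - g
    # Solve Sigma z = r instead of forming Sigma^{-1} explicitly.
    return float(r @ np.linalg.solve(Sigma, r))

# Toy check: with Sigma = I the misfit is the squared Euclidean norm.
print(misfit(np.array([1.0, 2.0]), np.array([0.0, 0.0]), np.eye(2)))  # 5.0
```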
Among various methods proposed to solve this optimization problem, the EnKM relies on a sequence of parameter ensembles E_n, with n ∈ N_+, to estimate the minimum of the cost functional. Each ensemble consists of a collection {µ_n^{(j)}}_{j=1}^J of J ∈ N_+ parameter vectors µ_n^{(j)}, hereby named ensemble members or particles, whose interaction, guided by the experimental measurements, causes them to cluster around the solution of the problem as iterations proceed. At the beginning of each iteration, the solution of the pPDE and its observations are computed for each j ∈ {1, . . ., J}. Subsequently, the ensemble is updated based on the empirical correlation among parameters and between parameters and measurements, as well as on the misfits between the experimental measurements y(µ†, η) and the particle measurements L u_h(x, t; µ_n^{(j)}). A single iteration, equivalent to the one in [19], is formalized in the following pseudo algorithm:

Algorithm 1 Iterative ensemble method for inverse problems.
Let E_0 be the initial ensemble with elements {µ_0^{(j)}}_{j=1}^J sampled from a distribution Π_0(µ). For n = 0, 1, . . .

(i) Prediction step. Compute the measurements of the solution for each particle in the last updated ensemble:

G(µ_n^{(j)}) = L u_h(x, t; µ_n^{(j)}),   j = 1, . . ., J.    (4)

(ii) Intermediate step. From the last updated ensemble measurements and parameters, define the sample means and covariances:

ḡ_n = (1/J) Σ_{j=1}^J G(µ_n^{(j)}),   µ̄_n = (1/J) Σ_{j=1}^J µ_n^{(j)},

P_n = (1/J) Σ_{j=1}^J [G(µ_n^{(j)}) − ḡ_n][G(µ_n^{(j)}) − ḡ_n]^T,    (5)

Q_n = (1/J) Σ_{j=1}^J [µ_n^{(j)} − µ̄_n][G(µ_n^{(j)}) − ḡ_n]^T.    (6)

(iii) Analysis step. Update each particle in the ensemble:

µ_{n+1}^{(j)} = µ_n^{(j)} + Q_n (P_n + Σ)^{-1} [y + η^{(j)} − G(µ_n^{(j)})],   η^{(j)} ∼ N(0, Σ).    (7)

In the last step of the algorithm, the cross correlation matrices P_n and Q_n are used to compute the Kalman gain K_n := Q_n (P_n + Σ)^{-1}. This modulates the extent of the correction: a low gain corresponds to conservative behavior, i.e., small changes in the particle positions, while a high gain involves a larger correction. Note that the experimental data are perturbed with artificial noise η^{(j)} sampled from the same distribution assumed for the experimental noise η. This leads to an improved estimate over the unperturbed case. A termination criterion for the algorithm is essential for the proper implementation of the method. The one presented in [19] is based on the discrepancy principle and consists in stopping the algorithm when the error between the experimental data and the measurements becomes comparable to the experimental noise. An alternative approach is to set a threshold for the norm of the parameter update, i.e., to terminate the algorithm when ‖µ_{n+1} − µ_n‖_2 ≤ τ ‖µ_{n+1}‖_2 for some τ ≪ 1. The latter criterion is more robust to model errors and is therefore used in our numerical experiments.
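A single iteration of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration under our own conventions: a generic callable `G` stands in for solving the pPDE and observing its solution, and the toy example uses a linear forward map, so the names and setup below are assumptions rather than the paper's implementation.

```python
import numpy as np

def enkm_step(ensemble, y, Sigma, G, rng):
    """One EnKM iteration: prediction, intermediate, and analysis steps.

    ensemble : (J, N_p) array of particles mu_n^{(j)}
    y        : (N_m,) experimental data
    Sigma    : (N_m, N_m) noise covariance
    G        : forward response map, G(mu) -> (N_m,)
    """
    J = ensemble.shape[0]
    # (i) Prediction: forward response of every particle.
    Gs = np.array([G(mu) for mu in ensemble])            # (J, N_m)
    # (ii) Intermediate: sample means and (cross-)covariances.
    dG = Gs - Gs.mean(axis=0)
    dM = ensemble - ensemble.mean(axis=0)
    P = dG.T @ dG / J                                    # measurement covariance
    Q = dM.T @ dG / J                                    # parameter-measurement cross-covariance
    # Kalman gain K_n = Q_n (P_n + Sigma)^{-1}, via a linear solve.
    K = np.linalg.solve((P + Sigma).T, Q.T).T
    # (iii) Analysis: move each particle, perturbing the data with
    # artificial noise drawn from the same distribution as eta.
    noise = rng.multivariate_normal(np.zeros(len(y)), Sigma, size=J)
    return ensemble + (y + noise - Gs) @ K.T

# Toy linear example: G(mu) = A mu; the ensemble mean recovers mu*.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2))
mu_true = np.array([0.7, -0.3])
Sigma = 1e-6 * np.eye(6)
y = A @ mu_true
G = lambda mu: A @ mu
E = rng.uniform(-1, 1, size=(50, 2))
for _ in range(10):
    E = enkm_step(E, y, Sigma, G, rng)
print(np.round(E.mean(axis=0), 2))
```

In the linear, small-noise setting the particles cluster around the least-squares solution within the span of the initial ensemble, consistent with the subspace property recalled below Algorithm 1.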
Equally important for the method is the choice of the distribution Π_0 from which the initial ensemble (or first guess) E_0 is sampled. In most of the cases, including those considered in our numerical experiments, the distribution Π_0 comes from an a priori knowledge of the range of admissible parameters. In other scenarios, e.g., when the parameters live in an infinite-dimensional space, it may be necessary to define additional criteria on how to treat the parameter space. The initial ensemble plays a fundamental role in stabilizing the inverse problem. Indeed, it has been shown in [19] that all the ensembles generated by Algorithm 1 are contained in the space spanned by the initial ensemble, that is, µ_n^{(j)} ∈ span{µ_0^{(1)}, . . ., µ_0^{(J)}} for all n ∈ N_+ and j ∈ {1, . . ., J}. Furthermore, in the mean-field limit, i.e., in the case of infinitely many particles, and assuming an affine relationship between parameters and synthetic measurements, the distribution Π_0 plays the same role as the Tikhonov regularization in variational data assimilation, see [29]. In particular, the stabilization term is given by −log Π_0(µ).
The main sources of error of the EnKM are associated with the ensemble size and with the evaluation of G(µ_n^{(j)}) = L u_h(x, t; µ_n^{(j)}). Indeed, while the observation of the solution is accurate and computationally cheap to evaluate, due to the linearity of the operator, the accuracy in the computation of the pPDE solution intrinsically depends on the quality of the numerical discretization. High-order numerical discretizations might require prohibitively large computational costs, especially if the pPDE (2) is solved for many values of the parameter and over long temporal intervals.
The other steps of Algorithm 1 involve the following operations: (i) the assembly of P_n and Q_n in (5)-(6), with computational complexity of order O(J N_m²), and (ii) the inversion of the matrix P_n + Σ in the analysis step (7), with complexity O(N_m³). The solution of the pPDE (2), for all j ∈ {1, . . ., J}, in the prediction step of Algorithm 1 is thus the computational bottleneck of the EnKM algorithm.

Reduced Basis Methods
Given the need to solve the pPDE (2) for several instances of the parameter, the use of MOR techniques appears to be an ideal choice. Model order reduction has allowed exceptional computational speed-ups in settings that require repeated model evaluations, such as multi-query simulations. In MOR, the high-dimensional problem is replaced with a surrogate model of reduced dimensionality that still possesses optimal or near-optimal approximation properties but that can be solved at a considerably reduced computational cost. In this work, we focus on a particular class of MOR techniques, known as reduced basis methods [30].
The reduced basis method typically consists of two phases: an offline phase and an online phase. In the computationally expensive offline phase, a low-dimensional approximation of the solution space, namely the reduced space, is constructed, and a surrogate model is derived via projection of the full order model onto the reduced space. Then, the resulting low-dimensional reduced model can be solved in the online phase for many instances of the parameter at a computational cost independent of the size of the full order model.
To be more precise, let M := {u_h(x, t; µ) : µ ∈ P} be the solution set which collects the solutions of the discretized pPDE (2) under variation of the parameter µ. The parametric problem (2) is said to be reducible if the solution set M can be well approximated by a low-dimensional linear subspace. In this case, such a subspace is obtained as the span of a problem-dependent basis derived from a collection of full order solutions or snapshots, {u_h(µ_s)}_{s=1}^S, with S ∈ N_+, at sampled values, {µ_s}_{s=1}^S, of the parameter. The set P_TRAIN := {µ_s}_{s=1}^S ⊂ P of training parameters is a sufficiently rich subset of the parameter space that can be obtained by drawing random samples from a uniform distribution in P or with other sampling techniques, such as statistical methods and sparse grids, see [31, Chapter 6] and references therein. The extraction of the basis functions from the snapshots is usually performed using SVD-type algorithms, such as the proper orthogonal decomposition (POD) [32], or greedy algorithms. For problems that depend on both time and parameters, the so-called POD-Greedy method [33,34] combines a greedy algorithm in parameter space with a proper orthogonal decomposition in time at a given parameter. In the numerical tests of this work, we rely on the Weak-POD-Greedy algorithm, which is the preferred method whenever a rigorous error bound can be derived, while the POD and the Strong-POD-Greedy are often used when a bound is unavailable, i.e., for most non-linear problems. Note that, under the same choice of training parameters, the latter methods are more accurate but computationally less efficient.
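The POD extraction of a basis from snapshots can be sketched with a plain SVD. This is a minimal illustration under our own simplifications (Euclidean inner product, energy-based truncation); the paper's Weak-POD-Greedy construction with error bounds is more elaborate.

```python
import numpy as np

def pod_basis(snapshots, tol=1e-2):
    """Extract a POD basis from a snapshot matrix (one snapshot per column).

    Keeps the smallest number of left singular vectors such that the
    relative energy of the discarded singular values is below `tol`.
    Uses the Euclidean inner product; a problem-specific inner product
    would require weighting the snapshots first.
    """
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    N_eps = int(np.searchsorted(energy, 1.0 - tol**2)) + 1
    return U[:, :N_eps]

# Snapshots lying exactly in a 2-dimensional subspace are recovered by 2 modes.
rng = np.random.default_rng(1)
basis_true = np.linalg.qr(rng.standard_normal((100, 2)))[0]
S = basis_true @ rng.standard_normal((2, 30))
Psi = pod_basis(S, tol=1e-6)
print(Psi.shape)  # (100, 2)
```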
Once an N_ε-dimensional reduced basis {ψ_i}_{i=1}^{N_ε} is obtained, the associated reduced space is given by V_ε = span{ψ_1, . . ., ψ_{N_ε}} ⊂ V_h, and the full model solution u_h(µ), for a given µ, is approximated with a function

u_ε(x, t; µ) = Σ_{i=1}^{N_ε} u_i(µ) ψ_i(x),

where (u_1(µ), . . ., u_{N_ε}(µ)) ∈ R^{N_ε} denotes the vector of the expansion coefficients in the reduced basis. The reduced model thus reads: find u_ε(x, t; µ) ∈ V_ε such that

∂_t u_ε(x, t; µ) + F_µ^ε(u_ε(x, t; µ)) = 0,    (9)

where the operator F_µ^ε denotes the projection of the full order operator F_µ^h onto the reduced space. The computational gain derived from solving problem (9) instead of the full order model (2) hinges on the feasibility of a complete decoupling of the offline and online phases. A computational complexity of the online phase independent of the size of the full order problem can be achieved under the assumption of linearity and parameter-separability of the operator F_µ^h. To deal with general non-linear operators, hyper-reduction techniques are required. These include methods for approximating the high-dimensional non-linear term F_µ^h with an empirical affine decomposition, such as the EIM [35], and methods for reducing the cost of evaluating the non-linear term, such as linear program empirical quadrature [36] and empirical cubature [37].

A Reduced Basis Ensemble Kalman Method
In this section, we discuss the implications of replacing the high-fidelity model in the prediction step of the EnKM by a surrogate model derived via model order reduction, as described in Section 3.1. The use of MOR for particle-based methods is particularly desirable in multi-query contexts since it allows us to significantly reduce the computational cost of solving the inverse problem. However, the approximation introduced by the model order reduction inevitably produces (small) deviations of the reduced solution from the full order one. This constitutes a problem for data assimilation algorithms, as already documented and investigated in [12] and in other works. Indeed, the error in the solution results in discrepancies between approximated and exact measurements. Although we can expect the mismatch δ_ε(µ) := L u_h(µ) − L u_ε(µ) to decrease with the approximation error of u_ε(µ), this bias will inevitably entail a distortion of the loss functional obtained by simple model substitution, i.e.,

Φ_ε(µ | y) := ‖y(µ†, η) − L u_ε(x, t; µ)‖²_{Σ^{-1}}.    (10)
Note that this cost function does not vanish at the parameter µ† we are trying to estimate, not even in noise-free conditions. This systematic error, independent of the magnitude of the experimental noise, can be mitigated by modifying the cost function and, consequently, the EnKM. A modified algorithm, which we refer to as the adjusted RB-EnKM, is presented in the following sections. This algorithm is in contrast to what we refer to as the biased RB-EnKM, i.e., the algorithm obtained by the simple substitution of the full order model with the reduced order model in Algorithm 1, as formalized by the cost function (10).
The modification of the algorithm can proceed in two ways. One possibility is to rewrite the exact cost function in terms of the surrogate model and the measurement bias, namely substituting L u_h(µ) = L u_ε(µ) + δ_ε(µ), which yields

Φ_1(µ | y) := ‖y(µ†, η) − L u_ε(x, t; µ) − δ_ε(µ)‖²_{Σ^{-1}}.    (11)

A second option is to correct the experimental data involved in the biased cost function (10) so that, at least in noise-free conditions, its minimum coincides with the minimum of the exact cost function. This means subtracting δ_ε(µ†) instead of δ_ε(µ), and results in the new cost function

Φ_2(µ | y) := ‖y(µ†, η) − δ_ε(µ†) − L u_ε(x, t; µ)‖²_{Σ^{-1}}.    (12)

In noise-free conditions, i.e., if η = 0, both cost functions vanish at the exact value µ†. Since the cost functions Φ_1 and Φ_2 are non-negative, the minimum attained at µ† is necessarily also a global minimum.
In the following, we focus on the second approach. The reason is that the first approach requires the evaluation of the bias at all parameter values µ ∈ P, which is too expensive to perform. Furthermore, at the algorithmic level, the substitution of the true model with the sum of the surrogate model and its bias would significantly change the computation of P_n and Q_n, and thus the algorithm structure. By contrast, the second approach regards the data produced by the true model as biased with respect to the surrogate model and corrects the experimental data accordingly. This implies that δ_ε(µ†) is the only bias involved, and it would require just a single full order evaluation. However, since the argument µ† is unknown, this is clearly not possible, and we must instead exploit the prior epistemic uncertainty on µ†, encoded in Π_0(µ), to modify the cost function.
If µ† is treated as a random variable with probability measure Π_0, then the data bias δ_ε = δ_ε(µ†) is in turn a random variable with probability measure Π_0 ∘ δ_ε^{-1}. The moments of this distribution, henceforth denoted by δ̄_ε and Γ_ε, can be empirically estimated via pointwise evaluations of the bias without further assumptions on the nature of the distribution itself. However, the assumption of Gaussianity, although it improperly implies the linearity of δ_ε : P → R^{N_m}, is consistent with the other assumptions of Gaussianity and linearity required for the derivation of the EnKF [1]. Furthermore, it allows us to obtain closed-form results, as shown in the next paragraphs.
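The empirical estimation of the moments mentioned above can be sketched as follows. The function name and the affine bias used in the toy example are our own illustrative choices; in the paper the bias evaluations come from paired full order and reduced solves.

```python
import numpy as np

def bias_moments(delta, samples):
    """Empirical mean and covariance of the data bias delta_eps(mu)
    over parameter samples drawn from the prior Pi_0.

    delta   : callable, delta(mu) -> (N_m,), the bias L u_h(mu) - L u_eps(mu)
    samples : (S, N_p) array of prior samples
    """
    D = np.array([delta(mu) for mu in samples])      # (S, N_m)
    delta_bar = D.mean(axis=0)
    Gamma = np.atleast_2d(np.cov(D, rowvar=False))   # unbiased (S-1) normalization
    return delta_bar, Gamma

# Hypothetical affine bias: when delta is linear in mu, the Gaussian
# model for the bias is exact rather than an approximation.
rng = np.random.default_rng(2)
B = rng.standard_normal((4, 3))
samples = rng.uniform(0.0, 1.0, size=(500, 3))
delta_bar, Gamma = bias_moments(lambda mu: B @ mu, samples)
print(delta_bar.shape, Gamma.shape)  # (4,) (4, 4)
```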
In view of the fact that δ_ε is considered as a random variable, we change Equation (12) to make explicit the dependence of the cost function Φ_2 on δ_ε; thereby

Φ_2(µ | y, δ_ε) := ‖y(µ†, η) − δ_ε − L u_ε(x, t; µ)‖²_{Σ^{-1}}.

In order to make the estimate of µ dependent only on the experimental data, we must remove the conditioning on δ_ε, i.e., marginalize out the random variable. The easiest way to do this is by employing Bayesian statistics and particularly recovering the same marginal distribution y | µ mentioned at the beginning of Section 2.1. To this end, we consider the likelihood function l(µ | y, δ_ε) ∝ exp{−½ Φ_2(µ | y, δ_ε)} and marginalize with respect to the Gaussian measure N(δ̄_ε, Γ_ε) of the bias. Concerning the mean and covariance of the joint distribution of Gaussian variables, it can be easily shown that y | µ ∼ N(L u_ε(x, t; µ) + δ̄_ε, Σ + Γ_ε), which corresponds to the adjusted cost function

Φ_3(µ | y) := ‖y(µ†, η) − δ̄_ε − L u_ε(x, t; µ)‖²_{(Σ + Γ_ε)^{-1}}.

Hence, by analogy with Section 2.1, we can adapt the EnKM to optimize the new cost function under the surrogate model constraint (9). The resulting adjusted RB-EnKM is summarized in Algorithm 2. Unlike the reference EnKM, we distinguish between an offline and an online phase. In the offline phase, the training set of full order solutions is generated and used both to construct the surrogate model and to estimate the moments of δ_ε. In the online phase, the actual optimization is performed.
Algorithm 2 Iterative ensemble method with reduced basis surrogate models, accounting for the associated measurement bias.
Offline: Let P_TRAIN be a set of S parameters {µ^{(s)}}_{s=1}^S sampled from the distribution Π_0(µ). Based on the associated full order and reduced solutions u_h(µ^{(s)}) and u_ε(µ^{(s)}), respectively, we define the training biases

δ_ε^{(s)} := L u_h(µ^{(s)}) − L u_ε(µ^{(s)}),   s = 1, . . ., S,

and the associated empirical moments

δ̄_ε = (1/S) Σ_{s=1}^S δ_ε^{(s)},   Γ_ε = 1/(S−1) Σ_{s=1}^S [δ_ε^{(s)} − δ̄_ε][δ_ε^{(s)} − δ̄_ε]^T.

Online: Let E_0 be the initial ensemble with elements {µ_0^{(j)}}_{j=1}^J sampled from the distribution Π_0(µ). For n = 0, 1, . . .

(i) Prediction step. Compute the biased measurements of the approximated solution for each particle in the last updated ensemble:

G_ε(µ_n^{(j)}) = L u_ε(x, t; µ_n^{(j)}),   j = 1, . . ., J.    (17)

(ii) Intermediate step. From the last updated ensemble measurements and parameters, define the sample means and covariances:

ḡ_n = (1/J) Σ_{j=1}^J G_ε(µ_n^{(j)}),   µ̄_n = (1/J) Σ_{j=1}^J µ_n^{(j)},    (18)

P_n = (1/J) Σ_{j=1}^J [G_ε(µ_n^{(j)}) − ḡ_n][G_ε(µ_n^{(j)}) − ḡ_n]^T,   Q_n = (1/J) Σ_{j=1}^J [µ_n^{(j)} − µ̄_n][G_ε(µ_n^{(j)}) − ḡ_n]^T.

(iii) Analysis step. Update each particle in the ensemble:

µ_{n+1}^{(j)} = µ_n^{(j)} + Q_n (P_n + Σ + Γ_ε)^{-1} [y + η^{(j)} − δ̄_ε − G_ε(µ_n^{(j)})],   η^{(j)} ∼ N(0, Σ).

The prior probability Π_0(µ) used for the estimation of the moments of δ_ε could be substituted at every iteration by an updated probability measure of µ. However, the computation of the updated probability measure might compromise the computational gain obtained with the use of reduced models. One possibility to address this shortcoming is to use a Gaussian process regression of the initial ensemble biases to estimate the moments of δ_ε with respect to the new probability measure of µ. The development and study of this strategy, together with its effect on the accuracy and performance of the RB-EnKM, will be investigated in future studies.
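The analysis step of the adjusted algorithm differs from Algorithm 1 only in the gain denominator and in the innovation. A minimal sketch under our own naming conventions, with the reduced-order responses assumed to be precomputed:

```python
import numpy as np

def adjusted_analysis(ensemble, Gs, y, Sigma, delta_bar, Gamma, rng):
    """Analysis step of the adjusted RB-EnKM: the bias covariance Gamma
    inflates the gain denominator and the bias mean delta_bar corrects
    the data.

    ensemble : (J, N_p) particles
    Gs       : (J, N_m) reduced-order responses G_eps(mu_n^{(j)})
    """
    J = ensemble.shape[0]
    dG = Gs - Gs.mean(axis=0)
    dM = ensemble - ensemble.mean(axis=0)
    P = dG.T @ dG / J
    Q = dM.T @ dG / J
    # Adjusted Kalman gain K_n = Q_n (P_n + Sigma + Gamma)^{-1}.
    K = np.linalg.solve((P + Sigma + Gamma).T, Q.T).T
    noise = rng.multivariate_normal(np.zeros(len(y)), Sigma, size=J)
    # Innovation uses the bias-corrected data y - delta_bar.
    return ensemble + (y + noise - delta_bar - Gs) @ K.T

# Shape check with synthetic particles and responses.
rng = np.random.default_rng(3)
E = rng.standard_normal((20, 2))
Gs = rng.standard_normal((20, 5))
E1 = adjusted_analysis(E, Gs, np.zeros(5), np.eye(5),
                       np.zeros(5), np.zeros((5, 5)), rng)
print(E1.shape)  # (20, 2)
```

With delta_bar = 0 and Gamma = 0 this reduces to the analysis step of Algorithm 1, which is the intended consistency check.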

Numerical Experiments
In the following section, we consider two data assimilation problems for the estimation of model parameters in parabolic partial differential equations. The first problem involves a linear advection-dispersion problem with unknown Péclet number. The corresponding model is linear in the observed state c(µ), but non-linear in the map from the parameter to be estimated to the observations. The second problem concerns the transport of a contaminant in an unconfined aquifer with unknown hydraulic conductivity. It involves two coupled partial differential equations: a stationary non-linear equation which describes the pressure field induced by an external pumping force, and a time-dependent linear equation describing the advection-dispersion of the contaminant in a medium whose properties depend non-linearly on the pressure field.
Both models describe 2D systems, and each exhibits ideal characteristics to test the proposed algorithms. The first, while leading to a non-linear inverse problem, is sufficiently simple to allow for a comparison between the adjusted and biased RB-EnKM and the reference full order EnKM. Moreover, its affine dependence on the parameter enables the use of error bounds for the efficient construction of the reduced space. The second problem, which is non-linear and non-affine in the six-dimensional parameter vector, is complex enough to serve as a non-trivial challenge for the proposed RB-EnKM algorithm, while the reference EnKM cannot even be tested due to its computational cost. From an a priori estimate, performing full order tests with the same statistical relevance as the reduced basis ones would have taken up to 20 days on our machine.
The two problems are presented in Section 4.1 and Section 4.2. We first introduce the pPDE, then present the full order discretization, followed by the reduced basis approximation. The measurement operator is then introduced, and a first analysis of the inversion method is carried out. Finally, we study the impact of the ensemble size, of the experimental noise magnitude, and of the error of the reduced model on the reconstruction error of the EnKM. All the computations are performed using Python on a computer with a 2.20 GHz Intel Core i7-8750H processor and 32 GB of RAM.
The full order model is obtained by a nodal finite element discretization of Equation (21) using piecewise continuous polynomial functions, ζ_i : Ω → R, i = 1, . . ., N_h, of degree 2 over a uniform Cartesian grid of width h = 0.04, for a total of N_h = 10,100 degrees of freedom. The resulting system of ordinary differential equations is integrated over time using a Crank-Nicolson scheme with uniform time step ∆t = 0.01. This is equivalent to performing a Petrov-Galerkin projection of Equation (21) with trial and test spaces defined as follows: we consider the partition of the temporal interval I into the union of equispaced subintervals, I_n := (t_{n−1}, t_n], of length ∆t with n = 1, . . ., N_t and N_t := 2.5/∆t. Let ω_n : I → R be a piecewise constant function with support in I_n, and let υ_n : I → R be a hat function with support in I_n ∪ I_{n+1}. We define the trial space V_h := span{υ_n ⊗ ζ_i} and the test space W_h := span{ω_n ⊗ ζ_i}, respectively. To solve the spatial problems arising at each time step, we use the sparse splu function implemented in the scipy.sparse.linalg package (https://docs.scipy.org/doc/scipy/reference/sparse.linalg.html). The computational time to obtain a single full order solution is on average 0.56 s. Snapshots of the solution at times t ∈ {0.2, 0.8, 1.4, 2.0} and for the three parameter values µ ∈ {1/10, 1/30, 1/50} are shown in Figure 2. The high-fidelity model is used in combination with the time-gradient error bound ∆_R^{pr}(µ) introduced in [41] to implement a Weak-POD-Greedy algorithm for the selection of the reduced basis functions. To this end, we consider the training set Ξ_TRAIN^µ with parameters µ^{(s)} = 1/(9.5 + 0.5s) for all s ∈ N ∩ [1, S], of size S = 81. We prescribe a target accuracy of 10^{-2} for the maximum time-gradient relative error bound and we obtain an RB space of size 42. We can construct surrogate models of different accuracy by selecting N_ε ∈ N basis functions ψ_i : Ω → R, for i = 1, . . ., N_ε, out of these 42. Each choice corresponds to a relative error ε_c for the model. Once the reduced basis has been computed, we construct a reduced model via a Petrov-Galerkin projection of Equation (21) in the same way as we did for the full order model. For this purpose, we define the trial space V_ε := span{υ_n ⊗ ψ_i}_{i,n=1}^{N_ε,N_t} and the test space W_ε := span{ω_n ⊗ ψ_i}_{i,n=1}^{N_ε,N_t}. We then look for a reduced solution of the form

c_ε(x, t; µ) = Σ_{n=0}^{N_t} Σ_{i=1}^{N_ε} c_{n,i}(µ) υ_n(t) ψ_i(x),

where the expansion coefficients c_{0,i}, for i = 1, . . ., N_ε, result from the projection of the initial condition onto V_ε, while the remaining coefficients c_{n,i}, with i = 1, . . ., N_ε and n = 1, . . ., N_t, satisfy the equation

[M + (∆t/2)(µ K + A)] c_n = [M − (∆t/2)(µ K + A)] c_{n−1}.    (24)

Here the matrices M, K, A ∈ R^{N_ε × N_ε} denote the mass, stiffness, and advection matrix, respectively, obtained by restricting the corresponding full order bilinear forms to the reduced basis. The solution of the system of equations (24), equivalent to a Crank-Nicolson scheme, can be obtained by iteratively solving N_t linear systems of size N_ε for an online complexity O(N_ε³ + N_t N_ε²). Employing all basis functions, the computational time for a reduced basis solution (online cost) is on average 5.4 ms, significantly less than the approximately 0.56 s required for a full order solution. The acceleration achieved is over 100, which justifies the 47 s necessary for the construction of the RB model (offline cost), considering that the online phase requires computing up to 150 reduced basis solutions per iteration. Let us remark that such a cheap training phase is due to the low-dimensionality of the parameter space P and the availability of a tight error bound for this class of linear problems.
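The online time stepping of the reduced system (24) can be sketched as follows, assuming the reduced matrices M, K, A have already been assembled offline. The function name and the pure-diffusion toy check are ours; a production code would reuse a single matrix factorization rather than an explicit inverse for larger N_ε.

```python
import numpy as np

def reduced_cn_solve(M, K, A, mu, c0, dt, n_steps):
    """Crank-Nicolson time stepping of the reduced advection-dispersion
    system M dc/dt + (mu K + A) c = 0, with dense N_eps x N_eps matrices
    assembled offline. Returns the coefficient history, (n_steps + 1, N_eps).
    """
    S = mu * K + A
    lhs = M + 0.5 * dt * S           # assembled and inverted once: O(N_eps^3)
    rhs = M - 0.5 * dt * S
    lhs_inv = np.linalg.inv(lhs)     # small dense system, explicit inverse is fine
    C = [c0]
    for _ in range(n_steps):         # each step costs O(N_eps^2)
        C.append(lhs_inv @ (rhs @ C[-1]))
    return np.array(C)

# Pure-diffusion toy check with M = K = I and A = 0: the exact solution is
# c(t) = exp(-mu t) c0, so c(2.5) ≈ exp(-0.125) ≈ 0.8825 per component.
c = reduced_cn_solve(np.eye(3), np.eye(3), np.zeros((3, 3)), 0.05,
                     np.ones(3), 0.01, 250)
print(c.shape)  # (251, 3)
```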
Note that both the online computational cost and the accuracy of the solution depend on ∆t and on N_ε. The first is kept fixed, ∆t = 0.01, while the latter varies in some of the experiments. In order to keep track of the error associated with different choices of N_ε, we proceed with the characterization of the error between the surrogate model solution c_ε(µ) and the full model solution c_h(µ) for different values of N_ε. This analysis is provided in Figure 3, depicting the maximum relative errors in L²(I, H¹(Ω)), L^∞(I, L^∞(Ω)) and the time-gradient norm versus the reduced basis size. It shows a nearly exponential error decay as N_ε increases. The maxima are computed on an independent test set, Ξ_TEST^µ := {1/(9.75 + 0.5s) : s ∈ N ∩ [1, 80]}. Furthermore, the reduced solutions do not appear to deviate significantly from the projection of their full order counterparts onto the associated RB space, and the error bound employed demonstrates a good effectivity.
For the implementation of the EnKM as presented in Subsection 2.1, it is necessary to provide a mathematical model for the measurement process. We take 40 measurements in time at each of the three sensor locations shown in Figure 1. For this purpose, we introduce the measurement operator L : L²(I, L²(Ω)) → R^{120}, which can be seen as a vector of linear functionals ℓ_k : L²(I, L²(Ω)) → R for k ∈ N ∩ [1, 120]. Each of those linear functionals has a unique Riesz representer ρ_k : I × Ω → R, with respect to the L²(I, L²(Ω)) inner product, that can be written as the product

ρ_k(x, t) = ν_j(t) η_i(x),

for suitable indices i and j, where the spatial fields η_i : Ω → R, i ∈ {1, 2, 3}, are Wendland functions ψ_{2,1} of radius 0.1 and center coordinates (x_i, y_i) ∈ {(0.1, 0.7), (−0.1, −0.5), (0.5, 0.1)} (see Figure 1), while, for each j ∈ N ∩ [1, 40], ν_j : I → R is a piecewise linear function supported over the interval I_j := [t_j − 2∆t, t_j + 2∆t], where t_j := ∆t(33 + 5j); ν_j is assumed to be symmetric with respect to t_j and constant between t_j − ∆t and t_j + ∆t.
Given this description of the observation process and the surrogate model, we next test the data assimilation scheme. We start with the estimation of the unknown parameter µ = 0.04 given the experimental measurements y(µ, η) ∈ R^120, with noise η ∼ N(0, Σ). We compare the performance of the EnKM employing a full order model and a surrogate model of accuracy εc = 10^-3 with Nε = 42. In order to obtain reliable statistics, we consider 25 ensembles E0 of size J = 150, with particles sampled from the uniform prior distribution Π0(µ) = U(0.02, 0.10). The results obtained for a fixed value of σ² = 10^-6, at different iterations of the algorithm, are shown in Table 1. We observe a quick stabilization of the error means Hh, Hε and H*ε, and of the error covariances Sh, Sε and S*ε, after just a few steps. The full order algorithm performs significantly better than the biased reduced basis algorithm, while the adjusted version of the algorithm exhibits excellent performance, very close to the full order one.
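One iteration of the particle update can be sketched in the perturbed-observation form common to ensemble Kalman updates; this is a minimal sketch, not the paper's exact regularization schedule, and the forward map G below is a toy one-parameter map, not the advection-diffusion model.

```python
import numpy as np

def enkm_step(particles, forward, y, Sigma, rng):
    """One gradient-free EnKM iteration: particles are shifted by a
    Kalman-type gain assembled from the ensemble sample covariances."""
    G = np.array([forward(p) for p in particles])        # (J, n_obs)
    dmu = particles - particles.mean(axis=0)
    dG = G - G.mean(axis=0)
    J = len(particles)
    C_muG = dmu.T @ dG / (J - 1)                         # cross-covariance
    C_GG = dG.T @ dG / (J - 1)                           # output covariance
    gain = C_muG @ np.linalg.inv(C_GG + Sigma)
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), Sigma, size=J)
    return particles + (y_pert - G) @ gain.T

# toy setup: recover mu = 0.04 from a prior U(0.02, 0.10)
rng = np.random.default_rng(1)
forward = lambda mu: np.array([mu[0], 3.0 * mu[0] ** 2])  # toy forward map
Sigma = 1e-8 * np.eye(2)
y = forward(np.array([0.04])) + rng.multivariate_normal(np.zeros(2), Sigma)
particles = rng.uniform(0.02, 0.10, size=(150, 1))
for _ in range(3):
    particles = enkm_step(particles, forward, y, Sigma, rng)
```

Note that the update is entirely gradient free: only forward evaluations of G enter the gain, which is what makes the method attractive for black-box surrogate models.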
We next investigate the sensitivity of the algorithm to the accuracy of the reduced model, to the ensemble size, and to the noise magnitude. First, we repeat the estimation of the reference parameter µ = 0.04 for different values of the ensemble size, J = 4k with k ∈ N ∩ [1, 10]. In this experiment, we employ the same surrogate model used before and consider the relative noise magnitude σ/‖G(µ)‖∞ = 10^-3. The results, shown in Figure 4, indicate a larger sensitivity to J for the full order algorithm than for the other two: it requires a larger number of particles before stabilizing on a large-ensemble asymptotic (mean-field) behavior, while the reduced basis algorithms exhibit a much faster convergence, probably as a consequence of a lower-dimensional state space. Among the three iterations considered, the first appears to be the most affected, while, as the algorithm converges, the ensemble size seems to become less relevant.
In a second experiment, we consider the same parameter estimation, but we let the relative noise σ/‖G(µ)‖∞ take values 10^-i for i ∈ N ∩ [2, 6]. Moreover, we employ J = 40 particles per ensemble and the same reduced basis model as before. Each estimation is replicated 64 times for different noise realizations. The results are shown in Figure 5: for the full order EnKM we observe a linear dependence between the reconstruction error and the experimental noise, while the results for the biased RB-EnKM show that an untreated model bias introduces a systematic error independent of the noise magnitude. The most important result concerns the adjusted RB-EnKM: its error behavior is comparable to that obtained using a full order model. This demonstrates the effectiveness of the proposed method in compensating for the bias introduced by the reduced basis model, at least in this case study. This conclusion is further confirmed by the last experiment, in which the performance of the biased and adjusted RB-EnKM is tested for all the parameters in the test set Ξ^µ_TEST already employed to validate the reduced basis model. Each parameter in the set is estimated using surrogate models of increasing size. Each estimation is performed 64 times in very low-noise conditions, that is σ/‖G(µ)‖∞ = 10^-5, employing J = 40 particles per ensemble. For each surrogate model employed, the results from the 64 ensembles are averaged and the maximum over the test set is computed. The results, shown in Figure 6, demonstrate the ability of the correction to compensate very well for the presence of a model bias. As a consequence, the worst-case reconstruction error of the adjusted RB-EnKM barely depends on the reduced model size and is always significantly lower than that of its biased counterpart. These results confirm the good performance of the adjusted RB-EnKM and its superiority over the biased variant.
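The bias compensation can be illustrated on a toy problem: the data are shifted by the offline-estimated mean bias and the noise covariance is inflated by the bias covariance. This is our reading of the adjustment, sketched with a constant-bias surrogate; all maps and numbers below are illustrative, not the paper's models.

```python
import numpy as np

def enkm_step(particles, forward, y, Sigma, rng):
    # same gradient-free ensemble Kalman update as in the previous sketch
    G = np.array([forward(p) for p in particles])
    dmu = particles - particles.mean(axis=0)
    dG = G - G.mean(axis=0)
    J = len(particles)
    gain = (dmu.T @ dG / (J - 1)) @ np.linalg.inv(dG.T @ dG / (J - 1) + Sigma)
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), Sigma, size=J)
    return particles + (y_pert - G) @ gain.T

rng = np.random.default_rng(3)
G_full = lambda mu: np.array([mu[0], 2.0 * mu[0]])   # toy "full order" map
bias = np.array([0.01, 0.01])                        # constant model bias
G_surr = lambda mu: G_full(mu) + bias                # biased "surrogate"

# offline bias estimation on a training set: delta = G_full - G_surr
train = rng.uniform(0.02, 0.10, size=(200, 1))
delta = np.array([G_full(m) - G_surr(m) for m in train])
delta_mean = delta.mean(axis=0)                      # mean bias
Gamma = np.cov(delta.T) + 1e-12 * np.eye(2)          # bias covariance (~0 here)

Sigma = 1e-8 * np.eye(2)
y = G_full(np.array([0.04])) + rng.multivariate_normal(np.zeros(2), Sigma)

p_biased = rng.uniform(0.02, 0.10, size=(100, 1))
p_adjust = p_biased.copy()
for _ in range(4):
    p_biased = enkm_step(p_biased, G_surr, y, Sigma, rng)
    # adjusted update: shift the data by the mean bias, inflate the noise
    p_adjust = enkm_step(p_adjust, G_surr, y - delta_mean, Sigma + Gamma, rng)

err_biased = abs(p_biased.mean() - 0.04)
err_adjust = abs(p_adjust.mean() - 0.04)
```

In this constant-bias toy, the unadjusted run stalls at a systematic offset independent of the noise level, while the adjusted run recovers the reference parameter to noise accuracy, mirroring the behavior reported in Figure 5.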

Tracer Transport Problem
We now consider the tracer transport problem from [42], describing the non-homogeneous and non-isotropic transport of a non-reactive tracer in an unconfined aquifer. We introduce the spatial domain Ω := (0, 1)², divided into six sub-regions, Ω = ∪_{r=1}^6 Ωr, illustrated in Figure 7 and defined as follows: (x, y) ∈ Ω is in Ωr if the subscript r is the smallest integer for which x_0^r < x < x_1^r and y_0^r < y < y_1^r, where the points {(x_0^r, y_0^r)}_{r=1}^6 and {(x_1^r, y_1^r)}_{r=1}^6 are defined in Table 2. We denote by ∂Ω the outer boundary of the domain and define the parallel walls Γ_D := (0, 1) × {0, 1} and Γ_N := ∂Ω \ Γ_D. Based on this partition, we define the conductivity field as the piecewise constant function k(·; µ) : Ω → R over the six sub-regions Ωr; the conductivity can be affinely decomposed employing the coefficient vector µ ∈ R^6, with components µr, and the indicator functions ηr. We can now estimate the hydraulic log-conductivity µ, restricted to the orthotope D := ∏_{r=1}^6 (µr^min, µr^max) ⊂ R^6, relying on measurements of the tracer concentration c(·, ·; µ). This field satisfies the parabolic pPDE (27). In this equation, the dispersion coefficients dl = dm = 2.5 · 10^-3 correspond to the flow-dependent component of the dispersion tensor and to its residual component, respectively. The forcing term fc is assumed to be of the form fc := Σ_{i=1}^4 fc,i, and it models the injection of different amounts of tracer in four wells located at (ai, bi) ∈ {0.15, 0.85}²; each fc,i is a Gaussian function centered in (ai, bi), with covariance Γc = 0.005 and multiplicative coefficient pi, where (p1, p2, p3, p4) = (10, 5, 10, 5). The velocity field β(·, µ) : Ω → R² depends linearly on the hydraulic head u(·, µ) : Ω → R through the relation β = −k(µ)∇u. The latter field must satisfy the second constraint of the inverse problem: under the Dupuit-Forchheimer approximation [43], it solves the non-linear elliptic pPDE (28). Here,
the forcing term fu := Σ_{i=1}^4 fu,i models an active pumping action at the four wells; each fu,i is a Gaussian function centered in (ai, bi), of covariance Γu = 0.02 and coefficient qi, where (q1, q2, q3, q4) = (10, 50, 150, 50). Due to the combination of the quadratic dependence on u and the zero boundary conditions, the equation always admits pairs of opposite solutions u+, u−; in our study, we are only interested in the positive solution. Full order solutions are obtained via a finite element approximation, employing piecewise linear functions ζi : Ω → R, for i = 1, . . ., Nh, with Nh = 44,972 degrees of freedom (mesh size h ≈ 0.01). The discretization of the elliptic equation (28) results in a discrete non-linear problem which is iteratively solved employing a Newton scheme with tolerance 10^-6. The approximate solution, uh, is used to compute the velocity field βh := −k(µ)∇uh, which is piecewise constant with Nh − 1 degrees of freedom. This is needed for the solution of the parabolic equation (27), whose discretization leads to a system of ordinary differential equations integrated over the time interval I = (0.0, 0.5] using the Crank-Nicolson scheme with uniform time step ∆t = 0.01. This is equivalent to performing a Petrov-Galerkin projection of Equation (21) with trial and test spaces defined as follows: we consider the partition of the temporal interval I into equispaced subintervals In := (tn−1, tn] of length ∆t, with n = 1, . . ., Nt and Nt := 0.5/∆t. Let ωn : I → R be a piecewise constant function with support in In, and let υn : I → R be a hat function with support in In ∪ In+1; the trial and test spaces are defined accordingly. Each full order simulation is obtained employing a FreeFEM++ solver [44] and takes roughly 2 minutes to compute. Figure 8 shows the hydraulic head with the advection field, and the tracer concentration with the sensor locations.

The same reference log-conductivity is used as the true parameter for the data assimilation problem. Pointwise observations are collected at five successive times tm ∈ {0.1, 0.2, 0.3, 0.4, 0.5}, in 25 spatial locations xij = (xi, yj) such that xi = 0.1 + 0.2i and yj = 0.1 + 0.2j for i, j ∈ {0, . . ., 4}. This operation is encoded in the measurement operator L : H^1(Ω) → R^125. Each noise-free measurement is polluted with i.i.d. Gaussian noise with mean zero and standard deviation σ, resulting in a noise covariance matrix Σ = σ²I.
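The Newton scheme for the discretized elliptic equation above, together with the pair of opposite solutions u+, u−, can be illustrated on a 1D finite-difference analogue. This is a sketch under simplifying assumptions (one space dimension, constant conductivity), not the paper's 2D finite element solver.

```python
import numpy as np

def newton_dupuit(k, f, h, u0, tol=1e-6, maxit=50):
    """Newton iteration for a 1D Dupuit-Forchheimer analogue
    -(k (u^2/2)')' = f with zero Dirichlet data, discretized by centered
    finite differences on the interior nodes (tolerance 1e-6 as in the
    text). The quadratic nonlinearity makes -u a solution whenever u is."""
    n = len(f)
    # tridiagonal stiffness matrix of -(k v')' for constant k
    A = (k / h**2) * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(maxit):
        F = A @ (u**2 / 2) - f                          # nonlinear residual
        if np.linalg.norm(F) < tol:
            break
        u -= np.linalg.solve(A * u[np.newaxis, :], F)   # Jacobian = A diag(u)
    return u

n = 49
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
f = np.ones(n)
u_pos = newton_dupuit(1.0, f, h, u0=np.ones(n))    # positive branch
u_neg = newton_dupuit(1.0, f, h, u0=-np.ones(n))   # opposite solution
```

Starting Newton from a positive initial guess is what selects the positive branch u+; the same iteration started from a negative guess converges to u−.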
In order to solve the inverse problem with surrogate models of different accuracy, several approximations of Equations (28) and (27) must be produced. This requires the introduction of spatial basis functions ψi, φj, selected by applying the method of snapshots (POD) to the two sets of full order solutions, sampled at parameters µ^(s), for all s ∈ N ∩ [1, S], and at sampling times t^(z) = 0.01z, for all z ∈ N ∩ [1, Z], where S = 2,000 and Z = 50. The numbers of basis functions considered, Nε, Mε ∈ N, are those required to approximate the hydraulic head and the tracer concentration with relative accuracies εu, εc ∈ R+, respectively. Based on the first set of basis functions, the approximation space for the Galerkin projection of Equation (28) is defined as Uε := span{ψi}_{i=1}^{Nε}. From the second set of basis functions, the RB test space and RB trial space are defined for the Petrov-Galerkin projection of Equation (27). We look for reduced solutions whose expansion coefficients cn,j and ui satisfy the reduced systems, given the initial conditions c0,j = 0 for all j ∈ N ∩ [1, Mε]. The scalar forcing terms fi, gj are obtained by integrating their full order counterparts against the basis functions ψi and φj. The mass and stiffness matrices M, K ∈ R^{Mε×Mε} are defined as in (25), while the parameter dependent tensors N(µ) and D(u, µ) can be assembled for a fixed value of the log-conductivity µ; the latter, however, requires the evaluation of the discrete hydraulic head u. We emphasize that the accuracy of the solutions to Equations (34) and (35), with the latter equivalent to a Crank-Nicolson discretization, depends on the number of basis functions and on the time step ∆t. In Figure 9, on the left and on the right, we show the maximum relative errors of the surrogate model (εu, εc) as functions of Nε and Mε. In the center, we show the L^∞(I; L^∞(Ω)) relative error of the tracer concentration, which bounds from above the error on synthetic measurements. We compute these maximum relative errors on a set of parameters Ξ^µ_TEST := {µ^(s) ∼ Π0(µ)}_{s=1}^{500} independent of the ones used for the model training. It can be observed that, for small values of Nε, the error in the concentration stagnates after a certain value of Mε, suggesting that, in this regime, the error is dominated by the approximation of the hydraulic head. For Nε = 40, however, this effect is no longer present, at least for the values of Mε considered, and the tracer error only depends on Mε. This allows us to modify the accuracy of the model by varying the dimension of the reduced model.
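The snapshot-based selection of the basis size for a prescribed relative accuracy can be sketched as follows. The snapshot field below is synthetic, and the Frobenius-norm error criterion is a stand-in for the problem-specific norms (e.g. H^1) used in the paper.

```python
import numpy as np

def pod_basis(S, eps):
    """Method of snapshots via SVD: retain the smallest number of modes
    whose relative projection error (Frobenius sense) is below eps."""
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    energy = np.cumsum(s**2)
    # relative projection error of an N-mode basis: sqrt(1 - energy_N / total)
    err = np.sqrt(np.clip(1.0 - energy / energy[-1], 0.0, None))
    N = int(np.argmax(err <= eps)) + 1   # first basis size meeting the tolerance
    return U[:, :N]

# snapshot matrix: one column per (parameter, time) sample of a smooth
# field, giving rapidly decaying singular values
x = np.linspace(0.0, 1.0, 200)
params = np.linspace(0.2, 0.8, 60)
S = np.column_stack([np.exp(-(x - p) ** 2 / 0.05) for p in params])

eps = 1e-4
V = pod_basis(S, eps)
rel_err = np.linalg.norm(S - V @ (V.T @ S)) / np.linalg.norm(S)
```

The same construction applied to the head and concentration snapshot sets yields the two bases {ψi} and {φj}, with Nε and Mε determined by εu and εc.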
The construction of the reduced model has an offline cost of about 100 hours. This includes the time required for the construction of a training set of 2,000 full order solutions (61h 16m 40s) and the time for the computation of the POD basis functions. We now focus on the inverse problem, as discussed in Section 2.1. We start by considering the estimation of the reference parameter µ given the measurements y(µ, η) ∈ R^125, polluted by experimental noise of magnitude σ. To obtain reliable statistics, we consider 32 independent initial ensembles E0 of variable size, sampled from the same distribution Π0.
As a first experiment, we compare the performance of the two RB-EnKM variants employing J = 160 particles and a surrogate model with error tolerance εc ≈ 0.02 (obtained with Nε = 40 and Mε = 320). The first test relies on the biased version of the RB-EnKM, as presented in Section 2.1, while the second corresponds to the adjusted algorithm. For both simulations, we consider low-amplitude experimental noise, negligible compared to the model error: sup_µ ‖L(ch(µ) − cε(µ))‖∞ ≈ 10^-2 > 10^-3 = σ; moreover, we separately pollute the measurements employed by each ensemble. In Table 3, we report the average properties of the ensembles after 4 iterations: columns Eε and E*ε contain the mean parameter estimation, i.e., the particle mean averaged over the 32 ensembles, while columns Σε and Σ*ε contain the average standard deviation of the ensembles. We observe that the correction term significantly lowers the reconstruction error, from ‖Eε − µ‖∞ = 6.437e-3 to ‖E*ε − µ‖∞ = 7.870e-4. We also notice that the variability of the estimate increases, consistent with the presence of an additional term in the Kalman gain.
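The ensemble statistics reported above (average estimate E, average ensemble spread Σ, and the ∞-norm reconstruction error) can be computed as follows. The particle array is toy Gaussian data standing in for the EnKM output, and the reference vector is a placeholder, not the true parameters of Table 2.

```python
import numpy as np

# particles[e, j, :] holds particle j of ensemble e after the final
# iteration: 32 ensembles, 160 particles, 6 log-conductivity components
rng = np.random.default_rng(2)
mu_ref = np.array([1.0, -0.5, 0.3, 0.0, 0.8, -1.2])   # placeholder reference
particles = mu_ref + 0.01 * rng.standard_normal((32, 160, 6))

E = particles.mean(axis=1).mean(axis=0)            # particle mean, averaged over ensembles
Sig = particles.std(axis=1, ddof=1).mean(axis=0)   # average ensemble standard deviation
err = np.linalg.norm(E - mu_ref, ord=np.inf)       # reconstruction error ||E - mu||_inf
```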
As an extension of the previous experiment, we estimate the reference parameter µ employing the same surrogate model, noise magnitude and number of ensembles as before, but using ensembles of variable size J = 20k, with k ∈ N ∩ [2, 16]. This allows us to study the effect of the ensemble size on the parameter estimation obtained with the biased and adjusted RB-EnKM algorithms. The results, shown in Figure 10, indicate the sensitivity of both algorithms to the ensemble size. In a further experiment, we estimate the reference parameter µ in low-noise conditions, σ = 10^-3, averaging the results obtained over 16 ensembles of 160 particles each. In Figure 12, we show the final relative error (after three algorithm iterations) as Mε and εc change, both for the biased and the adjusted RB-EnKM. For both, we observe that the relative estimation error decreases, almost linearly, with the error of the surrogate model. Moreover, with few exceptions, the error of the adjusted algorithm is smaller than that of the biased algorithm. The few points where the two errors are very close can be explained by a strongly unbalanced distribution of the measurement bias in a region away from the reference parameter. Future developments that account for the current parameter estimate when adjusting the bias correction should dampen this effect.

Conclusions
We proposed an efficient, gradient-free iterative solution method for inverse problems that combines model order reduction techniques, via the reduced basis method, with the ensemble Kalman method introduced in [19]. The use of surrogate models allows a significant reduction of the computational cost, but it distorts the cost function optimized by the inverse problem, which in turn introduces a systematic error in its approximate solution. To overcome this limitation, we proposed the adjusted RB-EnKM, which corrects for this bias by systematically adjusting the cost function, thus retrieving good convergence.
Using a linear Taylor-Green vortex problem, the performance of the method was compared to that of the full order model as well as to the biased RB-EnKM, in which no adjustment was made. The numerical results show that the biased method fails to achieve the same accuracy as the full order method. In contrast, the adjusted RB-EnKM attains the same accuracy as its full order counterpart over a large range of noise magnitudes at a significantly lower computational cost, and even approaches the mean-field limit faster as the ensemble size is increased. Furthermore, the dependence of the reconstruction error on model accuracy is essentially removed over the range of model accuracies considered.
The method was then applied to a non-linear tracer transport problem for which the full order model was impractical. The results for this example show that, despite a decrease in the order of convergence at low noise, the stagnation of the reconstruction error observed in the biased RB-EnKM can be removed by adjusting the algorithm. Regarding the model accuracy, a substantial improvement of the adjusted EnKM with respect to the biased EnKM was observed, although less pronounced than in the linear problem.
Overall, our numerical tests show that the proposed method allows for the use of inexpensive surrogate models while empirically ensuring that the result of the inversion remains accurate with respect to the full order inversion, at a significantly lower computational cost.

Fig. 2
Fig. 1 Spatial domain of the Taylor-Green problem. On the left: the initial condition c0 in blue, the sensor shape functions ηi, and the Neumann and Dirichlet boundaries, Γ_N, Γ_D. On the right: the velocity field β with four Taylor-Green vortices.

Fig. 3 Left: maximum relative time-gradient error and error bound of the advection-diffusion solution vs. Nε. Center: maximum L^2(I, H^1(Ω)) relative error of the projection and of the solution vs. Nε. Right: maximum L^∞(I, L^∞(Ω)) relative error of the projection and of the solution vs. Nε. Projections are based on the L^2(Ω) inner product of the gradients.

Fig. 4 Relative error in the parameter estimation vs. ensemble size J for fixed noise magnitude, σ = 10^-3 ‖G(µ)‖∞. The standard full order EnKM is shown on the left, the adjusted RB-EnKM in the center and the biased RB-EnKM on the right. The solid lines represent the average error over 64 ensembles, while the dashed lines correspond to the 10th and 90th percentiles.

Fig. 5 Relative error in the parameter estimation vs. relative noise magnitude σ/‖G(µ)‖∞ for fixed ensemble size J = 40. The standard full order EnKM is shown on the left, the adjusted RB-EnKM in the center, and the biased RB-EnKM on the right. The solid lines represent the average error over 64 ensembles, while the dashed lines correspond to the 10th and 90th percentiles.

Fig. 6 Parameter error vs. reduced basis size for the biased and the adjusted RB-EnKM.

Fig. 7 Domain of the tracer transport problem and injection wells.

Fig. 9 Left and center: maximum relative error of the solution and of the projection of the tracer concentration vs. Mε for different values of Nε; the projection (in space) is performed with respect to the H^1(Ω) inner product. Right: maximum relative error of the projection and of the solution of the hydraulic head vs. Nε. The error norm is shown above each plot.

Fig. 11 Biased and adjusted RB-EnKM parameter estimation relative error vs. absolute noise magnitude, for fixed ensemble size J = 160.The solid lines represent the average error over 32 ensembles at different algorithm iterations.The dashed lines represent the 10th and the 90th percentiles.

Fig. 12 Parameter error vs. reduced basis size and maximum relative error of the solution.The solid lines represent the average error over 16 ensembles at different algorithm iterations.The dashed lines represent the 10th and the 90th percentiles.

Table 1
Comparison of the reference FE (·h), biased RB (·ε), and adjusted RB (·*ε) EnKM in low-noise conditions, σ² = 10^-6. The test was performed by averaging 25 ensembles of 150 particles each and using reduced basis models of size Nε = 42 (εc ≈ 0.001). H refers to the mean of the estimation error, S denotes the standard deviation of the estimation error, and c.t. the computational time.

Table 2
On the left: coordinates of the corners of the sub-regions Ωr. On the right: true values of the parameters µr and boundaries of the uniform prior Π0.

Table 3
Comparison of the biased RB (·ε) and adjusted RB (·*ε) EnKM in low-noise conditions, σ = 10^-6. The test was performed by averaging 32 ensembles, each employing 160 particles and 4 iterations, and using reduced basis models of size Nε = 40, Mε = 320 (εc ≈ 0.02). E refers to the average parameter estimation, Σ denotes the average ensemble standard deviation, and c.t. the computational time. The offline cost corresponds roughly to the computational cost of 2,500 finite element solutions, each of which takes approximately 110 s. By contrast, the surrogate model obtained employing Nε = 40, Mε = 320 basis functions produces a solution in only 1.25 s (online cost), about 1/90 of its full order equivalent. The same training set used for the POD is employed to estimate, at negligible cost, the empirical moments of δε, i.e., its mean and covariance Γε.