1 Introduction

Electrical resistivity tomography (ERT) is one of the most widely used geophysical methods and has been successfully employed for geotechnical characterization, monitoring of earthen dams and embankments, landfill monitoring, groundwater exploration, and mapping of contaminant plumes (e.g., Aleardi et al., 2020; Arosio et al., 2017; Bièvre et al., 2018; Chambers et al., 2006; Dahlin, 2020; Hermans & Paepen, 2020; Hojat et al., 2019a; Karimi-Nasab et al., 2011; Loke et al., 2020; Moradipour et al., 2016; Müller et al., 2010; Pollock & Cirpka, 2012; Supper et al., 2014; Tresoldi et al., 2019; Whiteley et al., 2017). The ERT inverse problem is nonlinear and ill-posed: its solution is nonunique because of the limited resolution of the method and the noise contaminating the acquired data. The methods used to solve the ERT problem can be classified into two categories: deterministic and probabilistic algorithms. The former are gradient-based approaches that minimize an error function through an iterative linearization procedure (Karoulis et al., 2014; Pidlisecky & Knight, 2008; Zhang et al., 2005). To stabilize the problem, the error function usually includes both a data misfit term and some type of model regularization, such as a smoothness constraint or the closeness of the solution to a predefined resistivity model. Undoubtedly, these approaches are the most popular methods to solve the ERT inversion because they guarantee rapid convergence toward a best-fitting model. However, the loss of information produced by the local linearization hinders accurate uncertainty appraisals. Furthermore, these local strategies can often get trapped in local minima of the error function.

On the contrary, probabilistic methods express the inverse solution through the posterior probability density (PPD) function in the model space that fully quantifies the uncertainty on the estimated parameters (Tarantola, 2005). The main advantage of probabilistic over deterministic approaches lies in their ability to provide a set of model realizations that reproduce the measured data. The PPD can be expressed in a closed-form only for linear forward operators and Gaussian assumptions about data and model parameter distributions. For nonlinear problems and/or non-Gaussian assumptions, a complete characterization of the PPD is only possible through sampling, and in these contexts, Markov Chain Monte Carlo (MCMC; Sambridge & Mosegaard, 2002; Sen & Stoffa, 2013) algorithms can be used to numerically estimate the target posterior. These methods provide accurate uncertainty assessments but require a considerable computational effort to converge toward stable PPD estimations, especially in large-dimensional model spaces and for expensive forward model evaluations (Aleardi & Salusti, 2020; Aleardi et al., 2017, 2020; Pradhan & Mukerji, 2020; Sajeva et al., 2014).

To alleviate the computational workload of uncertainty assessment in nonlinear inverse problems, an approximate Bayesian method, the ensemble-based (EB) inversion, can be applied. This approach is a data assimilation algorithm in which the PPD is represented by an ensemble of model realizations obtained by assimilating the observed data. Under Gaussian assumptions on the model parameters and on the noise distribution, this method provides an approximate description of the posterior, which is usually accurate if the model parameters are approximately Gaussian, the forward operator is not highly nonlinear, and the number of models forming the ensemble is large enough. For example, an initial ensemble with a limited number of models might lead to an underestimation of the posterior uncertainty. The main advantage of this approach is the reduced number of forward modeling runs required to obtain reasonable PPD estimations with respect to MCMC algorithms. Popular ensemble-based methods are the ensemble smoother (ES; Evensen, 1994), the ensemble Kalman filter (EnKF; Evensen, 2009), and the ensemble smoother with multiple data assimilation (ES-MDA; Emerick & Reynolds, 2013). Some applications of these approaches to geophysical inverse problems can be found in Jin et al. (2008), Tveit et al. (2015), Liu and Grana (2018a, 2018b, 2018c), Thurin et al. (2019), and Tveit et al. (2020).

In this work, we propose an ensemble-based inversion of ERT data (EB-ERT) that makes use of the ES-MDA algorithm. Differently from the ES, in ES-MDA the observed data are assimilated multiple times with an inflated data covariance matrix. In this context, each data-assimilation step corresponds to a Bayesian updating step in which the conditional distribution of the model parameters is approximated by estimating the conditional mean and covariance matrix from the ensemble of models. In our application, a finite element (FE) code (Karaoulis et al., 2013) constitutes the forward operator that computes the observed data from the subsurface resistivity values.

To reduce the ill-conditioning and the number of unknowns of inverse problems, several compression strategies can be adopted, such as the wavelet transform (Li & Oldenburg, 2003), singular value decomposition (Liu & Grana, 2018b), Legendre polynomials (Aleardi, 2019), and convolutional autoencoders (Liu & Grana, 2018a). Here we reduce both the ill-conditioning of the ERT problem and the number of ensemble members needed for reliable uncertainty quantification through a discrete cosine transform (DCT) reparameterization, a method extensively used for image compression (Britanak et al., 2010). The DCT expands a signal (e.g., the subsurface resistivity model) into a series of cosine functions oscillating at different frequencies. The low-order DCT coefficients express most of the variability of the original signal, and the model compression is simply accomplished by zeroing the coefficients of the basis functions beyond a certain threshold. However, any compression technique must be applied bearing in mind that part of the information in the original (unreduced) parameter space could be lost in the reduced space; thus, the model parameterization must always constitute a compromise between model resolution and model uncertainty (Aleardi, 2020; Aleardi & Salusti, 2021; Dejtrakulwong et al., 2012; Fernández‐Martínez et al., 2017; Grana et al., 2019; Lochbühler et al., 2014). Similarly, the data parameterization must constitute a compromise between good data resolution (our ability to match the observed data) and accurate uncertainty quantification. It should also be noted that the order of the retained non-zero DCT coefficients in the model space determines the wavelength of the recovered solution along the different directions (i.e., vertical and horizontal). Therefore, in our application, the DCT also acts as a regularization operator in the model space that guarantees the preservation of the spatial continuity of the resistivity values in the recovered solution.

The EB-ERT is first assessed on synthetic data and then applied to field data. In both cases, a geostatistical method generates the initial ensemble of models according to a stationary log-Gaussian prior model and a Gaussian variogram. We also compare the results of the proposed approach with those achieved by an MCMC sampling and with a deterministic least-squares inversion. More specifically, we sample the PPD in the reduced DCT domain by employing the Differential Evolution Markov Chain (DEMC; Vrugt, 2016), an MCMC method that improves over the standard random walk Metropolis because it exploits multiple interacting chains to sample the parameter space. We demonstrate that, if the number of ensemble members is large enough, the EB-ERT inversion provides uncertainty estimates comparable to those of the MCMC algorithm but with a dramatic reduction of the computational workload. We also assess the influence of the number of ensemble members on the quality of the EB-ERT inversion results. As far as the authors are aware, this is the first time that EB and MCMC methods are compared in the context of ERT inversion.

2 Methodology

2.1 The Bayesian Approach and the Ensemble-Based Inversion

Gradient-based deterministic inversions aim at minimizing a misfit function, which is usually a linear combination of a data misfit term and a model regularization term. For Gaussian-distributed noise and model parameters, the error function can be written as follows:

$$E\left(\mathbf{m}\right)={\left|\left|{\mathbf{C}}_{d}^{-\frac{1}{2}}\left(\mathbf{d}-G\left(\mathbf{m}\right)\right)\right|\right|}_{2}^{2}+{\left|\left|{\mathbf{C}}_{m}^{-\frac{1}{2}}\left(\mathbf{m}-{\mathbf{m}}_{prior}\right)\right|\right|}_{2}^{2},$$
(1)

where the vectors \(\mathbf{m}\) and \(\mathbf{d}\) identify the model parameters and the observed data, respectively; \({\mathbf{C}}_{d}\) and \({\mathbf{C}}_{m}\) are the data and prior model covariance matrices; \({\mathbf{m}}_{prior}\) is the prior model vector, and G is the forward modeling operator that maps the model into the corresponding data. The minimum of \(E\left(\mathbf{m}\right)\) can be iteratively approached through a local quadratic approximation of the error function around a starting model (Aster et al., 2018; Menke, 2018).

In contrast, a Bayesian inversion aims at estimating the full posterior distribution in the model space, expressed by:

$$p\left( {{\mathbf{m}}{|}{\mathbf{d}}} \right) = \frac{{p({\mathbf{d}}|{\mathbf{m}})p\left( {\mathbf{m}} \right)}}{{p\left( {\mathbf{d}} \right)}},$$
(2)

where \(p\left(\mathbf{m}|\mathbf{d}\right)\) is the posterior probability density (PPD), \(p(\mathbf{d}|\mathbf{m})\) is the data likelihood function, \(p(\mathbf{m})\) is the a-priori distribution of the model parameters, and \(p(\mathbf{d})\) is the marginal distribution of the data, which acts as a normalization constant. Under Gaussian assumptions for data and model parameter distributions, and for a linear forward operator G, we can write (Tarantola, 2005):

$$p\left( {\mathbf{m}} \right) \propto {\text{exp}}\left( { - \frac{1}{2}\left( {{\mathbf{m}} - {\mathbf{m}}_{prior} } \right)^{T} {\mathbf{C}}_{{\mathbf{m}}}^{ - 1} \left( {{\mathbf{m}} - {\mathbf{m}}_{prior} } \right)} \right),$$
(3)
$$p({\mathbf{d}}|{\mathbf{m}}) \propto {\text{exp}}\left( { - \frac{1}{2}\left( {{\mathbf{d}} - {\mathbf{Gm}}} \right)^{T} {\mathbf{C}}_{{\mathbf{d}}}^{ - 1} \left( {{\mathbf{d}} - {\mathbf{Gm}}} \right)} \right),$$
(4)
$$\begin{aligned} p\left( {{\mathbf{m}}{|}{\mathbf{d}}} \right) & \propto {\text{exp}}\left\{ { - \left[ {\frac{1}{2}\left( {{\mathbf{m}} - {\mathbf{m}}_{prior} } \right)^{T} {\mathbf{C}}_{{\mathbf{m}}}^{ - 1} \left( {{\mathbf{m}} - {\mathbf{m}}_{prior} } \right) + \frac{1}{2}\left( {{\mathbf{d}} - {\mathbf{Gm}}} \right)^{T} {\mathbf{C}}_{{\mathbf{d}}}^{ - 1} \left( {{\mathbf{d}} - {\mathbf{Gm}}} \right)} \right]} \right\} \\ & \propto {\text{exp}}\left( { - E\left( {\mathbf{m}} \right)} \right). \\ \end{aligned}$$
(5)

Under the previous assumptions, the maximum a posteriori (MAP) solution of the Bayesian approach coincides with the minimizer of E(m) and can be computed as:

$${\mathbf{m}}_{\mathbf{M}\mathbf{A}\mathbf{P}}=\mathrm{arg}\underset{\mathrm{m}}{\mathrm{min}}E\left(\mathbf{m}\right)={\mathbf{m}}_{prior}+\mathbf{K}\left(\mathbf{d}-{\mathbf{G}\mathbf{m}}_{prior}\right),$$
(6)

where the matrix \(\mathbf{K}\) is the so-called Kalman gain given by:

$$\mathbf{K}={\mathbf{C}}_{\mathbf{m}}{\mathbf{G}}^{T}{\left({\mathbf{G}\mathbf{C}}_{\mathbf{m}}{\mathbf{G}}^{T}+{\mathbf{C}}_{\mathbf{d}}\right)}^{-1},$$
(7)

whereas the posterior covariance matrix can be computed as follows:

$${\mathbf{C}}_{\mathbf{m}|\mathbf{d}}={\mathbf{C}}_{\mathbf{m}}-\mathbf{K}\mathbf{G}{\mathbf{C}}_{\mathbf{m}}.$$
(8)
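As a minimal numerical check of Eqs. (6)–(8), the following Python sketch computes the Kalman gain, the MAP model, and the posterior covariance for a hypothetical two-parameter linear problem; the operator G, the covariances, and all numerical values are illustrative and are not taken from this work.

```python
import numpy as np

# Illustrative linear problem: 3 data, 2 model parameters (values are hypothetical)
G = np.array([[1.0, 0.5],
              [0.2, 1.0],
              [0.7, 0.3]])
C_m = np.array([[1.0, 0.3],
                [0.3, 1.0]])          # prior model covariance
C_d = 0.05 * np.eye(3)                # data covariance
m_prior = np.array([1.0, 2.0])
d_obs = np.array([2.3, 2.1, 1.5])

# Kalman gain, Eq. (7)
K = C_m @ G.T @ np.linalg.inv(G @ C_m @ G.T + C_d)

# MAP model, Eq. (6), and posterior covariance, Eq. (8)
m_map = m_prior + K @ (d_obs - G @ m_prior)
C_post = C_m - K @ G @ C_m
print(m_map, C_post, sep="\n")
```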

However, for nonlinear inverse problems, the forward operator cannot be expressed in matrix form and, thus, the posterior covariance cannot be computed analytically. Moreover, for large-dimensional model spaces, the computation of the Kalman gain becomes unfeasible. To overcome these issues, EB methods replace the analytical model covariance with an empirical estimate of the covariance inferred from the ensemble members. For example, the ES simultaneously assimilates the observed data to globally update an initial ensemble of models generated according to the prior:

$${\mathbf{m}}_{k}^{u}={\mathbf{m}}_{k}^{p}+\tilde{\mathbf{K}}\left({\tilde{\mathbf{d}}}_{k}-{\mathbf{d}}_{k}^{p}\right),$$
(9)

with k = 1,…,N, where N represents the number of models in the ensemble and \({\tilde{\mathbf{d}}}_{k}\) is a random perturbation of the observed data according to the Gaussian distribution \(\mathcal{N}(\mathbf{d},{\mathbf{C}}_{\mathbf{d}})\). The superscripts u and p denote the updated (current iteration) and prior (previous iteration) variables, respectively; \({\mathbf{d}}_{k}^{p}\) is the data associated with the kth prior model \({\mathbf{m}}_{k}^{p}\). Finally, \(\tilde{\mathbf{K}}\) is the Kalman gain that can be estimated from the ensemble of models according to:

$$\tilde{\mathbf{K}}={\mathbf{C}}_{\mathbf{m}\mathbf{d}}^{p}{({\mathbf{C}}_{\mathbf{d}\mathbf{d}}^{p}+{\mathbf{C}}_{\mathbf{d}})}^{-1},$$
(10)

where \({\mathbf{C}}_{\mathbf{d}\mathbf{d}}^{p}\) denotes the covariance matrix of the predicted data and \({\mathbf{C}}_{\mathbf{m}\mathbf{d}}^{p}\) is the cross-covariance between the model vector \({\mathbf{m}}^{p}\) and the vector of the associated data \({\mathbf{d}}^{p}\). It can be demonstrated that the ES and the EnKF are equivalent to a single Gauss–Newton iteration with a full update step, which often yields poor data predictions for nonlinear problems. For this reason, the ES-MDA was proposed, in which the data are assimilated multiple times with an inflated data covariance: the resulting iterative updating of the prior models is equivalent to performing multiple smaller corrections to the ensemble. In this context, Eq. (9) can be interpreted as a Bayesian updating step under the assumption of Gaussian-distributed model parameters. Note that this equation can also be applied to nonlinear problems because the covariance matrices \({\mathbf{C}}_{\mathbf{d}\mathbf{d}}^{p}\) and \({\mathbf{C}}_{\mathbf{m}\mathbf{d}}^{p}\) are approximately inferred from the ensemble of models and the associated data. The ES-MDA improves over the ES because it reduces the model variation at each iteration by employing an inflation coefficient α that scales the data covariance matrix. The steps forming the ES-MDA can be described as follows (a minimal code sketch of these steps is given after the list):

  1. 1.

    Define the total number of iterations Q, the number of models in the ensemble N, and the inflation coefficient α for each iteration with \({\sum }_{i=1}^{Q}\frac{1}{{\alpha }_{i}}=1\);

  2. 2.

    Generate the prior ensemble of models according to the Gaussian prior;

  3. 3.

    For each ith iteration

    1. (a)

      Run the forward models to compute the data associated with each ensemble member \({\left\{{\mathbf{d}}^{p}\right\} }_{1,\dots ,N}\);

    2. (b)

      Perturb the observed data for each kth ensemble member according to: \({\tilde{\mathbf{d}}}_{k}=\mathbf{d}+\sqrt{{\alpha }_{i}}{\mathbf{C}}_{\mathbf{d}}^{1/2}\mathbf{n}\), with \(\mathbf{n}\sim \mathcal{N}(0,\mathbf{I})\), where I is the identity matrix and \({\alpha }_{i}{\mathbf{C}}_{\mathbf{d}}\) denotes the inflated data covariance matrix;

    3. (c)

      Update the ensemble using Eqs. (9) and (10) in which \({\mathbf{C}}_{\mathbf{d}}\) is replaced by \({\alpha }_{i}{\mathbf{C}}_{\mathbf{d}}\).
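
The three steps above can be condensed into a few lines of code. The following Python sketch is only a schematic rendition of the ES-MDA loop: the functions sample_prior and forward are placeholders for the geostatistical simulation and the FE modeling code used in this work, and the ensemble size and inflation coefficients are illustrative (five assimilations with α_i = 5 satisfy the constraint of step 1).

```python
import numpy as np

rng = np.random.default_rng(0)

def es_mda(d_obs, C_d, sample_prior, forward, N=500, alphas=(5.0,) * 5):
    """Schematic ES-MDA loop (sketch, not the code used in this work).

    d_obs        observed data vector, shape (n_d,)
    C_d          data covariance matrix, shape (n_d, n_d)
    sample_prior callable returning an (N, n_m) prior ensemble
    forward      callable mapping an (N, n_m) ensemble to (N, n_d) predicted data
    alphas       inflation coefficients with sum(1 / alpha_i) = 1
    """
    assert abs(sum(1.0 / a for a in alphas) - 1.0) < 1e-8
    M = sample_prior(N)                          # step 2: prior ensemble
    L = np.linalg.cholesky(C_d)                  # C_d^{1/2} used to perturb the data
    for alpha in alphas:                         # step 3: assimilation iterations
        D = forward(M)                           # (a) predicted data for each member
        noise = rng.standard_normal((N, len(d_obs))) @ L.T
        D_pert = d_obs + np.sqrt(alpha) * noise  # (b) perturbed observations
        M_c = M - M.mean(axis=0)                 # (c) ensemble covariances, Eq. (10)
        D_c = D - D.mean(axis=0)
        C_md = M_c.T @ D_c / (N - 1)
        C_dd = D_c.T @ D_c / (N - 1)
        K = C_md @ np.linalg.inv(C_dd + alpha * C_d)
        M = M + (D_pert - D) @ K.T               # update of Eq. (9) for all members
    return M
```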

The algorithm updates each model to minimize the L2 norm difference between the observed data and the data computed on the ensemble members. All the models that attain the final misfit value can be considered samples from the equivalence region of solutions in the model space (Aster et al., 2018). In other words, such models reproduce the observed data equally well and, for this reason, represent possible subsurface scenarios in agreement with the acquired geophysical data and with the prior assumptions. From this ensemble of models, we can numerically derive an estimate of the posterior uncertainty.

The number of ensemble members should be large enough to obtain an accurate estimate of the \({\mathbf{C}}_{\mathbf{d}\mathbf{d}}^{p}\) and \({\mathbf{C}}_{\mathbf{m}\mathbf{d}}^{p}\) matrices but small enough to keep the forward evaluations computationally affordable. The optimal number of members also depends on the dimension of the model space: large model spaces require a considerable number of ensemble members to achieve reliable uncertainty estimations. Indeed, it is known that ensembles that are too small can result in an underestimation of the model uncertainties. To overcome these issues, and also to mitigate the ill-conditioning of the problem, we compress both the data and model spaces using the DCT.

2.2 Discrete Cosine Transform

Several variants of the DCT exist with slightly different definitions; in this work, we use the so-called DCT-2 formulation, which is the most common, and hereafter we simply refer to it as the DCT. We employ the DCT parameterization because it exhibits superior compression power over other compression methods (Lochbühler et al., 2014).

The DCT can be applied to multidimensional signals (e.g., 2-D matrices), and the multidimensional DCT follows straightforwardly from the one-dimensional definition because it is simply a separable product (equivalently, a composition) of DCTs along each dimension. For example, if we consider a 2-D resistivity model \({\varvec{\uprho}}\)(x,y), in which x = [0,1,…,\({M}_{x}\) − 1] and y = [0,1,…,\({M}_{y}\) − 1] represent the horizontal and vertical coordinates, respectively, the associated 2-D transform is defined as follows:

$$\left\{ {\begin{array}{*{20}l} {{\mathbf{R}}\left( {k_{x} ,k_{y} } \right) = \frac{1}{{\sqrt {M_{x} } }}\frac{1}{{\sqrt {M_{y} } }}\mathop \sum \limits_{x = 0}^{{M_{x} - 1}} \mathop \sum \limits_{y = 0}^{{M_{y} - 1}} {{\varvec{\uprho}}}\left( {x,y} \right),} \hfill & {if\;k_{x} = k_{y} = 0} \hfill \\ {{\mathbf{R}}\left( {k_{x} ,k_{y} } \right) = \sqrt {\frac{2}{{M_{x} }}} \sqrt {\frac{2}{{M_{y} }}} \mathop \sum \limits_{x = 0}^{{M_{x} - 1}} \mathop \sum \limits_{y = 0}^{{M_{y} - 1}} {{\varvec{\uprho}}}\left( {x,y} \right)cos\left( {\frac{{\left( {2x + 1} \right)\pi k_{x} }}{{2M_{x} }}} \right)cos\left( {\frac{{\left( {2y + 1} \right)\pi k_{y} }}{{2M_{y} }}} \right),} \hfill & {if\;k_{x} ,k_{y} \ne 0} \hfill \\ \end{array} } \right.,$$
(11)

where \(\mathbf{R}\left({k}_{x},{k}_{y}\right)\) represents the \({k}_{x}\)-th and \({k}_{y}\)-th DCT coefficient. The values within the matrix R represent the unknowns to be estimated in a DCT-reparametrized inverse problem. Equation (11) can be compactly rearranged in matrix form:

$$\mathbf{R}={\mathbf{B}}_{y}{\varvec{\uprho}}{\mathbf{B}}_{x}^{T},$$
(12)

where \({\mathbf{B}}_{x}\) and \({\mathbf{B}}_{y}\) are the matrices, with dimensions \({M}_{x}\times {M}_{x}\) and \({M}_{y}\times {M}_{y}\), respectively, that contain the DCT basis functions, whereas the \({M}_{y}\times {M}_{x}\) matrix R expresses the DCT coefficients. This approach concentrates most of the information of the original signal into the low-order DCT coefficients, and hence an approximation of the subsurface resistivity model can be obtained as follows:

$$\tilde{{\varvec{\uprho}}}={\left({\mathbf{B}}_{y}^{q}\right)}^{T}{\mathbf{R}}_{qp}{\mathbf{B}}_{x}^{p},$$
(13)

where \(\tilde{{\varvec{\uprho}}}\) is the approximated [\({M}_{y}\times {M}_{x}\)] resistivity model, \({\mathbf{B}}_{y}^{q}\) is a [\(q{\times M}_{y}\)] matrix containing only the first \(q\) rows of \({\mathbf{B}}_{y}\), \({\mathbf{B}}_{x}^{p}\) is a [\(p{\times M}_{x}\)] matrix containing only the first \(p\) rows of \({\mathbf{B}}_{x}\), and the matrix \({\mathbf{R}}_{qp}\) contains the first q rows and p columns of R. In other words, the scalars q and p represent the numbers of retained basis functions along the y and x directions used to derive the approximated model. Therefore, the DCT transformation allows a reduction of the (\({M}_{y}{\times M}_{x}\))-D full resistivity model space to a (\(q\times p\))-D DCT-compressed parameter space with \(p<{M}_{x}\) and \(q<{M}_{y}\). In this context, the \(q\times p\) non-zero numerical coefficients of the \({\mathbf{R}}_{qp}\) matrix become the unknowns to be estimated after a DCT compression of the model space. Estimating only the retained DCT coefficients reduces the dimensionality of the parameter space and can significantly improve the computational efficiency of the inversion procedure. Figure 1 shows some DCT basis functions of different orders in a 2-D space and illustrates that the variability of the solution along each dimension is directly determined by the orders of the retained DCT coefficients. Note that moving from the full data and model spaces to the DCT-compressed domains reduces both the number of ensemble members needed for reliable uncertainty estimation and the dimensions of the \({\mathbf{C}}_{\mathbf{d}\mathbf{d}}^{p}\) and \({\mathbf{C}}_{\mathbf{m}\mathbf{d}}^{p}\) matrices, thereby reducing the computational effort of the entire inversion procedure.
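
The compression of Eqs. (11)–(13) can be reproduced with standard DCT-II routines. The following Python sketch (array sizes and the retained numbers of coefficients are illustrative) transforms a 2-D model, zeroes all but the first q × p coefficients, and reconstructs the approximated model.

```python
import numpy as np
from scipy.fft import dctn, idctn

rho = np.random.rand(11, 35)            # illustrative (M_y x M_x) resistivity model
q, p = 4, 10                            # retained coefficients along y and x

# Forward 2-D DCT-II with orthonormal scaling, Eqs. (11)-(12)
R = dctn(rho, type=2, norm='ortho')

# Keep only the first q rows and p columns of R and zero the rest, Eq. (13)
R_qp = np.zeros_like(R)
R_qp[:q, :p] = R[:q, :p]

# Inverse transform returns the approximated model rho_tilde
rho_tilde = idctn(R_qp, type=2, norm='ortho')
```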

Fig. 1
figure 1

2-D DCT basis functions of different orders. Blue and red code low and high numerical values, respectively

In this work, we apply the 1-D DCT to compress the measured apparent resistivity values and the 2-D DCT to compress the subsurface resistivity model. This means that the model parameter vector m includes the retained coefficients in the model space, while the observed data vector d includes the retained coefficients in the data space. Figure 2 shows a schematic representation of the projection of the full model and data spaces onto the compressed DCT domain, together with the model and data vectors so derived for the EB-ERT inversion. Note that, because of its trapezoidal shape, the apparent resistivity pseudosection cannot be expressed as a 2-D matrix and, thus, it is flattened to a 1-D vector before the DCT projection. After generating the initial ensemble according to the prior assumptions and computing the associated predicted data, all the models and data are projected onto the DCT space. We consider log-Gaussian distributed resistivity values and, for this reason, the DCT coefficients to be estimated refer to the natural logarithm of the resistivity. The inversion runs in the reduced domain and, after the selected number of iterations has been reached, the ensemble of DCT models collected at the last iteration is projected back onto the resistivity space to compute the mean model and the associated uncertainty in the original, uncompressed domain.
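
On the data side, the flattening and 1-D DCT projection described above can be sketched as follows; the length of the flattened pseudosection and the number of retained coefficients are illustrative.

```python
import numpy as np
from scipy.fft import dct, idct

d_full = np.random.rand(198)     # illustrative flattened apparent-resistivity pseudosection
n_keep = 80                      # retained DCT coefficients in the data space

c = dct(d_full, type=2, norm='ortho')      # 1-D DCT-II of the flattened data vector
d_red = c[:n_keep]                         # reduced data vector used in the inversion

# Back-projection onto the full data space (discarded coefficients set to zero)
d_rec = idct(np.concatenate([d_red, np.zeros(d_full.size - n_keep)]),
             type=2, norm='ortho')
```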

Fig. 2
figure 2

Derivation of the model (a) and data (b) vectors in the DCT space from the resistivity model and the associated pseudosection

3 Synthetic Inversion

In the following synthetic example, the study area is 35 m long and 5.5 m deep and is discretized with rectangular cells with spatial dimensions of 1 m and 0.5 m along the horizontal and vertical directions, respectively. We define a stationary log-Gaussian prior model and a spatial Gaussian variogram that can be expressed by the following correlation function:

$${\uptau }_{x}=\mathrm{exp}\left(-\frac{{h}_{x}^{2}}{{\alpha }_{x}^{2}}\right),$$
(14)

where \({\uptau }_{x}\) indicates the correlation along the horizontal x-direction, \({h}_{x}\) is the spatial lag along the same direction, and \({\alpha }_{x}\) is the parameter that defines the spatial dependency along the x-axis. Note that the scalar value \({\alpha }_{x}\) directly influences the variogram range along the x-direction; an analogous function with parameter \({\alpha }_{y}\) describes the correlation along the vertical direction. In this example, we set \({\alpha }_{x}\) and \({\alpha }_{y}\) equal to 4 and 2 m, respectively. The marginal log-Gaussian prior, together with the lateral and vertical correlation functions, is represented in Fig. 3. The true model is generated as a random realization drawn from the prior assumptions. This means that we put ourselves in the most favorable case, in which the prior assumptions correctly capture the statistical properties and spatial variability patterns of the subsurface resistivity values. The true model includes two high-resistivity anomalies embedded in a low-resistivity medium: one wider anomaly is located at the center of the model, while the other is located around the horizontal coordinate of 7.5 m. At the right edge of the model (between the horizontal coordinates of 25 and 30 m), we have also simulated a shallow, low-resistivity body.
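
For illustration, one unconditional realization of the stationary log-Gaussian prior with the anisotropic Gaussian correlation of Eq. (14) can be drawn by factorizing the grid covariance matrix, as in the Python sketch below; the grid, mean, and standard deviation values are illustrative, and in this work the initial ensemble is actually generated with a simple kriging-based geostatistical simulation.

```python
import numpy as np

nx, ny, dx, dy = 35, 11, 1.0, 0.5      # grid of the synthetic test (cell sizes in meters)
ax, ay = 4.0, 2.0                      # correlation parameters of Eq. (14)
mu_log, sigma_log = np.log(50.0), 0.4  # illustrative log-resistivity mean and std

# Cell coordinates and anisotropic Gaussian correlation matrix
x, y = np.arange(nx) * dx, np.arange(ny) * dy
X, Y = np.meshgrid(x, y)                           # shapes (ny, nx)
xf, yf = X.ravel(), Y.ravel()
hx = xf[:, None] - xf[None, :]
hy = yf[:, None] - yf[None, :]
C = sigma_log**2 * np.exp(-(hx**2) / ax**2 - (hy**2) / ay**2)

# Factorize (with a small jitter for numerical stability) and draw one realization
L = np.linalg.cholesky(C + 1e-8 * np.eye(C.shape[0]))
z = mu_log + L @ np.random.default_rng(1).standard_normal(C.shape[0])
rho = np.exp(z).reshape(ny, nx)                    # one prior resistivity realization
```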

Fig. 3
figure 3

a Marginal log-Gaussian prior distribution for the resistivity values. b, c represent the spatial correlation functions associated with the assumed 2-D variogram model along the lateral and vertical directions, respectively. d The true model for the synthetic inversion

We simulate a Wenner acquisition layout with 36 electrodes evenly spaced at 1 m and an injected current of 1 A. The maximum a value (i.e., the number of acquisition levels) is 11. This configuration results in \(11\times 35=385\) model parameters to be estimated from \(198\) data points. The previously mentioned FE code has been used to compute the noise-free observed data, which have then been contaminated with uncorrelated Gaussian noise with a standard deviation equal to 20% of the total standard deviation of the noise-free dataset. This results in a signal-to-noise ratio of 20 dB.
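
For a standard Wenner layout, the number of quadripoles at spacing level a is \({n}_{e}-3a\), where \({n}_{e}\) is the number of electrodes; summing over the 11 levels confirms the stated number of data points:

$$\sum_{a=1}^{11}\left(36-3a\right)=11\times 36-3\cdot \frac{11\times 12}{2}=396-198=198.$$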

After defining the prior, we must determine the optimal number of DCT coefficients needed to approximate the subsurface resistivity model. To this end, we quantify how the explained variability of the resistivity model changes as the number of DCT basis functions increases. The selection of the optimal number of DCT coefficients is a delicate step that must guarantee uncertainty estimations, model resolution, and data fitting comparable to those achieved by an inversion running in the full, uncompressed space. A detailed discussion of how the model and data compressions affect the inversion results is beyond the scope of this work, and we refer the interested reader to Grana et al. (2019) for more theoretical insights. Figure 4a shows the explained variability for a model drawn from the prior as the number of retained DCT coefficients increases. We note that only 40 coefficients (i.e., 10 and 4 coefficients along the first and second DCT directions, respectively) explain almost the total variability of the uncompressed resistivity model. This means that the DCT allows a reduction of the \(385\)-D model space to a \(4\times 10=40\)-D domain. A similar analysis has been carried out on some datasets derived from prior realizations. An example is shown in Fig. 4b, where it emerges that 80 coefficients successfully capture almost the total variability of the uncompressed dataset. Therefore, in this case, the full \(385\)-D model space has been reduced to a 40-D domain, while the full data space has been reduced to an \(80\)-D domain. As an example, Fig. 5 compares the true resistivity model with the approximated models derived for different numbers of retained DCT coefficients. If only two coefficients are considered along the two DCT dimensions, the actual resistivity values are not recovered. Four coefficients along the two DCT dimensions provide just a very smoothed version of the original model, while 10 and 4 coefficients along the first and second dimensions, respectively, guarantee an almost complete reproduction of the lateral and vertical variations of the actual resistivity values. Therefore, the reduced spatial resolution that we will observe in the synthetic results is only related to the acquisition layout we simulate and not to the number of DCT coefficients employed.
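
The explained-variability analysis of Fig. 4 can, in principle, be reproduced with a routine of the following kind, where the explained variability is approximated by the fraction of the total DCT energy captured by the first q × p coefficients; the function name and the example sizes are ours.

```python
import numpy as np
from scipy.fft import dctn

def explained_variability(model, max_q, max_p):
    """Fraction of the total DCT energy captured by the first q x p coefficients."""
    R = dctn(model, type=2, norm='ortho')
    total = np.sum(R**2)
    E = np.zeros((max_q, max_p))
    for q in range(1, max_q + 1):
        for p in range(1, max_p + 1):
            E[q - 1, p - 1] = np.sum(R[:q, :p]**2) / total
    return E

# Illustrative call on a random (11 x 35) model standing in for a prior realization
E = explained_variability(np.random.rand(11, 35), max_q=6, max_p=12)
print(np.round(E, 3))
```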

Fig. 4
figure 4

a Examples of explained model variability as the number of DCT coefficients along the 1st and 2nd DCT dimensions increases. We illustrate the explained variability for a resistivity model drawn from the prior distribution. The numerical value with coordinate (x, y) indicates the explained variability if the first x and y DCT coefficients along the 1st and 2nd DCT dimensions, respectively, are used for compressing the resistivity model. It emerges that 10 DCT coefficients along the 1st dimension and 4 along the 2nd dimension explain almost 100% of the variability of the uncompressed resistivity model. b Explained variability as the number of DCT coefficients increases for the data associated with the model shown in a. In this case, 80 DCT coefficients express almost the total variability of the uncompressed dataset

Fig. 5
figure 5

A comparison between the true resistivity model drawn from the prior distribution and some approximations obtained with different numbers of DCT coefficients. The parentheses express the number of coefficients retained along the first and second DCT dimensions. For example [7, 4] indicates that we consider 7 and 4 coefficients along the first and second DCT directions, respectively

In the following synthetic experiments, we perform 5 tests employing different numbers of ensemble members, with the aim of assessing how the size of the ensemble affects the recovered model and the associated uncertainty. In all cases, the ensemble is evolved for 5 iterations. A simple kriging algorithm is used to generate the initial ensemble according to the assumed variogram model and the prior Gaussian distribution in the logarithmic domain. We implement Matlab codes in which the generation of the initial ensemble and the forward evaluations are distributed across different cores to reduce the computational workload. The inversions are run on two deca-core Intel E5-2630 CPUs at 2.2 GHz (128 GB RAM). We measure the accuracy of the results using the linear correlation coefficient and the root-mean-square error (RMSE) between the true model and the estimated mean model, and between the observed data and the data predicted on the mean model. We also assess the estimated posterior uncertainty by computing the coverage ratio for the 80% confidence interval (Table 1). Finally, we evaluate the computing time needed to achieve a stable data misfit value.
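
The coverage ratio reported in Table 1 is, in essence, the fraction of cells whose true resistivity falls within the central 80% interval derived cell-by-cell from the ensemble; a minimal sketch (variable names and the illustrative call are ours) is:

```python
import numpy as np

def coverage_ratio(ensemble, true_model, level=0.80):
    """Fraction of cells whose true value lies inside the central `level` interval
    computed cell-by-cell from the ensemble (ensemble shape: (N, n_cells))."""
    lo = np.percentile(ensemble, 50 * (1 - level), axis=0)
    hi = np.percentile(ensemble, 50 * (1 + level), axis=0)
    return np.mean((true_model >= lo) & (true_model <= hi))

# Illustrative call with a random ensemble and a random "true" model
rng = np.random.default_rng(2)
print(coverage_ratio(rng.normal(size=(500, 385)), rng.normal(size=385)))
```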

Table 1 Quantitative summary of the EB inversion results

The estimated mean ensemble models and the associated standard deviations are represented in Fig. 6. We observe that in all cases we obtain similar estimates for the mean model: the wide anomaly in the central part of the investigated area is well recovered, as is the low-resistivity body located in the shallowest part between the horizontal coordinates of 25 and 30 m. The actual resistivity value associated with the deeper and smaller anomaly is always underestimated. In all cases, as expected, the uncertainty increases moving from the shallow to the deep and lateral parts of the model. The uncertainty also increases as the resistivity value increases. However, for 100 and 200 models in the ensemble, we obtain an underestimation of the uncertainties compared with the other tests. This underestimation grows as the data illumination decreases and, for this reason, it is more significant at the bottom and at the lateral edges of the study area. In contrast, the estimated uncertainty is similar for the tests with 500, 1000, and 2000 models. Table 1 indicates that all the tests provide a mean model that closely reproduces the observed data. The correlation coefficients and the RMSE in the model space indicate that the accuracy of the results does not significantly change for ensembles with 500 or more models. Similarly, the coverage ratio shows that the precision of the results does not change in the last three tests with 500, 1000, and 2000 models. The computational cost per iteration is 29 s when 100 models are considered for the ensemble and increases to 480 s when we consider 2000 models. Figure 7 represents the evolution of the L2 norm data misfit for the different tests over 5 iterations. In all cases, the same final error value is reached in just 3 iterations. This means that the total computing time goes from 90 s for an ensemble of 100 models up to 1440 s (24 min) for 2000 ensemble members. However, we also note that larger ensembles are associated with a faster convergence rate in the very first iterations because they more accurately approximate the \({\mathbf{C}}_{\mathbf{d}\mathbf{d}}^{p}\) and \({\mathbf{C}}_{\mathbf{m}\mathbf{d}}^{p}\) matrices. These results indicate that, for this test, an ensemble of 500 models constitutes the best compromise between the accuracy and precision of the results and the computational cost of the EB-ERT inversion (i.e., 6 min to attain convergence). Figures 8 and 9 compare some resistivity sections associated with the test employing 500 ensemble members: Fig. 8 represents some models sampled from the prior, whereas Fig. 9 shows some resistivity sections forming the ensemble at the third iteration. In the latter case, we note that the resistivity anomalies are consistently recovered by all the models.

Fig. 6
figure 6

Estimated mean models (left) and associated standard deviation maps (right) for the different tests. a 100 models in the ensemble. b 200 models in the ensemble. c 500 models in the ensemble. d 1000 models in the ensemble. e 2000 models in the ensemble

Fig. 7
figure 7

Evolution of the L2 norm data misfit for the different tests

Fig. 8
figure 8

Some examples of resistivity sections drawn from the prior distribution and associated with the test employing 500 models in the ensemble

Fig. 9
figure 9

Some examples of models forming the ensemble at the third iteration for the test with 500 models

We now compare the outcomes of the EB-ERT inversion with those provided by an MCMC sampling running in the compressed space and with a standard, least-squares deterministic inversion running in the full data and model spaces. We implement a DEMC algorithm running for 20,000 iterations and employing 20 chains. The prior is the same used for the EB inversion. The deterministic inversion provides a final prediction very similar to the posterior mean model estimated by the MCMC algorithm (Fig. 10a, b). Both these predicted models and the standard deviation estimated by the MCMC inversion (Fig. 10a–c) are similar to those provided by the EB-ERT approach with more than 500 members forming the ensemble, although we observe a slight increase of the posterior uncertainties where the illumination is poor (e.g., at the bottom of the model; Fig. 10c) with respect to the EB results (Fig. 6c). The evolution of the negative log-likelihood illustrates that the stationary regime for the MCMC algorithm is attained in approximately 3000 iterations (Fig. 10d). To quantitatively assess the convergence of the sampling, we use the potential scale reduction factor (PSRF), a popular MCMC convergence diagnostic proposed by Brooks and Gelman (1998), to which we refer the reader for its formal definition. The PSRF compares the within-chain variances to the variance computed from all the mixed chains for a given parameter. In practice, one can consider that convergence to a stable posterior model has been achieved if the PSRF is lower than 1.2. In Fig. 10e we observe that more than 17,000 iterations are needed to achieve convergence for all the 40 model parameters. This translates into a total computational cost of approximately three days for the MCMC inversion, using a parallel Matlab code running on the previously described hardware configuration. Figure 11 shows the comparison between the observed data and the data generated on the deterministic result and on the mean models estimated by the EB and MCMC inversions. In this synthetic example, all the approaches provide final predictions that closely reproduce the observed pseudosection.
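
For reference, the PSRF can be computed per parameter from the within- and between-chain variances; the following sketch follows the basic Gelman–Rubin formulation and omits the refinements introduced by Brooks and Gelman (1998).

```python
import numpy as np

def psrf(chains):
    """Potential scale reduction factor for a single parameter.

    chains: array of shape (n_chains, n_samples) with post burn-in samples.
    Values below ~1.2 are commonly taken to indicate convergence.
    """
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()     # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_hat / W)

# Illustrative call: 20 chains of 1000 samples for one parameter
rng = np.random.default_rng(3)
print(psrf(rng.normal(size=(20, 1000))))
```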

Fig. 10
figure 10

a Predicted model provided by the deterministic inversion. b Posterior mean model estimated by the MCMC inversion. c Posterior standard deviation provided by the MCMC algorithm. d Evolution of the negative log-likelihood for the 20 chains. e Evolution of the potential scale reduction factor for the 40 model parameters

Fig. 11
figure 11

a Observed pseudosection. Predicted data computed on the model estimated by the deterministic inversion (b), on the mean model estimated by the EB inversion (c), and on the mean model provided by the MCMC inversion (d)

Table 2 summarizes the results achieved by the deterministic and MCMC inversions. In both cases, we note that the RMSE and correlation coefficients are consistent with those obtained by the EB approach. The coverage ratio cannot be determined for the deterministic inversion because this method does not provide model uncertainties. The slightly higher coverage ratio provided by the MCMC demonstrates that this method gives more accurate uncertainty estimations than the EB approach, but at the expense of a computational effort that is more than three orders of magnitude larger. Table 2 also shows that the computational cost of the deterministic inversion is negligible in comparison to that of the MCMC sampling. From this synthetic test, it emerges that the EB method tends to slightly underestimate the uncertainties in the least illuminated parts of the subsurface. However, from a practical point of view, we deem that the model uncertainties estimated by the EB inversion are reasonable and comparable to those yielded by the Monte Carlo sampling.

Table 2 Quantitative summary of the deterministic and MCMC inversion results

4 Field Data Inversion

We now discuss the application to 2-D field data acquired by a permanent system installed along a critical portion of the Parma river embankment in Colorno, Italy. In this work, we limit ourselves to inverting a single dataset, and we refer the reader to Hojat et al. (2019b) for more information about the study site. The electrode layout is buried in a 0.5 m-deep trench, and the inversion covers an area that is 94 m wide and 14 m deep, discretized with rectangular cells with dimensions of 1 m and 2 m along the vertical and horizontal directions, respectively. We employ a Wenner acquisition geometry with 48 electrodes and a unit electrode spacing a = 2 m. This configuration results in 705 resistivity values to be estimated from 360 data points. The data have been corrected for the effect of the soil overlying the electrodes (Hojat et al., 2020).

Based on the available information about the study area we define a log-Gaussian prior resistivity model and a spatial variability pattern described by a Gaussian variogram with \({{\alpha }_{y}}\) and \({\alpha }_{x}\) values equal to 2 and 6 m, respectively (Fig. 12). From previous studies on this investigated site, we expect a mainly low-resistivity clay body that, at the two lateral parts of the section and around 2–3 m depth, hosts a more permeable layer constituted by sand and gravel and characterized by higher resistivity values.

Fig. 12
figure 12

a Marginal log-Gaussian prior distribution for the resistivity values in the real data application. b, c The spatial correlation functions associated with the assumed 2-D variogram model along the lateral and vertical directions, respectively

To determine the optimal number of DCT coefficients in the model and data spaces we again generate some prior realizations and compute the associated data, to evaluate how the explained variability changes as the number of retained coefficients increases. From Fig. 13a we observe that 15 and 10 coefficients along the first and second DCT dimensions, respectively, explain almost the total variability of the uncompressed model, whereas Fig. 13b illustrates that 150 coefficients recover the total variability in the data space. Therefore, in this case, the DCT transformation allows reducing the full 705D model and 360D data spaces into compressed 150D domains.

Fig. 13
figure 13

a Examples of explained model variability as the number of DCT coefficients along the 1st and 2nd DCT dimensions increases. We illustrate the explained variability for a resistivity model drawn from the prior distribution. The numerical value with coordinate (x, y) indicates the explained variability if the first x and y DCT coefficients along the 1st and 2nd DCT dimensions, respectively, are used for compressing the resistivity model. It emerges that 15 DCT coefficients along the 1st dimension and 10 along the 2nd dimension explain almost 100% of the variability of the uncompressed resistivity models. b Explained variability as the number of DCT coefficients increases for the data associated with the model shown in a. In this case, 150 DCT coefficients express the total variability of the uncompressed data

Again, we run different tests that employ different numbers of ensemble members and we compare the different mean models and PPD estimations to select the best compromise between the computational cost and the accuracy and precision of the final results. The FE code previously used in the synthetic inversion again constitutes the forward modeling engine for the inversion process. Similar to the synthetic example, we run 5 iterations for each test.

Figure 14 shows the estimated mean models and the standard deviations for the different tests. In all cases, and as expected from previous information about the study site, we predict the two lateral high-resistivity bodies around 2 m depth (associated with sand/gravel) hosted in a low-resistivity medium (clay). We also note that the estimated mean model and the associated uncertainties stabilize for the tests employing 1000 and 2000 members, while the tests with only 100, 200, and 500 models provide estimated mean models with some artifacts and underestimated uncertainties. However, as expected, all the tests yield uncertainty estimations in which the shallowest part of the subsurface is characterized by the lowest ambiguity, while the uncertainties increase within the high-resistivity formation, at the lateral edges, and in the deepest part of the model due to the lower illumination. The computing times listed in Table 3 still refer to parallel Matlab codes running on the hardware configuration previously described. All tests yield final mean models producing similarly good matches with the observed data.

Fig. 14
figure 14

Estimated mean models (left) and associated standard deviation maps (right) for different tests. a 100 models in the ensemble. b 200 models in the ensemble. c 500 models in the ensemble. d 1000 models in the ensemble. e 2000 models in the ensemble

Table 3 Computing time to converge toward a stable data misfit value for the field inversion tests with different numbers of ensemble members

Figure 15 compares the evolution of the L2 norm data misfit for the tests employing ensembles of 200, 500, and 1000 models. Similar to the synthetic experiments we observe that just three iterations are needed to attain a stable misfit value and that a larger ensemble guarantees a faster decrease of the error in the very first iterations. From the previous considerations it emerges that in this case, the ensemble with 1000 models is the one that guarantees the best compromise between the reliability of the results and the computational effort. This test attains convergence in 16 min.

Fig. 15
figure 15

Evolution of the L2 norm data misfit for the tests with 200, 500, and 1000 models in the ensemble

Figure 16 shows some examples of resistivity sections sampled from the prior and pertaining to the test with 1000 models. These resistivity sections evolve into those shown in Fig. 17, where we observe that all the considered models consistently predict the lateral high-resistivity bodies located around 2 m depth.

Fig. 16
figure 16

Some examples of resistivity sections drawn from the prior distribution and associated with the test employing 1000 models in the ensemble

Fig. 17
figure 17

Some examples of models sampled at the third iteration for the test with 1000 models

To assess the quality of the results, we compare the outcomes provided by the EB-ERT inversion with those achieved by a DEMC algorithm sampling the same reduced space and with a standard deterministic least-squares inversion running in the full data and model spaces. The DEMC employs 20 chains evolving for 50,000 iterations and takes approximately 7 days of computing time to converge. Figure 18a, b represents the posterior mean model and standard deviation provided by the DEMC algorithm, whereas Fig. 18c shows the prediction of the deterministic inversion. The close agreement between these results and those achieved by the EB-ERT approach (Fig. 14d) confirms the applicability and reliability of the proposed inversion approach. Similar to the synthetic example, we observe that the EB approach slightly underestimates the uncertainties for the model parameters poorly informed by the data. The evolution of the negative log-likelihood and of the potential scale reduction factor for the DEMC inversion (Fig. 19) highlights that, even in the reduced domain, the stationary regime is reached after more than 5000 iterations, while more than 45,000 iterations are needed to achieve stable posterior assessments. Note that a much longer sampling stage and a longer burn-in period would have been needed to sample the PPD in the full model space.

Fig. 18
figure 18

a Posterior mean estimated by the DEMC algorithm. b Posterior standard deviation estimated by the DEMC method. c Final model provided by the deterministic inversion

Fig. 19
figure 19

a Evolution of the negative log-likelihood for the 20 chains during the DEMC inversion. Each color represents a chain. b Evolution of the PSRF for the model parameters during the DEMC inversion. The dotted red line at 1.2 represents the threshold of convergence

These results demonstrate that the EB-ERT provides estimations comparable with those provided by the DEMC, with a reduction of the total computing time of more than three orders of magnitude. The deterministic solution is obtained in less than a minute, but this method provides neither an estimate of the uncertainty affecting the recovered model nor an ensemble of models that reproduce the acquired data.

In this field experiment, all the methods provide final model estimations that well reproduce the observed pseudosection (Fig. 20). In Table 4 we observe that the RMSE for the data is similar for the EB and MCMC inversions and slightly lower for the deterministic inversion. However, note that the predicted data associated with the deterministic approach are computed on the model that minimizes the error function, whereas for the EB and MCMC inversions the predicted data are computed on posterior mean models obtained from the entire final ensemble (for EB) and from all the samples collected after the burn-in period (for MCMC).

Fig. 20
figure 20

a Observed apparent resistivity pseudosection. Calculated apparent resistivity pseudosections from, b the mean model estimated by the EB-ERT inversion with 1000 ensemble members (see Fig. 14d), c the deterministic solution (see Fig. 18c), d the mean model estimated by the MCMC inversion (see Fig. 18a)

Table 4 Root-mean-square data errors for the EB, MCMC, and deterministic inversion

5 Conclusions

In this paper, we applied an ensemble-based approach to cast the ERT inversion (EB-ERT) into a Bayesian setting, where prior assumptions and reduced model and data reparametrizations are incorporated to regularize the problem. The approach uses the ES-MDA stochastic algorithm, which provides multiple realizations of the subsurface resistivity values that honor the observed data and approximate the target posterior probability distribution. The computational cost of the implemented approach mostly depends on the number of ensemble members and on the cost of running the forward model. The DCT compression of the data and model spaces mitigated the ill-conditioning of the ERT problem and also allowed running the inversion in lower-dimensional data and model spaces. This strategy reduced the computational cost of the EB-ERT inversion because reliable model estimations and uncertainty assessments can be obtained with smaller ensembles.

Inversions of both synthetic and field data illustrated the applicability and reliability of the proposed inversion approach. We demonstrated that, in the context of ERT applications, the implemented inversion provides final mean models and uncertainty quantifications comparable to those yielded by a much more computationally expensive MCMC sampling, although the EB approach tends to slightly underpredict the uncertainties associated with the parameters poorly informed by the data. In both the synthetic and field data applications, the EB-ERT approach reduced the computational effort of the Bayesian ERT inversion by more than three orders of magnitude with respect to the MCMC algorithm. The main advantage of probabilistic methods over standard deterministic inversions is that they provide multiple stochastic realizations that can be visualized and queried to provide meaningful information about the accuracy and the precision of the recovered solution.

The implemented approach can be applied to other nonlinear inverse problems, and it provides uncertainty estimations without requiring the linearization of the forward operator, thus avoiding the modeling errors introduced by the linear approximation. In this work, we generated the initial ensemble using the simple kriging approach, but other geostatistical methods, such as variogram-based and multiple-point geostatistics algorithms, can be included in the inversion framework for generating the initial ensemble.