A novel methodological approach for land subsidence prediction through data assimilation techniques

Anthropogenic land subsidence can be evaluated and predicted by numerical models, which are often built over deterministic analyses. However, uncertainties and approximations are present, as in any other modeling activity of real-world phenomena. This study aims at combining data assimilation techniques with a physically-based numerical model of anthropogenic land subsidence in a novel and comprehensive workflow, to overcome the main limitations concerning the way traditional deterministic analyses use the available measurements. The proposed methodology allows to reduce uncertainties affecting the model, identify the most appropriate rock constitutive behavior and characterize the most significant governing geomechanical parameters. The proposed methodological approach has been applied in a synthetic test case representative of the Upper Adriatic basin, Italy. The integration of data assimilation techniques into geomechanical modeling appears to be a useful and effective tool for a more reliable study of anthropogenic land subsidence.


Introduction
It is well known that fluid withdrawal from the subsurface involves the compaction of the reservoir rock and the consequent propagation of the deformation from underground up to the land surface. The corresponding loss of land elevation, i.e., land subsidence, may cause severe environmental problems, especially in coastland and low-land areas, such as the increase of the risk of flooding and the salinization of fresh shallow groundwater, and generally affect several vulnerable human activities from both a social and economical viewpoint, e.g., [22,48,49]. For this reason, a reliable prediction of land motion is of considerable importance, and numerical models can play a key role to estimate the amount of land subsidence caused by human activities and simulate the possible related consequences.
to improve the prediction reliability represents a natural evolution of the land subsidence modeling activity. In this context, the use of data assimilation (DA) techniques can be a robust strategy to quantify and reduce the uncertainties in the subsidence prediction by incorporating measurements and model outcomes in a proper assimilation scheme.
Originally developed especially for atmospheric and oceanographic simulations, DA strategies have been recently used also in underground applications, such as hydrology, geomechanics and petroleum engineering [4,7,16,24,29,43,54,56]. Among the most popular DA techniques used in geomechanical applications, here we mention three approaches that have proved to be particularly suitable for land subsidence analyses. The focus is specifically on Red Flag (RF), Ensemble Smoother (ES), and its variation in form of the Multiple Data Assimilation (MDA-ES). However, for a detailed review of DA techniques in geosciences the reader can for instance refer to [1,9,44], just to cite a few recent relevant works. The RF technique inexpensively provides the probability of an event following a purely Bayesian approach, with no update of the prior model [34]. This is a fast method for a preliminary analysis of an ensemble of model realizations, which can provide the most probable scenario given a number of uncertain parameters. The ES can be used to update both state variables and model parameters in a 4-D acquisition (space and time), and evaluate the related uncertainties. The ES is an ensemble-based approach that uses an approximate sampling of the posterior distribution by a finite ensemble of Monte Carlo realizations [19,51]. For the straightforward implementation and adaptability to different fields of application, this ensemble-based technique and its recent variations are increasingly employed. The ES has been initially used in petroleum engineering, especially for history matching purposes [10,11,39], with recent applications also for subsidence predictions [4,24,56]. In [23] and [56], the ES technique has been used to calibrate the geomechanical parameters in a one-way coupled model. In [2] and [4] the efficiency of ES has been demonstrated for the calibration of geomechanical parameters such as Poisson coefficient, ratio between horizontal and vertical elastic modulus, ratio between horizontal and vertical shear modulus and ratio between I and II cycle compressibility. In [55,56] and [58] ES has been used in real-world applications to constrain a geomechanical model through land surface and/or compaction measurements. Finally, the MDA-ES algorithm should help improve the ES outcome, especially when the relation between model solution and parameters is non-linear. In [16] the available observations have been assimilated multiple times with an inflated measurement covariance matrix. This technique has been combined with ES approach (MDA-ES) in [17] and used in three synthetic history matching problems. A MDA-ES algorithm was also applied by [24] to estimate the reservoir compaction coefficient and the subsurface basement elastic modulus, thus improving the ES effectiveness in non-linear problems.
The objective of this paper is to develop a robust step-bystep workflow for the prediction of land subsidence above producing hydrocarbon fields, integrating well-established DA techniques with an appropriate uncertainty evaluation. In order to overcome the main limitations of traditional deterministic approaches, we aim at looking at the geomechanical modelling of land subsidence from a more comprehensive perspective. To this purpose, the proposed methodological approach automatically combines the available measurements with the mathematical and numerical description of the main undergoing processes. The workflow is based on successive steps of DA application, such as RF, ES and MDA-ES, requiring an increasing level of complexity and computational effort, but providing more detailed and refined outcomes. Prediction uncertainties are progressively reduced as new measurements become available, with the model gradually trained and updated as the knowledge of the actual phenomenon improves. The final result is a substantial increase of the prediction reliability.
In order to investigate the capabilities and possible drawbacks of the proposed approach, an extensive numerical experimentation has been carried out by considering a synthetic test case, which is, however, representative of an offshore real-world hydrocarbon reservoir. The effectiveness of the selected DA techniques has been tested in different possible scenarios of subsidence modelling, both changing the combinations of uncertain geomechanical parameters and constraining the model outcomes with a different number and type of synthetic measurements. The numerical results allow to characterize each step of the workflow and demonstrate that the integration of DA approaches in subsidence problems can be effective to both avoid an overconfidence in deterministic model applications and reduce the prediction uncertainties. In addition, they can suggest the possibility of easily extending and generalizing the application of the proposed methodology to other settings and modelling targets.
The paper is organized as follows. First, the forward numerical model used to analyse land subsidence due to a compacting reservoir is briefly described, and the investigated DA methods are introduced. Then, numerical results for a synthetic test case with two different constitutive laws are presented and discussed. On the basis of the numerical results, a tentative novel workflow for land subsidence modelling is introduced, thus closing the paper.

Methodological approach
The methodological approach is based on the idea of building a model for land subsidence management that: (i) is progressively trained by the assimilation of new data, (ii) reduces the prediction uncertainty as the production program proceeds, and (iii) is able to react almost in realtime to modifications in the reservoir development.
The candidate DA approaches require the generation through the forward model of an ensemble of realizations allowing to propagate the uncertainties from the input parameters to the problem state variables, e.g., land surface vertical displacements. The augmented ensemble, including both state variables and input parameters, is used with the DA techniques, in order to reduce the uncertainties and make the model results more reliable.
In the next sections, the geomechanical model is first introduced in general, and then each DA technique included in the workflow is described. The model presented herein represents a consistent option to investigate the efficiency and feasibility of the workflow for land subsidence analysis, but neither the only, nor necessarily the best, choice among the possible alternatives. The proposed approach can be generalized to other contexts as well, using different models, sources of uncertainties and types of available measurements.

Numerical forward model
Land subsidence due to fluid withdrawal is classically modeled by means of Biot's theory of consolidation [5], where variations of source of strength reflect on the pore fluid movement and consequently on stress and displacement field of solid skeleton. To study the anthropogenic land subsidence due to compaction of a deep hydrocarbon reservoir, a flow and a geomechanical model are necessary.
The flow model controls the dynamics of the hydrocarbon reservoir and the connected aquifer system. It is based on Darcy's law coupled with the continuity equation. The generalized multiphase flow model is: where ∇· and ∇ are the divergence and gradient operator, respectively, κ is the permeability of the porous medium, ρ α the density of the fluid phase α and μ α its viscosity, p α is the fluid pore pressure that is a function of the position vector x and the time t, φ is the porosity and S α the saturation index for the fluid phase α. Land settlements are computed by a geomechanical model that is based on the solution of the equilibrium equation of a 3D porous volume with Terzaghi's effective stress principle [6,50]: where σ (x, t) and 1 are the effective stress and the identity tensor, respectively, b is the Biot coefficient andp (x, t) is the equivalent pore pressure: The system of Eq. 2, supplemented with appropriate boundary conditions, is numerically solved by means of a finite element formulation. The effective stress tensor σ (x, t) is generally linked to the displacement field by an appropriate constitutive relationship: whereD is the rank-four constitutive tensor, ε (x, t) is the strain tensor and ':' indicates the inner product. The strain tensor depends on the displacement field u(x, t) according to the small strain hypothesis: where ∇ s is the symmetric gradient operator. Introducing Eq. 3 to Eq. 5 into Eq. 2 and prescribing the appropriate boundary conditions leads to the definition of the geomechanical model in the main unknown u(x, t). The selection of the generally non-linear operatorD is the key for the model outcome. In this application, we will consider two different non-linear constitutive models, which can allow for a reliable description of the mechanical behavior of geological formations: (i) the elasto-plastic modified Cam-Clay (MCC) and (ii) the visco-elasto-plastic (VEP) Vermeer-Neher law. For a thorough discussion on these constitutive models, the reader is referred to the relevant literature, e.g. [13,52]. The governing material parameters that appear in the definition ofD for each law will be introduced in Section 3.1. A one-way coupled approach is considered in our simulations, as is fully warranted by the time and space scale of interest. In particular, such an approach is usually employed in the context of real-world oil and gas reservoir engineering applications, where the mechanics-to-flow coupling is usually weak, e.g. [28,37,53]. As a consequence, the distribution of the pore pressure variation in the reservoir and the connected aquifer is firstly predicted by the flow model (1) and then the results are introduced in the geomechanical model. In this work, an in-house developed code is used. Details on the numerical formulation and implementation are given in [27,31,41]. This code has been widely validated and used in several real-world applications, e.g., [22,31,46,47,55].

Data assimilation techniques
While the outcome of numerical models is obtained in a deterministic way, every computational tool describing realworld phenomena is based on a number of assumptions and approximations. Integration of DA techniques allows to consider and reduce the uncertainties implicitly introduced into numerical models constraining them with a set of measurements of the simulated process. In this work, we focus on the geomechanical model and assume the pore pressurep (x, t) to be known as the result of a historymatching process. Though the pressure field has its own uncertainty, in-situ well measurements coupled with the history-matching process typically reduce the size of the confidence interval, such that we can considerp as a deterministic value with respect to other uncertain parameters.
DA techniques are included into the structural model to account for and reduce uncertainties related both to the definition of the rock constitutive laws and the geomechanical parameters. After a preliminary analysis of the ensemble, denoted as model diagnostic, three main approaches are investigated and combined: RF, ES and MDA-ES. Each method is briefly reviewed in the next paragraphs focusing on the problem that follows. Generally, the vector of the true state variables in both space and time ψ t ∈ R n ψ can be defined as a function of the vector of the true model parameters θ t ∈ R n θ through the forward operator G: ψ t = G θ t with n ψ and n θ the number of states and parameters, respectively. Focusing on the land subsidence problem, the operator G is the geomechanical simulator that relates the true state variables, such as land settlements, to the true model parameters, for example the geomechanical parameters that control the rock behavior. The vector ψ t is related to the vector of noisy empirical measurements d ∈ R n d , with n d the number of measurements, through the generally non-linear relationship: (6) where H is the measurement operator mapping from ψ t to the true observable vector d t ∈ R n d and ε d ∼ N (0, C d ) ∈ R n d is the measurement error, with C d ∈ R n d ×n d the covariance matrix of measurement error. The matrix C d depends on the kind and accuracy of the available measurements.
The vector of the model state ψ ∈ R n ψ is related to ψ t by the relation: ψ = ψ t + ε ψ (7) where ε ψ ∼ N 0, C ψ ∈ R n ψ is the unknown error in the model states with C ψ ∈ R n ψ ×n ψ the covariance matrix of the model state.
In the following, ϕ ∈ R n ϕ ×n mc denotes the matrix of augmented ensemble of states ψ ∈ R n ψ and parameters θ ∈ R n θ and C f ∈ R n ϕ ×n ϕ the related covariance matrix, with n ϕ the sum of n ψ and n θ , and n mc the size of the Monte Carlo ensemble. The superscripts prior and update designate the forecast and update ensembles, respectively.

Model diagnostic
A model diagnostic procedure can be initially carried out to validate the ensemble of model realizations with respect to the synthetic measurements. The main objectives of the validation are to verify if the assumptions are violated, if the uncertainties are underestimated and if too much confidence is given to the 'a priori' model [45]. A way to validate the forecast ensemble is the χ 2 -test, as proposed in [24]. This approach is based on the mismatch between model results and observations, that is [36]: For linear problems, the minimum of J (θ) allows for a χ 2 distribution with degrees of freedom equal to the number of measurements n d [45]. Consequently, it is well recognized that an ensemble ensures better results in DA applications when This could be used as a general guideline to define a range for the acceptable size of J (θ) in non-linear inverse problems [11]. Generally, the data mismatch part of the cost function in Eq. 8 dominates the magnitude of the total function J (θ), so the first contribution of Eq. 8 can be neglected [11].

Red Flag
RF, as introduced in [34], is a statistical technique that computes the probability of an event by combining prior information with the likelihood of the measurements. In this work, RF approach is used as a preliminary method to define the probability of every Monte Carlo realization in the prior ensemble. This allows to evaluate 'a priori' the model uncertainties without solving the inverse problem, thus providing a first estimate of the combination of parameters which is most likely to occur. The Bayesian probability P ψ k |d of a particular realization k of the ensemble is: where P ψ k is the prior probability of the realization, P d|ψ k is the associated likelihood of the measurements, and the denominator is a normalizing factor [34]. To define the likelihood, the measurements must be compared with the model outcome calculated at each time and space location for the realization k. Considering a Gaussian distribution for the likelihood and introducing a cost function I k , the likelihood reads [34]: where q is the vector of the differences between measurements and model results in both space and time.

Ensemble smoother
ES is a non-sequential DA algorithm originally proposed by [51]. It is a variance-minimizing estimator that combines prior information, measurements, and the solution of the forward model to update model states and parameters. With ES, the solution of the inverse problem can be written in a matrix form as: ϕ updat e = ϕ prior + K D − H ϕ prior (13) where D ∈ R n d ×n mc is the matrix of measurements and K ∈ R n ϕ ×n d the Kalman gain, calculated as: With a slight notation abuse, here H denotes the same mapping operator from model to the observational space as defined in Eq. 6, but extended to the vector of state variables augmented by the parameters.
The analyses that follow are carried out with the ES implementation according to the approach introduced in [18]. Results are optimal when the probability distribution of uncertain parameters is Gaussian [51]. The quality of the outcome of the ES application can be evaluated by comparing the prior and update ensemble through two performance indices, the average absolute error (AE) and the average ensemble spread (AES) [29]: where ϕ i,k is either the prior or the posterior value of the k-th realization for the i-th observation or state, ϕ true i is the true reference value and ϕ mean i is the ensemble mean. These metrics were already used in other works as well, e.g., [56,59]. The AE index is a measure of the algorithm capability to approach the truth, as it compares the model outcome with the true reference value. This index can be computed only in a synthetic case where the true reference is known. The AES index accounts for the deviation of the model results from the ensemble mean. Hence, it provides an indication of the spread of the distribution, i.e., it is a measure of the confidence in the predicted value. Generally, results of the assimilation are satisfactory when AE and AES of the update ensemble decrease with respect to the corresponding indices of the forecast ensemble. For this reason, the relative variation J of these indices from forecast to update is also computed: where ζ is either the AE or AES index.

Multiple data assimilation
MDA technique was originally introduced by [16] to improve the results of history-matching problems and was later combined with ES approach [17]. Basically, in MDA-ES the ES analysis is repeated for a definite number of 'iterations' to improve the results of a single assimilation. The covariance matrix of the measurement error C d is multiplied by an inflation coefficient α k ≥ 1 for every 'iteration' k to avoid an over-confidence in the available measurements. The choice of α k is such that: This condition makes MDA-ES equivalent to ES for Gaussian-linear problems [15]. The selection of α k is still a matter of debate, with several different proposals currently advanced by some researches, e.g. [14,38].

Test case analyses and results
In this section, the test case is introduced and numerical results are presented. Analyses are performed on a synthetic test case, which is, however, based on the structural and mechanical properties representative of an off-shore realworld hydrocarbon reservoir buried in a sedimentary basin, such as the Northern Adriatic, Italy. Among the several different uncertainties possibly affecting a land subsidence model, we focus on those associated to the constitutive behavior and the governing deep rock mechanical parameters, which most affect the expected ground motion. We consider feasible ranges for each uncertain parameter based on the physical experimental evidence and compute ensembles taken from the parameter distributions for two different constitutive laws. Then, such uncertainties are propagated on the state variables of the problem through the forward geomechanical model. The augmented ensemble, including both state variables and model parameters, is used with the DA techniques. To constrain the model, we use two different types of observations: (i) surface vertical displacements, and (ii) deep compaction measurements at the reservoir depth. In particular, we suppose the former to be recorded by a GPS station, while the latter derive from the radioactive marker technique.
First, geometry and properties of the synthetic model are described, with the outcome obtained for the combination of parameters we assume as the 'true' configuration. Then, an extensive numerical experimentation is carried out and discussed.

Synthetic test case
We consider a synthetic test case representative of an offshore hydrocarbon reservoir buried in a sedimentary basin. To make the model configuration as much realistic as possible, geometry and properties typical of the Northern Adriatic basin, Italy, have been used. Two different possible constitutive laws, namely a MCC model [13] and a VEP model based on Vermeer-Neher theory [52], are associated to the reservoir and the hydraulically connected aquifer. The first constitutive law describes an elasto-plastic rateindependent model, the latter is rate-dependent and accounts for viscous effects. These two models are both non-linear and representative of the possible behavior of a deep porous volume subject to pressure variations [30,41]. Since the behavior of underburden and overburden does not significantly affect surface movements [22,46], for the sake of simplicity, they are characterized by a simple and deterministic linear elastic constitutive law.
The model domain covers an area of 50 km by 50 km and extends down to about 5 km depth (Fig. 1). The reservoir is located in a central position at a depth interval between 1038 m and 1075 m. The domain is discretized into a 3D finite element mesh, which consists of 71,734 nodes and 410,030 tetrahedrons. A part of the mesh showing the aquifer and reservoir is presented in Fig. 1a. We assume a homogeneous Poisson coefficient equal to 0.30 everywhere and a unitary Biot coefficient. The vertical compressibility c m is assumed to follow the hysteretic law vs the effective vertical stress σ z developed by [3,20] for the Northern Adriatic basin, Italy: (20) where c m and σ z are in [MPa −1 ] and [MPa], respectively. Homogeneous null boundary conditions are prescribed for displacements, on the lateral and bottom boundaries, and for stress, on the top surface.
The pore pressure variation due to hydrocarbon extraction is applied in the reservoir and aquifer. The pressure history shown in Fig. 2 is prescribed on reservoir elements to simulate a possible program of hydrocarbon production. In order to take into consideration a scenario as realistic and general as possible, we simulate two production phases (the first three years and from year 5 to year 7) divided by a temporary stop of the extraction. At the end of the production program, a natural pressure recovery is supposed. The pressure variation propagates from the reservoir to the hydraulically connected aquifer and is considered deterministic, as discussed above.
Since we consider a synthetic case, we run the forward model with a set of parameter values, that are considered as the 'true' configuration, to define hypothetical measurements. In order to simulate real recordings and avoid a trivial application, the outcome of the geomechanical model at the 'measurement' location is properly perturbed to get the assimilation data, that are for example the dots in Fig. 3. Characteristic size and correlations of the synthetic measurement error will be clarified in Section 3.1.1.
The material parameters required by both the MCC and VEP constitutive laws are the modified compression index λ * and the modified swelling index k * , while the modified creep index μ * and the geotechnical initial overconsolidation ratio R are needed only by the VEP model. Specifically, their physical meaning is as follows: λ * is the slope of the normal consolidation profile in the plot of volumetric strain vs axial stress in natural logarithmic scale, i.e. it is a parameter that mainly controls the rock behavior in I-cycle conditions; -k * is the slope of the unloading profile in the same plot as λ * , thus it mainly impacts on II-cycle conditions; -μ * represents the slope of the volumetric strain profile vs time in natural logarithmic scale, so its value is a measure of the delay between the pressure variation and the related reservoir and land deformation; -R is the ratio p c,r,0 /p c,0 , where p c,r,0 is a function of the maximum volumetric stress ever experienced by the material before loading and p c,0 depends on the volumetric stress state at initial conditions. The parameter R is required to describe implicitly the yield surface evolution in the VEP model.
All material parameters listed above are dimensionless. For more details on the model properties and implementations, see [30,35,41].

The parameter space
The 'true' configuration is selected as follows: Figure 3 shows the vertical displacements in time for the 'true' configuration, where the maximum vertical motion is recorded. As it can be easily seen, the difference between the MCC and VEP behavior is not only related to the displacement values, but also to their evolution in time. The VEP model is typically characterized by a delay between the variations of fluid production rates and the resulting land Squares and circles are an example of such outcomes perturbed to get the synthetic measurements, respectively for the MCC and the VEP constitutive law settlement [30], which is not accounted for with a traditional MCC approach. These displacements, that are the result of the geomechanical model, have been perturbed with an error ∼ N (0; C d ) to get the synthetic measurements assimilated in the procedure. A detailed description of the construction of C d is presented in the lines that follow.
To take into consideration the uncertainties that affect the geomechanical model, we define a variability range for the parameters above. Uncertainty on the first two parameters is related to the confidence interval of c m in Eqs. 19 and 20, as described in [3], while the ranges of μ * and R are chosen according to typical values reported in the literature [30,52]. Thus, the following statistical distributions of the parameters are used in our tests: i.e., λ * and k * are Gaussian parameters in a log-scale, while μ * and R have a uniform distribution. We assume the parameters to be constant over the portion of the domain where a pressure variation takes place. The reason for this assumption is twofold: (i) the variability in space of the geomechanical parameters is usually much lower than that of the hydraulic parameters. In particular, in sedimentary basins such a variability is expected to take place more along the vertical direction than in a horizontal plain. However, in this application the reservoir and aquifer thickness is relatively small to present a significant geomechanical heterogeneity in λ * , k * and μ * . Nonetheless, we recall that the uniaxial compressibility c m is not constant and is varying with depth as a function of the vertical effective stress σ z ; (ii) the available measurements over off-shore reservoirs usually reduce to displacement time series over a small (1 or 2) number of points, and this is not sufficient to effectively characterize the spatial heterogeneity of the uncertain parameters [58].
We suppose that a single GPS station located at the reservoir center collects measurements of the vertical displacement in time, while compaction measurements at the reservoir depth are recorded by radioactive markers. We assimilate displacement measurements every three months and compaction measurements every three years. The covariance matrix of measurement error C d has to be defined. Uncertainties related to observations can be computed as the sum of a measurement and an idealisation noise, according to [33]. The first component depends on the accuracy of the tool used for the measurements and the representativeness of the mathematical model for reproducing the observations, and affects only the diagonal entries of C d . The idealisation noise affects both the diagonal and the extra-diagonal entries according to the spatial and temporal correlation among measurements. In subsidence modeling, any deformation caused by sources other than the deep compacting layers should be considered as noise. Therefore, all signal components in geodetic observations that are not related to the signal of interest are treated as a noise and included in the idealisation part of C d . Following the fractional Brownian motion approach [33], the idealisation noise terms are computed as: where t i is the time of the measurement i, ς the standard deviation and p the Hurst index. In real-world studies, parameters ς and p derive from specific studies on the measurement quality. Here, for the vertical displacements recorded by the GPS station, they have been computed through a fitting process of the variogram of the model outcome run with the 'true' parameters ( Fig. 3), as described in [33]. For the selected constitutive laws, such parameters read: The measurement noise is defined from a normal distribution with zero mean and standard deviation equal to 1.5 mm [33]. On the contrary, compaction measurements are considered uncorrelated. To define C d , the measurement noise is computed from a normal distribution with zero mean and standard deviation equal to 1.0 cm.
To analyze the role of each parameter, we first investigate the potential of the proposed DA techniques to reduce the uncertainty on each single parameter distribution individually, then we consider some of their combinations. Table 1 lists all the analyzed configurations. A total number of 12 sets is considered, 3 referring to MCC constitutive law and 9 to VEP behavior.

Model diagnostic
The ensemble of model realizations is initially validated with respect to the synthetic measurements with the model diagnostic procedure. Here, the model state ψ of Eq. 8 includes the synthetic displacement measurements resulting from the geomechanical model at the GPS location. The magnitude of J (θ) is calculated for every realization of the displacement ensemble. Each ensemble includes 50 realizations created running the forward model with the probability distributions for the uncertain parameters described in paragraph Section 3.1. Then an average value of χ 2 for the entire ensemble is defined. The resulting values are shown in Table 1.
The χ 2 -test provides only a qualitative analysis of the ensemble and it is not possible to identify a clear trend.
Most of the values are close to one. Despite of this, when the number of uncertain parameters grows, the χ 2 value is higher, as in set # 3 for the MCC constitutive law and in the last five sets of Table 1 for the VEP behavior. Higher values of χ 2 denote a probable increasing difficulty in constraining the model with DA techniques. Vice versa, if χ 2 is smaller than one, the variance associated to the ensemble is lower than the variance associated to the measurements, meaning that DA techniques cannot provide improvements in the update of the ensemble and is therefore likely that DA processes are ineffective. Generally, the constitutive law that describes the behavior of the porous medium is 'a priori' unknown. However, a guess could be obtained from available information, e.g., laboratory tests, in-situ measurements or data provided by other reservoirs in the same basin. The χ 2 -test can be also used as a preliminary tool to select the most appropriate constitutive law for the geomechanical model, that is the law which appears to be more representative from the available measurements. As an example, we consider cases with the uncertain modified compression index λ * and modified swelling index k * , with the related forecast ensembles. Results are shown in Table 2. We compute the χ 2 value when the observations are obtained with the same constitutive law of the ensemble and when they are computed with a constitutive law different from the one used to generate the ensemble. The latter test allows to mimic the case where the choice of the constitutive law in the modeling workflow is not consistent with the actual physical governing process.
As expected, the χ 2 value associated to the ensemble created with the same constitutive law of the measurements is always closer to 1 than the inconsistent one. Consequently, χ 2 -test can provide a useful and cheap preliminary screening of the generated forecast ensembles.

Red flag
After the ensemble diagnostic, the RF approach is carried out for every configuration shown in Table 1. Every realization of the forecast ensembles is characterized by its own probability using Eq. 10, where P ψ k is derived from the joint probability of the parameter set used to define the realization. Table 3 provides for each case listed in Table 1 the realization characterized by the largest probability  of occurrence, along with the corresponding parameter values. The configuration with the largest probability is not always the one closest to the 'true' parameter set.
The reason is that we compare the state ensemble with the vertical displacements, which are the 'effect' through the geomechanical model of the parameter selection and not directly parameters themselves. In other words, we are considering the image of θ k through the forward model G and the RF approach computes a 'score' for each of such images. In some cases, more than one set provides displacements close to the observation and consequently has a high probability. This could suggest a possible overparameterization of the problem, i.e. different combinations of the parameter values can be able to reproduce similarly the available observations. This is usually an unfortunate situation, where DA is expected to have a low effectiveness on the parameter characterization. Despite this, in most cases, there is an important difference between the largest and the smallest probability. Therefore, RF can help to preliminary reduce the uncertainties by neglecting the realizations characterized by the smallest probability of occurrence.

Ensemble smoother
ES has been employed to find a tentative solution to the inverse model and reduce the initial uncertainties on the model prediction. Both state and parameter ensembles are updated by constraining the model with available measurements. The quality of the outcome is evaluated through the indices AE (Eq. 15) and AES (Eq. 16) and their variation (Eq. 17). Note that in the computation of such indices, the matrix of updated state variables refers to the solution of the forward model by using the updated parameter vector.

Parameter constraints
For the test cases of Table 1, the outcome of the ES approach is provided in Table 4. A positive value of J indicates a restriction around the true configuration of the updated ensemble with respect to the prior. It is clear that the assimilation effectiveness strongly depends on the uncertain parameter set.
The choice of the set of uncertain parameters has a significant impact on the model in relation to the geometry of the domain, the constitutive law, the observed data, the boundary conditions and the imposed external load. For example, the uncertainty associated to the modified compression index λ * and the modified swelling index k * has a similar range of variability, as defined in Section 3.1. Nevertheless, the corresponding state ensembles for the MCC constitutive law (set # 1 and # 2) have very different forecast AES. A variation of k * in the model does not provide significant changes in the resulting state in term of land vertical motion, u z . As a matter of fact, k * is a parameter mainly controlling the rock behavior in II-cycle conditions, so it appears to play a secondary role for the selected production program (Fig. 2a). Consequently, in this case ES cannot help constrain k * because variations related to the state ensemble are lower than the errors associated to measurements. By distinction, variations in λ * produce a significant change in the forecast state ensemble that provide enough information to constrain the model outcome close to the observations, with a reduction of the 'a priori' parameter uncertainties.
ES has been applied considering uncertainty related to each parameter individually and some of their combinations to evaluate ES capability of conditioning the model in different situations with the same available observations. The results of Table 4 point out that an over-parameterized problem can prevent from constraining the model to the measurements, as in the last four parameter sets. For the VEP behavior, this can be seen when the set of uncertain parameters includes both λ * and the ratio R (set # 9). In this case measurements are not enough to constrain the model, while results of set # 4 and # 7, in which these parameters are separately considered into the ES approach, provide a satisfactory reduction of the ensembles spread around the true configuration. Table 4 suggests that the most satisfactory results can be obtained in the configurations with uncertain λ * , sets # 1 and # 4 for MCC and VEP behavior, respectively. For these cases, we evaluate the ES capabilities also in a predictive sense by assimilating 3, 5 and 7 years of displacement measurements only, with respect to the assimilation of observation data during all the simulation period, i.e. 10 years. This is to investigate the use of ES during the productive life of the reservoir, introducing into the DA framework the up-to-date set of measurements. The resulting forecast and update ensembles are shown in Fig. 4 together with the variation J with respect to the forecast ensemble of the state variable, i.e. vertical displacements. As the number of assimilated measurements increases, the effectiveness of the ES algorithm grows. Using a MCC model, the assimilation of few observation data enables the ES to reduce significantly the model uncertainties. In the case of VEP behavior, the assimilation of three years of measurements (Fig. 4b) is not enough for constraining the model, because the spread of the forecast state ensemble (AES = 0.020) is already smaller than the one for MCC (AES = 0.092) and so the model characterization is intrinsically more difficult. Nevertheless, seven years of observation data appear to be enough to reduce uncertainties and improve the model predictive capabilities. In other words, the continuous assimilation of new land displacement observations in time appears to be effective for 'training' automatically the model, thus increasing its reliability in the prediction capability.

Influence of measurement covariance matrix
The results presented in Table 4 and Fig. 4 are obtained with the covariance matrix of the measurement error C d described in Section 3.1. Reduction in observation errors, i.e. increase in measurement reliability, usually implies a better performance of the DA algorithms [4]. To evaluate the importance of the measurement error, we reduce the standard deviation ς in the computation of C d , i.e. ς 2 equal to 1.112 · 10 −5 m 2 /yr p for MCC constitutive law and to 4.158 · 10 −6 m 2 /yr p for VEP model. Results provided in Fig. 5 show the outcome of the ES algorithm for the estimate of the modified compression index λ * by assimilating displacement measurements for the first five years of the simulation process. These results, compared to those obtained with the original C d shown in Fig. 4c, d, 5a and d, point out a clear improvement in the solution of inverse problem with an increase of J by over 61%.

Assimilation of compaction measurements
Synthetic measurements of the formation compaction, ideally measured for instance by the radioactive marker technique, have been assimilated. We suppose to place a marker borehole near the GPS location, where the reservoir is approximately 30 m-thick. Since the initial space between two adjacent radioactive markers is approximately 10 m, a maximum number of three compaction measurements are available for the assimilation. To evaluate the influence of these additional measurements, we consider four significant configuration sets of uncertain parameters (set # 1, # 4, # 6 and # 12). Figure 6 shows the variation J of the AE and AES indices from the forecast to the update ensemble. It is the result of ES application when both displacement and compaction measurements are assimilated. Comparing Fig. 6 with the results in Table 4 allows to understand when adding more accurate measurements with a different nature could be useful.
First, set # 1 and # 4 are considered, with uncertain λ * for the MCC and VEP law, respectively. In these cases, the assimilation of only GPS measurements allows to constrain the model. Adding other observations does not provide a relevant improvement in the ES application. Then, we consider set # 6, when the modified creep index μ * with the VEP model is uncertain. This parameter has a low range of variability. In this case, the assimilation of both displacement and compaction measurements with the covariance matrix as described in Section 3.1 proves a significant variation in the ES results, with an improvement of J of about 107% with respect to the assimilation of vertical displacements only. Finally, we evaluate the case of an over-parameterized problem, i.e. set # 12 of Table 1. Results obtained with the assimilation of both GPS and marker measurements cannot be considered thoroughly satisfactory, because of the small, or even negative, values of J. Nevertheless, they are better than results in Table 4 for the assimilation of surface vertical displacements only, at least for parameter λ * , ratio R and displacement u z .

Multiple data assimilation
MDA-ES can help improve the effectiveness of the ES algorithm, especially when there is a strongly non-linear relationship between the uncertain parameters and the state variables [16]. To avoid an over confidence in the observation data, the covariance matrix of measurements error is multiplied by an inflation coefficient α k at every assimilation step k. Different algorithms have been recently developed to define the sequence of α k values, e.g. in [14]. Here, we consider only the simplest option, i.e. a constant inflation coefficient equal to the total number of successive assimilations.

Influence of measurement covariance matrix
Application of MDA-ES for the test cases of Table 4 improves the parameters estimation with respect to ES only when uncertainty is associated to a single parameter. For the sake of brevity, here these outcomes are not provided. Better results have been achieved considering a reduced covariance of the measurement error. Two groups of simulations are carried out to point out the influence of the measurement error in the MDA-ES approach. The first with the measurement covariance matrix as described before in Section 3.1 and the second with a reduced one.
In the latter, we consider again Eq. 21 with the parameters properly changed to obtain a much smaller final error. In particular, the entries of the measurement noise are sampled from a normal distribution with zero mean and standard deviation equal to 1 mm, while the terms of the idealisation noise have been calculated considering Hurst index p equal to 1.67 and variance ς 2 equal to 2 · 10 −5 m 2 /yr p . Figure 7 shows a comparison between ES and MDA-ES for the estimation of parameter set # 1 considering the original covariance matrix, while in Fig. 8 the reduced one is used. The forecast and update parameter ensembles for a single ES application and three MDA-ES assimilation of vertical displacements are represented. In the first case ( Fig. 7a and b), i.e. for the original covariance matrix, the final update ensembles of the two approaches appear quite similar, except for a few outliers. Conversely, for the reduced covariance matrix (Figs. 8a and b), progressive improvements at every successive assimilation are registered and the final ensemble is more clustered and centered around the true value than the one derived from a single ES assimilation. The latter behavior is the one expected in applying the MDA-ES approach. In the numerical experimentations that follow, the reduced measurement covariance matrix is used.  Table 4 when both displacement and compaction measurements are assimilated for some significant sets of uncertain parameters, i.e. sets # 1, # 4, # 6 and # 12. In brackets, values of the variation J of AE and AES indices when only the displacement measurements are assimilated (the same as in Table 4) are reported (16) (62) (42) (71) (

Comparison between ES and MDA-ES for parameter constraint
In this section, MDA-ES is compared to ES with a reduced measurement covariance matrix, as explained in Section 3.5.1. Set # 9 formed by uncertain λ * and the ratio R and an additional set, set # 13 with λ * and μ * uncertain, are considered. The sets of Table 4 including the modified swelling index k * have been discarded because of its low influence for this specific reservoir problem, as discussed  Figures 9 and 10 show the comparison between ES and MDA-ES for set # 9 and set # 13, respectively. Similarly to the outcome presented in Fig. 8, Fig. 9 shows that during the MDA-ES process progressive improvements are achieved and the final result is better than the one of a single ES assimilation for the constraining of λ * . By distinction, if the set of parameters is formed by λ * and μ * (Fig. 10), MDA-ES does not provide any improvement with respect to the ES results. This is due to the difficulty in the μ * estimate by using vertical land displacements only, as already observed in Table 4. Similar results are therefore expected for the sets with three and four uncertain parameters. (c) (d) Fig. 10 The same as Fig. 9 for set # 13, i.e. for the estimation of the modified compression index λ * (a, b) and the modified creep index μ * (c, d)

Discussion and conclusion
In this work, we analyze the effectiveness of the integration of DA techniques into numerical models for land subsidence prediction above producing hydrocarbon reservoirs. The aim is to define a modern methodological approach able to take into consideration, quantify and reduce uncertainties in land subsidence prediction by a progressive 'training' of the forward numerical model through the assimilation of the available pieces of information. The potential and drawbacks of the different DA approaches have been investigated in a synthetic test case, representative of a real-world hydrocarbon producing reservoir buried in a sedimentary basin. Selected mechanical behaviors and parameter ranges are typical of the Northern Adriatic basin, Italy. On the basis of the numerical experimentation carried out in the previous sections, the following main considerations are worth summarizing.
1. Preliminaries. Land subsidence models, as any other numerical model of real-world processes, are affected by a number of different sources of uncertainties. The first preliminary step consists of recognizing the most influential uncertain factors and, among these, the ones that can be explicitly included as stochastic variables in the construction of the model ensembles. For instance, uncertainties in the subsurface geometry or in the mathematical description of the governing processes cannot be easily quantified, while it is often convenient, and more supported by available information, to treat some material parameters as stochastic variables. Uncertainties in the geometry or the mathematical modeling can be implicitly accounted for by artificially inflating the errors associated to assimilated measurements. Then, the appropriate selection of the set of uncertain material parameters is a fundamental aspect for the application success. A necessary condition for an effective DA is that the uncertain material parameter has a significant impact on the monitored model outcome, i.e. ground motion in this case, while including parameters with a low relevance is often detrimental for the quality of the overall assimilation process. For this reason, a preliminary sensitivity analysis on the relative influence of each material parameter on the model outcome, along with the identification of feasible variation ranges, is of paramount importance. For instance, such sensitivity analysis can be quantitatively performed by mean of Sobol's indices [40], as recently done in [25,57,59] In our tests, there is usually an important difference between the highest and the lowest probability, with a relatively small number of realizations with a relevant probability of occurrence. Some exceptions to this outcome has been found for the cases in which χ 2 is lower than one, i.e. when the ensemble should have been rejected a priori in the diagnostic stage. RF provides a preliminary idea of the most likely parameter combination and allows to exclude some realizations that can be considered too unrealistic. In this way, a refinement of the feasible ranges for the uncertain parameter set can be performed, with the possibility of building more representative model ensembles.
Of course, the most likely parameter combination along with the expected ranges can change as new observations become available, hence such a refinement should be done with some caution, especially when the amount and quality of such data is limited. 4. Assimilation. The assimilation stage allows to incorporate the available measurements and progressively train the geomechanical model as the monitoring of the ongoing process proceeds. The outcome of this stage is a new updated ensemble with a progressively smaller uncertainty in the model prediction. To this aim, we have elected to employ the ES and MDA-ES techniques. ES is a non-sequential DA approach that provides an update ensemble where both the state variables, i.e. surface motion in this case, and the material parameters are constrained by the observation data.
The ES application mainly depends on three different aspects: (i) the set of uncertain parameters, (ii) the measurements, and (iii) the error associated to the observations, i.e. ultimately the definition of the measurement covariance matrix. First, the uncertain parameter set has to be actually relevant for the observed process without leading to over-parametrization, i.e. multiple combinations can provide similar results with respect to the measurements. Generally, if the choice of the uncertain parameter set is consistent with the available observations, the ES algorithm appears to be especially suitable for subsidence predictions, with a progressive model improvement as the quantity of assimilated measurements increases. Of course, the effectiveness of ES application improves with the decrease of the error associated to the observations. Hence, the definition of the covariance matrix of the measurement error, which might be also artificially inflated to account for other sources of uncertainty, plays a fundamental role in the application of the DA algorithm. Anyway, the results obtained in the tests investigated herein suggest ES to be a promising tool to quantify and reduce the uncertainties in land subsidence predictions. MDA-ES uses multiple ES applications with an inflated covariance matrix of the measurement error. In principle, it can be used to improve the ES outcome, especially in case of a strongly non-linear relationship between state variables and uncertain parameters. In our numerical experiments, MDA-ES does not always provide better results than ES, despite the higher computational cost, and appears to be strongly influenced by the selection of the uncertain parameter set and the covariance matrix of the measurement error. Its use in real-world reservoir applications should be better investigated in additional test cases.
On the basis of the considerations above, a tentative workflow for modern land subsidence simulations above producing hydrocarbon fields can be defined. In the context of a synthetic test case, the investigated procedure, consisting of points 1 to 4 above, allows to take into consideration most of the uncertainties that affect the numerical modeling of land subsidence above producing hydrocarbon reservoirs, and then quantify and reduce them as the reservoir development proceeds and new monitoring data are incorporated in the overall model. Further experiences are necessary by investigating real-world producing reservoirs and assimilating real observation data. These experiences, which are currently underway, will allow for refining the steps 1-4 above and help define a 'protocol' to be followed for stateof-the-art and reliable predictions of land subsidence.
Funding Open access funding provided by Università degli Studi di Padova within the CRUI-CARE Agreement.

Conflict of Interests
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.