Sensitivity analysis of factors controlling earth fissures due to excessive groundwater pumping

Aseisimic earth fissures are complex consequences of groundwater withdrawal and natural hydrogeologic conditions. This paper aims to improve the understanding of the mechanism of earth fissuring and investigate the relative importance of various factors to fissure activity, including bedrock geometry, piezometric depletion, compressibility and thickness of the exploited aquifer. For these purposes, a test case characterized by an impermeable and incompressible rock ridge in a subsiding basin is developed, where stress/displacement analyses and fissure state are predicted using an interface-finite element model. Three different methods for global sensitivity analysis are used to quantify the extent of the fissure opening to the aforementioned factors. The conventional sampling based Sobol’ sensitivity analysis is compared to two surrogate based methods, the general polynomial chaos expansion based Sobol’ analysis and a feature importance evaluation of a gradient boosting decision tree model. Numerical results indicate that earth fissure is forming in response to tensile stress accumulation above the ridge associated to pore-pressure depletion, inducing the fissure opening at land surface with further downward propagation. Sensitivity analysis highlights that the geometry of bedrock ridge is the most influential feature. Specifically, the fissure grows more when the ridge is steeper and closer to the land surface. Pore pressure depletion is a secondary feature and required to reach a certain threshold to activate the fissure. As for this specific application, the gradient boosting tree is the most suitable method for its better performance in capturing fissure characteristics.


Introduction
Aquifer over-exploitation has led to land deformation in several semiarid basins worldwide and land subsidence is one of the major impacts on the earth surface. However, in certain cases, the accumulated deformation results in earth fissuring. This geological hazard has caused negative impacts on economic activities, social security, and environment protection, thus raising greater attention in the last decades. So far, land subsidence can be accurately simulated and predicted by numerical models (Janna et al 2012;Teatini et al 2005;Ye et al 2016), whereas the mechanism of earth fissure is more complex and difficult to simulate (Budhu 2011;Hernandez-Marin and Burbey 2010;. Different hydrogeological settings favoring the occurrence of earth fissures have been conceptualized based on field studies, including buried undulating bedrock, pre-existing fault and abrupt heterogeneous thickness of aquifer (Sheng and Helm 1998;Sheng et al 2003). The features of the earth fissures such as density, shape, length, aperture, depth, and dislocation vary greatly in different settings, which also implies different driving mechanisms.
This work aims to improve the understanding of the earth fissure mechanism with the presence of buried bedrock ridges in subsiding basins. Knowledge of the mechanisms driving these hazards may help to predict and therefore limit significant damages to buildings, streets, highways, railroads, earth dams, water wells, and other engineering structures. Note that earth fissures that coincide with fault scarps and abrupt thickness change may also be related to seismicity (Carreón-Freyre et al 2016;Peng et al 2013), but this is beyond the scope of this work.
Modelling the behavior of earth fissure requires a deep understanding of contact mechanism and various numerical methods were developed to delineate the physics of this problem (Hernandez-Marin and Burbey 2010; Liu et al 2019; Wang et al 2015). The FE-IE (finite element-interface element) numerical method developed by Franceschini et al (2016) is a prominent approach which exhibits stable and accurate performances on quantifying fissure characteristics (Franceschini et al 2019;Frigo et al 2019;Ye et al 2018;Li et al 2021). In this study, it is adopted to simulate the fissuring process in a subsiding basin with buried bedrock ridges.
The complexity of these systems typically give rise to many uncertainties due to the geologic configuration, the pore-pressure distribution, the hydro-geomechanical parameters along with the mathematical and numerical approximation of the physical problem (Frigo et al 2019;Sheng et al 2003). In this context, a global sensitivity analysis (GSA) is fundamental to evaluate the susceptibility of input variables to fissure formation and propagation, considering their possible mutual interactions (Iooss and Le Maître 2015;. A variance-based GSA is employed based on the functional decomposition of the output variance, providing the Sobol' indices that quantify the input contribution to the output variance (Sobol' 1993(Sobol' , 2001. First, we compute the indices using an efficient Monte Carlo sampling design, employing the Sobol' sequence to generate a uniformly distributed sample over the uncertain input domain (Sobol' et al 2011). However, a large number of samples are needed, in particular when interaction factors are investigated. This means a computationally prohibitive cost for large scale models, as it is the case of earth fissure modelling. For this reason, the use of surrogate (or proxy) models, which are approximations of the forward model built from a limited number of runs of the full model, is seen as a prominent approach to reduce the overall computational cost of the sensitivity analysis. Among surrogate methods, polynomial chaos expansion (GPC) is a probabilistic method which uses orthogonal polynomial projections of the input random variables to build the stochastic model output (Ghanem and Spanos 1991). This technique provides a straightforward way to derive Sobol' indices from model representation coefficients (Crestaux et al 2009). Thanks to these advantages, GPC surrogates have been recently applied for GSA in environmental modelling (Ciriello et al 2013;Couaillier and Savin 2019;Friedman et al 2021;Sochala and Le Maître 2013;Kaintura et al 2018;Zoccarato et al 2020).
However, difficulties may rise when the quantity of interest presents some discontinuities with respect to the model parameters (Sochala and Le Maître 2013;Le Maître et al 2004). In case of earth fissuring simulation, this occurrence occurs when the discontinuity develops within the continuous porous medium. To overcome this problem, we elected to employ a decision tree-based method such as the gradient boosting tree (GBT) that uses an ensemble of decision trees to approximate the solution, in particular for non-linear models with arbitrary inputs (Friedman 2001;Louppe 2014). Although tree-based models are considered as ''black box'', many interpretation methods, such as Shapley Additive Explanations and Mean Decrease Accuracy (MDA) were designed to assess feature importance according to their relevance for the corresponding estimator, similarly to the key insights of GSA (Arabameri et al 2022;Breiman 2001;Carvalho et al 2019;Lundberg and Lee 2017).
The paper is structured as follows. At the beginning a brief background of the geomechanical modelling approach is provided. Then, the GPC and GBT methods are described with their corresponding importance indices (Sobol' and MDA). The setup of forward model and parameterization of interest are presented in the next section. The results of the numerical simulations and the statistical analyses are then discussed in detail with a list of main conclusions that close the paper.

Numerical model
The numerical model consists of a continuous model and a contact mechanism model, where the former provides the stress field analysis while the latter describes the generation and propagation of fissures. Note that when the fissure location is unknown, the stress field is fundamental to identify the potential location.

Continuum model
Stress and strain fields caused in a 3D continuous porous medium X are quantified by means of the classical poroelasticity theory (Biot 1941). The governing equations read: where r ¼ C : r s u À b p 1 ð Þis the total Cauchy stress tensor, with r s ¼ 1 2 r þ r T ð Þ the symmetric gradient operator, C the rank-4 elasticity tensor, b the Biot coefficient, and 1 the rank-2 identity tensor. The displacement field u is the primary unknown. The pore pressure p is a known forcing term, either imposed according to previous physical knowledge and measurements or provided by a groundwater flow model (Ye et al 2018). Without loosing the general validity of the approach, the constitutive relationship between stresses and strains used in this study is assumed linear elastic. Consequently, the soil compressibility C m and Poisson ratio m are constant and do not vary with the pressure (i.e., stress) change. Moreover, as usually implemented in the geomechanical application related to aquifer over-exploitation (Hernandez-Marin and Burbey 2012;Ye et al 2016;Zhu et al 2020), the small strain hypothesis is adopted. A tetrahedral finite element (FE) discretization is used.
The stress analysis in the continuous model is used to locate the zones where shear and tension accumulate, i.e. they are more prone to fissuring. These are the sites where the earth fissure model is ''inserted'' to check the actual occurrence of discontinuity development and growth.

Earth fissure (EF) model
From a mathematical standpoint, a geological discontinuity such as an earth fissure can be represented as a pair of friction surfaces, possibly in contact with each other, embedded within X. The model must ensure the normal contact constraint, namely the impenetrability of the two portions of the porous body detected by the discontinuity. The discrete fracture model proposed by Karimi-Fard et al (2003) and Garipov et al (2016) is used to describe the contact mechanics. More specifically, we take advantage of the model implementation proposed by Franceschini et al (2016) and Franceschini et al (2019) where the fracture network is discretized by interface elements (IEs), which are zero-thickness FEs with shape functions compatible to those of the surrounding FEs.
The fissure is considered as a boundary C f within X, with a contact condition acting on the opposed surfaces C 1 f and C 2 f that allows for a relative displacement (opening and sliding) between corresponding points whenever the stress state violates a certain failure criterion. In this modeling approach we elect to rely to a failure criterion based on the classical Mohr-Coulomb framework, which imposes the following condition on C 1 f and C 2 f : where t ¼ r Á n is the contact stress, with t T ¼ t À t N n and t N ¼ t Á n the tangential and normal components, respectively. The unit vector n denotes the normal vector for the surface pair C 1 f and C 2 f . In the Coulomb criterion, c and u are the cohesion and friction angle, respectively. The impenetrability of solid bodies is prescribed by the normal contact condition: where g N is the normal component of g, i.e. g N ¼ g Á n, representing the relative displacement between C 1 f and C 2 f . g is defined as: where u is the global displacement in X, as consistently computed through Eq. (1), and uj C i f the restriction to C i f . The application of the friction law (Eq. (2)) and the principle of impenetrability of solid bodies (Eq. (3)) subdivide the inner boundary C f into three portions: 1. f \0 and t N \0: the fissure is in a stick state, i.e. the discontinuity is fully closed and behaves as a part of the continuum; 2. f ¼ 0 and t N \0: the fissure is in a slip state, i.e. a slip displacement is freely allowed at a fixed tangential traction s max ¼ c À t N tanðuÞ; 3. t N ¼ 0: the fissure is in a open state, i.e. both opening and slip displacements are freely allowed with zero traction.
The main challenges to find the solution in terms of u and t is the identification of stick, slip and open portions of the fissure surfaces. While the maximum extent of the fissure is fixed during the discretization phase, corresponding to the whole surface discretized through IEs, the subdivision into the three different states and the corresponding constraints evolve during the simulation. Details about the discretization and solution strategy can be found in Franceschini et al (2016) and Franceschini et al (2019).

Sensitivity analysis
A sensitivity analysis framework is implemented to investigate the most important factors (and their interactions) controlling earth fissuring in subsiding basis. In this section, we first provide the mathematical framework of two different types of surrogate models (GPC and GBT) used to reduce the computational burden by approximating the full forward model. Then, Sobol' indices for sensitivity analysis are introduced with specific reference to their numerical computation based on Sobol' and GPC approaches. For GBT, the mean decrease accuracy metric (MDA) is presented as a measure of the relative factor importance. The notion of partial dependence is also introduced to characterize the dependence of the model response on individual factors.

Generalized polynomial chaos expansion (GPC)
Running the forward geomechanical model multiple times for large and complex systems can be a very demanding task, both in terms of CPU and memory requirements. A GPC approach (Wiener 1938;Xiu 2007) is therefore proposed to approximate the outcome of the deterministic simulator as a function of the uncertain input parameters with the help of polynomials. With such approximation, propagation of the input uncertainties to the model output can be efficiently computed and statistics such as mean, variance, and quantiles can be easily determined.
The main idea of GPC surrogate models is based on using orthogonal polynomial approximations of the random input to project the stochastic model output. In the following, we provide the basic mathematical framework as derived in Xiu (2007). Let us consider the random model output U 2 R written as a function of the random vector Z of n mutually independent random variables Z ¼ ðZ 1 ; . . .; Z n Þ and distribution function F Z ðz 1 ; . . .; z n Þ ¼ PðZ 1 z 1 ; . . .; Z n z n Þ. We are considering a stochastic process in the probability space (X,F ,P) with space of events X, r-algebra F and probability measure P on F , see e.g. Xiu (2010). Z can directly be the vector of the input random variables, or more usually a set of independent random variables, the so called 'germs', by which the input variables can be described.
As usual, the independence assumption implies . . .; n. Since any random variable may be represented as a series of polynomials in uncorrelated and independent Gaussian variables (Wiener 1938) and, in its generalized extension, in non-Gaussian measures, GPC basis functions of a univariate random variable Z i are defined as the polynomials f/ k ðZ i Þg N k¼0 of Nth-degree satisfying the orthogonality conditions: with c s ¼ E½/ 2 s ðZ i Þ the normalization factors, d s;r the Kronecker delta function and R i is the support of Z i . In the multivariate case, the GPC basis functions U a ðZÞ of degree up to N are products of the univariate orthogonal polynomials: where a ¼ ða 1 ; :::; a n Þ 2 N n 0 is a multi-index with jaj ¼ a 1 þ . . . þ a n . The multivariate basis functions are orthogonal polynomials in L 2 dF z , that is, the space of all mean-square integrable functions of Z with respect to the inner product based on the measure dF Z : where R is defined by As a consequence, the class of orthogonal polynomials is selected according to the measure F Z i . In the GPC context, we aim at finding an approximatioñ U GPC;N ðZÞ of the random function UðZÞ 2 R in the N-th degree polynomial space generated by the basis functions U a ðZÞ: where c a are the coefficients of the expansion. For UðZÞ 2 L 2 dF z , the coefficients c a can be computed by definingŨ GPC;N as the orthogonal projection of U onto the polynomial space Z ¼ spanfU a g. By prescribing the orthogonality condition U ÀŨ GPC;N ? spanfU a g: The coefficients c a read: i.e., they can be computed by numerically evaluating the integral of the product of U a and U. The expansion terms of Eq. (10) guarantees the optimal approximation of U in the sense of the norm defined in L 2 dF Z . The coefficients c a of the approximating GPC are numerically computed by a non-intrusive approach, that is without having to touch the finite element computation, and computing the coefficients only from samples z j of the parameters Z and the corresponding Uðz j Þ values. We use a pseudospectral projection, with the integral term approximated by a high-dimensional quadrature rule: with z j and wðz j Þ the q integration nodes and weights, respectively. Since U a is at most of degree N, the integrand function has at most degree 2N. In the univariate case, this requires the use of a ðq ¼ N þ 1Þ-point Gaussian quadrature rule, while in the multivariate case with n random variables the number of points grows up to q ¼ ðN þ 1Þ n . Using this approximation, the surrogate model needs the evaluation of U through the numerical solver of the forward model at the q integration points z j . Another approach to compute the coefficients c a of the expansion is by regression, that is, by minimizing the (unweighted) mean squared L 2 error of the expansion. The sum attains its minimum, where the gradient is zero, that is, where for all jbj\N. The system of equations can be solved for the coefficients c a .

Gradient boosting tree (GBT)
Convergence of the GPC approximation is especially favorable when the dependence of the model output on the given uncertain input parameters have sufficient smoothness. As this is not necessarily the case for the given geomechanical model, we have also tested a gradient boosting tree (GBT) approximation of the model output to explore the relationship between the input parameters and the model output. Given that the model output U is a function of the input variables Z ¼ ðZ 1 ; . . .; Z n Þ, gradient boosting method assumes that the approximationŨ GBT ðZÞ is represented by an ensemble of base learners (e.g., weak basic models) which minimizes the average value of a specified loss function LðUðZÞ;ÛðZÞÞ such that: evaluated at the sample points, whereÛðZÞ is the predicted values of the observed values U and can be written in the form: with h m ðZÞ the base learner at m-th stage of the boosting algorithm characterized by a fixed size of stages M. In particular, GBT uses the decision tree as base learner, thus h m ðZÞ can be written as: where I m refers to the number of leaves at stage m, subscript i is the index for each leaf in the tree, b im is the predicted value of the terminal region R im with 1 R im the indicator function, which takes value 1 if Z lies in the subset R im otherwise takes value 0. Then, a steepest descent step is commonly applied to fit h m ðZÞ to the pseudoresiduals r jm with the training setfðz j ; Uðz j ÞÞg q j¼1 , i.e., intermediate error terms at m-th stage, for j-th sample point ðz j ; r jm Þ: where the mean squared error (MSE) is used as loss function. Afterwards, the expansion coefficient b m can be optimized: Therefore, the model can be updated by: Moreover, regularization methods impose the constrains on fitting procedure to prevent the overfitting, that is when the surrogate model exactly describes the training data and fails to fit unseen data. For example, the maximum stage of gradient boosting M in Eq. (15) is a natural regularization parameter which discourages learning more complex model to avoid overfitting. However, it has been found that regularization through shrinkage provides superior results to that obtained by restricting the maximum stage (Copas 1983), hence a simple shrinkage strategy is added to Eq. (20): Under this form, two regularization parameters are used in the gradient boosting algorithm: the learning rate m and the number of boosting stage M.

Variance-based Sobol' indices
Consider the model under investigation is described as a function U ¼ f ðZÞ, where U is a scalar and the input Z ¼ ðZ 1 ; Z 2 ; . . .Z n Þ is defined over the n-dimensional hypercube I n with mutually independent components. Assuming U to be a square integrable function, the Sobol' functional decomposition scheme reads: where each term is square integrable over I n and mutually orthogonal. The terms in the decomposition may be derived by: and similarly for higher degree terms. In Eq. (23) E Z $ i denotes the expected value over all elements of the parameters Z except the i-th one.
The decomposition allows to attribute the variances to the different parameters or their various degree interactions. These can be given by the partial variances: and similarly for higher order terms. With the help of the partial variances, the total variance can be decomposed: The variance based sensitivity is described by the ratio of the partial variances and the total variance. The first and second-order Sobol' indices are defined as: and similarly for the higher order sensitivity indices. The first order indices fS i g n i¼1 measure the effect on the output variance of factor Z i alone. Higher-order indices represents the combined effect of the group of factors Z 1 ; Z 2 ; . . .; Z n on the variance of the model output.
Another sensitivity measure is the total index of the i-th factor: where Var Z $ i ðE Z i ðUjZ $ i ÞÞ can be regarded as the first order index of Z $ i , so that S Ti measures the contribution to the output variance of all terms which contain factor Z i . These indices can be calculated by Monte Carlo method (Saltelli 2002;. The procedure is as follows. Generate q Â 2n sample matrix of the input random variables Z. The first n columns are gathered as matrix A and the second n columns are used similarly as matrix B. From these two matrices we generate n further q Â n matrices A B i by taking matrix A and replacing its i-th column with the corresponding column of B. The estimators: used in Eqs. (26) and (27) allow to compute the indices S i and S Ti .
In this work, we use the Sobol' sequence, i.e., a quasirandom low discrepancy sequence, to generate the samples z j (Saltelli 2002;. The difference with the ordinary Monte Carlo is that quasi-Monte Carlo substitutes random points with low discrepancy sequences, thus improving the convergence of the estimator. The main problem with this sampling-based method is the cost of computing f ðA B i Þ. Instead, by using the GPC or GBT surrogates as model proxy, the model output can be directly computed from the proxy model in a computationally cheap manner. One of the great advantages of the GPC surrogate model is that the Sobol' indices can be computed analytically without using the sampling based approximation given in Eq. (28). Due to the orthogonality condition, the mean and the total variance of the GPC can be directly computed from the coefficients of the expansion as follows: Varðf According to Sudret (2008), if we introduce the set of a tuples J i 1 ;...;i s such way that only the indices ði 1 ; . . .; i s Þ are nonzero: than J i is defined as a set of all multi-indices that corresponds to the polynomials depending only on parameter Z i . Consequently, the Sobol' decomposition (see Eq. (22)) of the GPC approximation is straightforward: and thus any element of the decomposition can be written as: The partial variances can be also easily computed from: that is, the coefficients corresponding to the polynomials that have dependence only on the selected variables have to be collected, squared, multiplied with its norm and summed up. For the sensitivity index this expression has to be divided by the total variance given in Eq. (30).

Mean decrease accuracy (MDA)
For the GBT surrogate model instead of computing the Sobol' indices with the help of the MC estimation, we use a different sensitivity measure that can be efficiently computed by GBT models. Each feature importance is here evaluated through a permutation-based measure following the idea of Breiman (2001) and the application in Jaxa- Rozen and Kwakkel (2018). Given the q Â n matrix A of the random input variables Z, the MDA index of the i-th feature measures the decrease of the estimator accuracy by randomly permuting the values of Z i (i-th column of input variables matrix) for K times in total and for each repetition re-computing the ensemble tree predictions with the k-th (for k ¼ 1; . . .; K) permuted column A i . The higher the inaccuracy, the most important is the feature for the particular model. MDA of the i-th feature is defined as where s is the reference score and s k;i the score for the k-th permutation of feature Z i , where the score is obtained by computing the mean square error between predictions and observations. The feature is important if permuting its values causes a large drop on the model performance.

Convergence criterion of importance indices
Here we apply the convergence criterion proposed by Roustant et al (2014) to evaluate the stability of the important indices. The vector V q ¼ ðv 1 ; . . .; v n Þ of the variable importance indices is estimated from a sample size of q observation points, where n is the number of input features. Specifically, the Euclidean norm of the vector is taken into account rather than the individual indices so that the more influential indices have more effect on the convergence measurement. The importance indices are computed sequentially over an increasing sample size at intervals of Dq. Then the convergence criterion k q is computed by: where k k is the Euclidean norm and t is the number of total intervals. The values of Dq and t are case-dependent. This criterion will be imposed on the total Sobol' indices S T and MDA.

Partial dependence
Compared to Sobol' indices and MDA, partial dependence is more similar to one-at-a-time (OAT) sensitivity analysis, which assumes the model response is a function of one or two input variables and characterizes the average marginal effect on model prediction (Goldstein et al 2015). Owing to this feature, partial dependence plot can visually depicts the relationship between model prediction and the variables of interest. The partial dependence functionŨ i reads: where z i and Z j are respectively the feature set of interest and complement used in the surrogate modelsŨ. Generally, z i accounts for two components at most. The partial dependence can be computed by a Monte or quasi Monte Carlo method with the help of a surrogate model.

Software availability
In this work, the forward geomechanical models are carried out by the GEPS3D simulator (Isotton et al 2019;Franceschini et al 2016). The reference global sensitivity analysis, that is Sobol' technique, is implemented by the SALib library in Python environment (Herman and Usher 2017). SGLib library (Zander 2020; Friedman and Zander 2020) is used to compute the polynomial chaos expansion and Sobol' indices (Vittek et al 2006). Gradient boosting tree algorithm and mean decrease accuracy computation are carried out by the scikit À learn module in Python with gradient boosting regression estimator and permutation feature importance function (Pedregosa et al 2011).

Model setup and parameterization
The investigated configuration conceptualizes the geological setting in Wuxi, China, where the compressible deposits of the Yangtze River cover an undulating bedrock. The numerical simulation is developed on a quasi-3D domain 2000-m long (x-direction), 50-m thick (y-direction), and 500-m deep (z-direction). A traction-free top surface and a fixed bottom surface are considered (Fig. 1). On the lateral surfaces the horizontal displacements are precluded in the orthogonal direction. The conceptual model is composed of three hydrostratigraphic units: an upper aquitard, a bottom aquifer, and a buried triangular bedrock. For sake of simplicity, each material is assumed to behave elastically with the same Poisson ratio m ¼ 0:25. Cohesion c and friction angle u are set equal to 0.01 MPa and 30 respectively. A piezometric drop linealy varies from 0 to Dp in 10 years and is uniformly assigned to the bottom aquifer, meanwhile the upper aquitard is regarded as an hydraulically ''inactive'' unit where the pore pressure propagation from the underlying sandy layer is negligible. The initial stress field is computed based on the gradient density (r v ¼ 1200 kg/m 2 /m) and the minimum-to-maximum stress ratio reads r h =r v ¼ m=ð1 À mÞ.
The vertical size of tetrahedral FE elements is 10 m and the horizontal dimension is in range between 5.5 and 20 m, slightly varying according to the ridge geometry. Previous studies have proved that earth fissures are most prone to generate from the ground surface above the ridge tip downward, with a stress state characterized by high tension above the ridge (Ye et al 2018;Frigo et al 2019). Moreover, fissures cannot propagate within the bedrock where pressure does not change and the stress field variation is negligible. Therefore, a IE alignment is vertically introduced from the land surface to the ridge tip as highlighted by the white line in Fig. 1. The triangular IE discretization is consistent with FE discretization. Stress distribution and magnitude depend on the ridge geometry, the aquifer thickness, and differential subsidence. This latter is primarily dependent on the pore pressure change and sediment compressibility. Therefore, these four variables, i.e. ridge geometry, aquifer thickness, pressure change and sediment compressibility, are selected as input features for GSA.
Here, the ridge geometry is characterized by the slope of the bedrock ridge (tan h). Note that the length of ridge basement is fixed at 500 m. The fraction of the aquifer thickness over the domain thickness (500 m) is denoted f.  Table 1. A uniform probability distribution is assumed for each variable.

Deterministic model run
We randomly choose one experiment designed for GSA to present the numerical outcomes. The simulated temporal evolution of tensile stress on a vertical section of the domain is shown in Fig. 2. Tensile stress r h initially accumulates around the ridge tip. As piezometric level declines, a tensile zone also occurs at the land surface above the apex of ridge. Once tensile stress excesses the tensile strength, i.e. t N ¼ 0 (see Sect. 2.2), IEs change from a stick to an open state with the discontinuity that develops at the land surface and propagates downwards. Simultaneously, tensile stress dissipates due to the crack opening. Notice that the porous medium directly above the ridge tip does not experience any shear stress due to the symmetric configuration. Therefore, only fissure opening develops with this setting. Figure 3 shows the evolution of the earth fissure as provided by the IE model. Fissure initially originates at the land surface where the tensile strength is the lowest and later develops at depth too, just above the ridge tip, in response to the concentration of tensile stress. The upper fissure which initially is very narrow, keeps enlarging horizontally and extending downwards as the aquifer pressure continues to decrease. At the 10th year, i.e. in correspondence of a piezometric decline in the confined aquifer equal to 1 MPa, the fissure reaches a depth about 30 m with a maximum opening equal to 1.7 m at the land surface and tapers with depth. Conversely, the bottom discontinuity remains confined at the depth without a significant development.
Generally, the energy is dissipated after fissuring mainly at land surface. Moreover, the overburden stress due to sediment load usually limits tensile fissuring at depth (Budhu and Shelke 2008). Therefore, the bottom activated zone is not included in the quantity of interest d act that is defined as the fissure depth from the land suface. It is worth mentioning that the size of activated depth is controlled by the vertical length of the IE alignment, thus the relative activated depth d r;act ¼ d act =ð500 À h r Þ is introduced to have comparable results when varying the model geometry (Fig. 1). In this case, d r;act equals 0.132. tan h 5:0 Â 10 À 1 1 :9 Â 10 À 0 f 4:0 Â 10 À 1 9 :0 Â 10 À 1 C m (MPa À1 Þ 5:0 Â 10 À 3 5 :0 Â 10 À 2 Dp (MPa) À1:0 Â 10 À 0 0 :0 Â 10 À 0 Fig. 2 Sequential evolution of the dimensionless horizontal stress r Ã h ¼ r h =Dp at the 5th, 8th and 10th years as simulated with the EF model. The results are obtained using tan h ¼ 1:2, f ¼ 0:65, C m ¼ 0:05 MPa À1 and Dp ¼ À0:89 MPa

GPC surrogate
The computation of the GPC coefficients has been initially carried out by the pseudospectral approach (Eq. (11)) and increasing step by step the GPC degree of the polynomial response. The surrogate modelŨ GPC is developed to approximate d r;act with input parameters Z ¼ ftan h; f; C m ; Dpg. The validation of the fitted surrogate model is carried out by employing 7000 samples, that is the available set of points used to compute Sobol' indices with the quasi Monte Carlo approach. Moreover, train and validation of the GPC surrogate is also obtained from the 7000 points using 80% for regression and the remaining 20% for validation.
The results are shown in Table 2 with the coefficient of determination R 2 used to assess the fit goodness of the surrogate model to the full problem and computed by means of the leave-one-out cross-validation. Increasing the GPC degree both approaches, i.e. pseudospectral and regression, provide similar results with increasing values of the coefficient of determination R 2 . Obviously, the computational cost of regression is much higher, in particular at low GPC degrees. The maximum R 2 is close to 0.80. A visual comparison of the full model results and the surrogate solution is shown in Fig. 4a. The higher discrepancy is obtained at the boundary of the solution where the surrogate solution provides results larger than 1.0 or lower than 0.0. These solutions correspond to the nonphysical response meaning (i) a fissure reaches and propagates within bedrock (d r;act [ 1.0) and (ii) a negative opening (d r;act \ 0.0), representing non-penetration that is not admitted by the model hypothesis (see Eq. (3)).

GBT surrogate
Gradient boosting algorithm is also implemented with increasing size of input data to check the convergence, thereof 80% is used to fit model with remaining 20% for validation. Hyperparameter tuning is carried out by a Grid Search method which enumerates all the possible combinations of hyperparameters and gets optimal values based on the corresponding coefficient of determination R 2 GB . In this application, only the learning rate m has been tuned given it's the most important hyperparameter for GBT estimator (Probst et al 2019). Table 3 shows the model goodness that stabilizes when the data size reaches 5000. In addition to Sobol' samples, the Gauss points (4096) for the GPC method are also used to validate the regression tree obtained from maximum sample size (7000) (Fig. 4b). The regression tree also has some nonphysical predictions (d r;act [ 1.0), however the absolute values of discrepancy are much less than that of GPC solutions. Moreover, R 2 GB suggests GBT algorithm outperforms GPC algorithm with respect to the prediction accuracy (R 2 GB =0.96 vs R 2 RG =0.79). Nevertheless, both two surrogate models fail to capture the characteristic that the fissure opening depth d r;act keeps constant within some q sub-domains irrespective of parameters variation.

Importance metrics
Sobol' technique is chosen as the ''reference'' sensitivity analysis with the input samples generated by the Sobol' sequence. As mentioned in Sect. 5.2, these samples are reused to train and validate GPC and GBT surrogates. Knowing that the reliability of sensitivity measures obtained by metamodels are dependent on their predictive power, we can assure GPC and GBT are good surrogates and the ''overfitting'' problem can be excluded according to the R 2 values obtained from the cross validation. Figure 5 shows that Sobol' and GPC methods estimate similar results of first order indices S i and total Sobol' indices (S Ti ), with the GPC algorithm showing a quicker convergence with respect to the quasi Monte Carlo method. For each variable, S i accounts for the larger proportion of the corresponding S Ti , indicating a minor contribution from interactions between the i-th variable and the other input factors. The second order indices from Sobol' technique and GPC, both computed with 7000 samples, are listed in Table 4. It is evident the small interaction between input factors. Note that Sobol' technique gives some negative indices for the non-influential terms indicating some computation errors which could not be eliminated with the current sample size (Herman and Usher 2017). The coefficient of determination R 2 is used to asses the goodness of fit. R 2 PS refers to GPC coefficients computed using the pseudospectral approach through Gauss quadrature, whereas R 2 RG is obtained by regression. The number of points indicated are those used to train the surrogate models Stochastic Environmental Research and Risk Assessment Figure 6a depicts MDA importance indices obtained from the GBT model with default repetition K ¼ 100. A comparison with Sobol' method measures is presented in Fig. 6b. Note that Sobol' indices and MDA importance indices measure different quantities, thus a min-max scaling for each value is employed for direct comparison of the indices. According to Eq. (36), both GPC and GBT surrogate models reach the convergence criteria k q \0:05 at q ¼ 7000 (with Dq ¼ 2000 samples and t ¼ 3 intervals). Conversely, the reference Sobol' method fails to converge at 7000 sample size which proves surrogate models can reduce the overall computational cost of analysis with respect to Sobol' method. Moreover, GBT not only ranks the variables in the same way as Sobol' method but also provides basically identical proportional indices with respect to the total effect. The importance metrics obtained from the three methods highlight the ridge geometry is the most influential variable for the fissure opening, with the pressure variation also having a non-negligible impact on the fissure development. The contributions from the other two variables are smaller.

Partial dependence
We also employ partial dependence to investigate the surrogate model response to the variable changes (Fig. 7). A number of 50 samples from the validation set are used to illustrate how the model prediction to one variable changes, keeping fixed the other features. Note that each sample is represented by one thin line. The thicker lines represent the partial dependence calculated from the whole validation set (20% of 7000 samples). Although there are some Fig. 4 Visual comparison between the full model run and the surrogate solutions. a GPC model with regression, the blue shaded areas imply nonphysical solutions, i.e., d r;act larger than 1.0 representing a fissure thet extends within the buried bedrock, and lower than 0 indicating interpenetration of solid bodies. b GBT model, where the predictions are basically within the rational range discrepancies between GPC and GBT models results, the trend of partial dependence for each variable is similar. The impact is limited when tan h\1, however, once exceeding 1.47, the relative activated depth boosts significantly. The model response remains almost constant until f [ 0:8 when the average line slope that abruptly increases, although the contribution of thicker aquifers is still limited. According to the gradient variation, d r;act is more sensitive with C m up to 0.02 MPa À1 and a larger compressibility does not favor a much larger fissure development. Dp causes a relative larger output variation with respect to f and C m , which is consistent with the variable importance ranking. The slope of partial dependence gently decreases when the absolute value of Dp reduces.  Sobol' method (top panels) and GPC (bottom panels) method. Left panels present the convergence of Sobol' total indices S T;i , with the shaded areas in a representing the 95% confidence intervals of the indices. The right panels shows the relationships between first order and total indices Based on these analyses, we conclude that the ridge geometry and the pressure variation are the first and secondary variables influencing fissure generation and propagation. Therefore, we plot the partial dependence of these two variables as shown in Fig. 8. The outcomes of two surrogate models are mainly consistent with respect to the model output distribution. In general, the size of ridge slope controls the upper limit of fissure opening. While a certain amount of Dp is necessary for the fissure occurrence.

Discussion
Earth fissuring accompanying differential subsidence above buried bedrock ridges is becoming a worldwide hazard. Since 1950s, this typical fissure occurrence has been reported, for example, in Casa Grande in Arizona, USA (Jachens and Holzer 1979), Yangzi Delta in China (Wang et al 2010), Najran Basin in Saudi Arabia (Youssef et al 2014). These studies have pointed out that fissure formation is induced by groundwater depletion and buried geological structures, but the undergoing physical process is not well known owing to little information and limited modelling technique.
The general consensus is that the pore pressure depletion causes a variation of the in-situ stress field, which is responsible for aseismic formation of earth fissures. Opening and sliding are induced by tensile and shearing, respectively (Hernandez-Marin and Burbey 2010; Budhu 2011). Here, we limit the investigation on the depth of the fissure opening that occurs when the stress normal component becomes greater than zero. This causes the vanishing of contact between the pair of surfaces constituting the IEs inserted above the ridge tip. However, we are aware of the possible formation of fissures due to sliding condition only, such as in the case addressed by Li et al (2021), where multi-fissure formation is simulated for the hydrogeologic setting at Guangming village, Wuxi, China. These are more complex cases, where discontinuities develop where the stress field reaches the yield surface (Eq. (2)), and require an appropriate analyses of the stress field in the continuous body prior to insert the IEs in the most critical zones of the 3D mesh. Notice that in regions with bare land surface, e.g., in Arizona (Cook 2011(Cook , 2013, initial (thin) fissures caused by stress accumulation can be enlarged at the land surface by erosion and collapses due to surface flow during significant rainfall events. These processes are not accounted for in our modelling approach.
With more fissure appearances over the last decades, researches started focusing on the quantitative analyses of the fissure formation mechanism. Sheng et al (2003) defined the ratio of tensile stress over tensile strength as an indication for fissure inception and carried out an one-at-atime sensitivity analysis which suggested the confining stress and, secondarily, the depth of aquifer as the key parameters for fissure formation. Unfortunately, the impact of the ridge geometry was not taken account into this analysis. Frigo et al (2019) applied a multivariate regression to fit the depth of fissure opening as function of the pressure variation and the ratio between exploited aquifer thickness and ridge tip depth. The regression surface consists of a pair of planes with discontinuous joint, highlighting that fissure is more prone to occur when the depth of the ridge tip is shallow. It is also found that pressure depletion plays an important role by controlling the differential subsidence. However, multivariate analysis standardizes regression coefficients as direct measures of sensitivity, which is more suitable for linear problems. Moreover, the number of evaluated variables was restricted to two in order to derive a regression surface in a 3D setting. Thus, the ridge depth and aquifer geometry were combined in a single parameter. In this work, to systematically investigate the model sensitivity to input parameters and model geometry, we perform a global sensitivity analysis using a variance-based approach. Sobol' and total effects indices result in ranking the input factors, with a priority of importance assigned to the geometry of the ridge and the pore pressure drop of the system. The aquifer thickness and the compressibility are less influential with respect to the increase of the probability of fissure opening. The interactions between factors is one order of magnitude lower than the main indices, indicating a second order effect on the output variation. Results are mostly according to the mentioned previous studies, except for the less importance assigned to the aquifer thickness. However, this parameter was considered in a combined form with the ridge depth in Frigo et al (2019), probably causing an overestimation effect. Another reason may be attributable to the selection of the bounds values of the uniform distributions from which each parameter is sampled (Wagener and Pianosi 2019). For example, the slope of the bedrock ridge (tan h) is assumed variable between 0.5 and 1.9 due to some model grid constrains, discarding all h values lower than $ 27 . This may result in a not sufficiently wide choice of the parameter space, cutting out the possible influence of the other parameters at lower h values. We also point out that further analyses are needed to consider the possibility that variance is not a good measure of the output uncertainty, for example using indices that consider moment independence (Borgonovo 2007;Pianosi and Wagener 2015;Dell'Oca et al 2017).
We advocate the use of surrogate models to reduce the computational cost of the generation of the output samples for the computation of the importance metrics . Surrogates based on GPC techniques are prominent because the easy derivation of the Sobol' indices at no additional computational burden. However, it is observed that increasing the level of problem non linearity, e.g., in proximity of the fissure opening, the method fails to provide a good model proxy (e.g., the predicted d r;act [ 1, see Fig. 4a). For this reason, we also employed the GBT algorithm for the estimation of total sensitivity measures, i.e., the mean decrease accuracy estimates MDA. Compared to Sobol' indices, MDA importance lacks of a straightforward interpretation, as they are computed based on the model prediction accuracy rather than the effects on the output variance. Thus, MDA is limited to assess the interaction effects between the input variables. However, this limit can be compensated by using other interpretability methods like SHAP (Lundberg and Lee 2017) or by using them as interpretation of the Sobol' total effects. The counterpart of this methodology, which seems more suitable for applications on discontinuous problems compared to GPC, is that GBT regression tree spends more time on tuning hyperparameters to optimize the model performance, increasing the overall computational burden.

Conclusion
The present work investigates the relative importance of various hydrogeologic features to the formation and propagation of aseismic fissures above the crest of buried bedrock. The conceptual model used for numerical simulation is idealised and simplified from the field case reported in Wuxi, China. Earth fissures develop only with the simultaneous occurrence of an undulating bedrock that intercept a thick compressible aquifer where pressure decline takes place.
An advanced geomechanical simulator is used to analyse the stress field and quantify the fissure opening. The numerical simulations show how the pore pressure depletion results in the accumulation of tensile stress above the ridge tip, favoring the development of an earth fissure at the land surface. The fissure deepens as the pressure decline increases.
The numerical results are processed by GSA to assess the variable (bedrock ridge geometry, aquifer thickness and compressibility, and pore pressure variation) importance to fissure activation and propagation. Sobol' provided with Monte Carlo approach are taken as the reference and compared to Sobol' indices derived from surrogates of the forward model. GPC and GBT algorithms are applied to fit the numerical solution and then estimate the factor importance based on surrogate model prediction. The following main conclusions can be drawn: -The three methods, i.e. Sobol', GPC, and GBT, rank the four variables consistently and provide similar importance measurements, thus supporting the validity of the achievement; -The aquifer thickness and compressibility are lessinfluential variables to fissure opening; -Marginal effect and surface response plots highlight that the probability of significant fissuring (deep and with large opening) is higher when the buried ridge is steeper and its tip closer to the land surface, with sufficiently large pore pressure depletion.
Finally, we have assessed the computational performances of three techniques on this application. Sobol' technique requires a larger sample size to converge, which makes it computationally expensive. Compared to GPC-based model, GBT performs better on approximating the discontinuous solution but requires a larger computational burden.

Declarations
Conflict of interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.