1 Introduction

Rapid drawdown is one of the major causes of failure of earth slopes subjected to changing river levels. The decrease in the water level reduces the stabilizing effect of the hydrostatic pressure on the slope. With rapid drawdown, the internal pore water pressure (PWP) distribution of a fine-grained slope reflects the initial water level for some time. Three approaches have been proposed to investigate this scenario: undrained analyses, flow methods and coupled flow–deformation analyses. The first group is suitable for impervious materials and neglects the water flow, since the dissipation of PWP is much slower than the decrease in the water level. Morgenstern [34] adopted this approach and provided stability charts to estimate the factor of safety during rapid drawdown. Lane and Griffiths [28] developed a chart-based strategy to achieve safe drawdown with 2D Finite Element (FE) undrained analysis. The second approach, which is appropriate for permeable materials, solves the flow problem caused by the time-dependent change of the hydraulic boundary conditions. The underlying assumption is that the solid skeleton is rigid, i.e., soil deformations are neglected. A fully coupled flow–deformation analysis was first proposed by Pinyol et al. [42]. The authors highlighted the discrepancies between the aforementioned approaches: the undrained analysis leads to conservative and unrealistic results, while pure flow analyses underestimate the PWP. The authors validated the coupled hydro-mechanical method by comparing the calculated PWP with piezometer measurements from the Canelles landslide in Spain [43].

These methods address rapid drawdown with a deterministic approach, which uses the definition of the factor of safety (FOS) to assess slope stability. In the last 20 years, there has been growing interest in probabilistic methods for geotechnical engineering, which incorporate the inherent uncertainties of soil properties in the analysis and design of geotechnical structures. In broad terms, reliability-based design (RBD) approaches describe the performance of a system as a function of stochastic variables. RBD methods can be classified by level of sophistication: semi-probabilistic (Level I), approximate probabilistic methods (Level II) and fully probabilistic methods (Level III). The first level is already implemented in Eurocode 0 [5] through the introduction of partial safety factors, which are calibrated so that the required probability of not exceeding a limit state is achieved. Levels II and III estimate the probability of failure (\(p_{\rm f}\)) with analytical and numerical methods, respectively.

The first attempts to assess slope stability within a probabilistic framework [1, 7, 31] are based on the first- and second-order reliability methods (FORM and SORM, Level II) combined with limit equilibrium methods (LEM). The limit state function is defined as \(G\left( \varvec{x}\right) = {\rm FOS}\left( \varvec{x}\right) -1\), where \(\varvec{x}\) represents the stochastic variables. In FORM, the limit state function is approximated by a first-order Taylor series expansion (in SORM the expansion is carried to second order) and the reliability is measured, according to the Hasofer–Lind method, by the minimum distance between the origin of a standardized coordinate system (e.g., \(X_i^{\prime } = \frac{X_i - \mu _{X_i}}{\sigma _{X_i}}\)) and the limit state \(G\left( \varvec{x}\right) = 0\); this distance is the reliability index \(\beta\). The probability of failure can then be obtained as \(p_{\rm f} = \Phi \left( - \beta \right)\), where \(\Phi\) is the standard normal cumulative distribution function. The main drawback of these methods lies in the approximate estimation of \(p_{\rm f}\). Alternatively, fully probabilistic methods (Level III) can be employed for an accurate estimation of \(p_{\rm f}\), but at the expense of high computational effort. For instance, a direct Monte Carlo simulation requires approximately one million trials to estimate a value of \(p_{\rm f}\) equal to \(10^{-4}\) (which most closely corresponds to the target reliability index of 3.8 for a 50-year reference period prescribed in Eurocode 0) with a coefficient of variation (COV) of 10% [15]. Conducting this number of numerical simulations is time-prohibitive for complex geotechnical analyses. A few studies have applied reliability analysis with Monte Carlo simulations to real case studies of reservoir slopes [30, 53, 56]. The estimation of the geomechanical parameters in terms of statistical moments and distributions certainly represents one of the greatest challenges in reliability analysis; the other major limitation is the computational cost. Two strategies are available to increase the performance of Monte Carlo simulations. The first reduces the sample size with sampling techniques such as Latin hypercube sampling (LHS) [48], importance sampling and adaptive simulation techniques, e.g., subset simulation [19, 38]. A second way to achieve higher computational efficiency is to approximate the limit state function with a surrogate model or response surface method (RSM). A simple mathematical model or a more complex soft computing algorithm can be applied to ascertain the relationship between the stochastic parameters and the output of the limit state function or the FOS. The following techniques have been extensively explored in geotechnical reliability analysis: Hermite polynomial chaos expansion (HPCE) [20,21,22], Kriging-based RSM [55], support vector machines (SVM) [23, 25, 26], Gaussian processes (GP) [24], extreme gradient boosting (XGBoost) [52], artificial neural networks (ANN) [49] and multivariate adaptive regression splines (MARS) [54].

Recently, RBD methods have been increasingly applied to slope stability problems owing to the computational efficiency gains offered by surrogate models. Jiang et al. [20] proposed a non-intrusive stochastic FE method based on HPCE to perform slope stability analysis in spatially variable soils. The authors applied this method to estimate the probability of failure of two illustrative examples, namely an undrained clay slope and a \(c-\varphi\) slope. SVM has been successfully combined with Monte Carlo simulations for slope reliability analysis [25, 29]. Optimization methods like particle swarm optimization (PSO) and artificial bee colony (ABC) have been integrated with SVM to enhance the accuracy of its predictions [23, 26]. Gao et al. [10] performed a reliability analysis of a slope under rapid drawdown with FORM. Based on numerical results obtained with the finite difference method (FDM), the authors applied a polynomial regression model to estimate the FOS from the following input parameters: slope angle, reservoir water level, hydraulic conductivity and drawdown rate. The last two parameters were treated as stochastic and the impact of their uncertainty on the reliability was investigated. A sensitivity analysis showed that the hydraulic conductivity has the greatest impact on slope reliability. Wang et al. [51] proposed a surrogate model based on MARS to evaluate the probability of failure of an earth dam under transient seepage and compared different scenarios of upstream water level rise rate. Jiang et al. [22] conducted a reliability analysis of an embankment under unsaturated seepage considering the spatial variability of shear strength and hydraulic properties. The authors developed a surrogate model based on HPCE and conducted a sensitivity analysis to study the impact of the variability of each random variable on \(p_{\rm f}\). Wang et al. [52] used the extreme gradient boosting (XGBoost) algorithm to perform reliability analysis of an earth dam. Both regression and classification machine learning approaches were explored to assess slope stability and estimate \(p_{\rm f}\).

In this study, an intelligent surrogate model is constructed to estimate the probability of failure of a reservoir slope over time under rapid drawdown, using the random variable method. The uncertainties of the shear strength and hydraulic parameters, including the curve fitting parameters of the soil–water characteristic curve (SWCC), are modeled with a lognormal distribution. The surrogate model is trained on an initial sample of data generated with LHS. The problem is studied in the time frame following rapid drawdown, and the surrogate model is fed with the results obtained at the different time steps of the analysis. The performance of two machine learning models, i.e., SVM and XGBoost, is evaluated and compared. The surrogate model with the best performance in terms of prediction accuracy and computational efficiency is selected to estimate \(p_{\rm f}\) with time. Section 2 presents the methodology, namely the deterministic model, the sampling scheme and the surrogate models that are tested. Section 3 reports the results obtained for different scenarios of rapid drawdown. Section 4 includes a sensitivity analysis of the stochastic parameters.

2 Methodology

First, the problem is studied with a deterministic approach, which consists of an unsaturated transient seepage analysis and a limit equilibrium slope stability analysis based on the Morgenstern–Price method. The modules SEEP/W and SLOPE/W available in the commercial software GeoStudio [32] are used to conduct the analyses. A homogeneous slope with a simplified geometry is considered. The slope is 10 m high and has a slope ratio of 2:1 (Fig. 1). The initial condition is steady-state seepage with the reservoir level at 9 m above the foundation (19 m of elevation in Fig. 1). Then, a constant total head of 19 m is applied on the right side of the slope, while a total head decreasing linearly with time is applied on the left side of the slope (including the horizontal boundary of the foundation). This function represents the linear decrease in the reservoir level from 19 to 10 m, where the slope of the function is the drawdown rate indicated for each scenario. A mesh refinement analysis was conducted to select a suitable mesh size by comparing deterministic simulations, run with the mean values of the soil parameters in Table 1, for increasing mesh sizes. A mesh size of one meter was found to be appropriate for the problem considered in this study. Table 1 shows the soil properties of the slope, which are the same as those of the foundation.

Fig. 1
figure 1

Geometry of the slope and mesh of the seepage model

2.1 Transient seepage analysis

In the seepage analysis, only the soil above the foundation (above 10 m elevation) is considered partially saturated, while the underlying foundation is analyzed as fully saturated. The governing equation for two-dimensional seepage through unsaturated soil was derived by Richards [44], who extended Darcy’s law to unsaturated flow, and can be expressed as:

$$\begin{aligned} \frac{\partial }{\partial x}\left( k_x \frac{\partial h}{\partial x} \right) + \frac{\partial }{\partial y}\left( k_y \frac{\partial h}{\partial y} \right) + Q = \frac{\partial \theta }{\partial t} \end{aligned}$$
(1)

where \(k_{x}\) and \(k_{y}\) are the horizontal and vertical hydraulic conductivity (i.e., in the x and y direction, respectively), h is the total head, Q is an externally applied boundary flux and \(\theta\) is the volumetric water content. In unsaturated soils, \(\theta\) varies with the matric suction \(\psi = u_{\rm a} - u_{\rm w}\), which is the difference between the pore-air and pore-water pressures. This relationship is described by the soil–water characteristic curve (SWCC). In this study, the model proposed by van Genuchten [11] has been adopted:

$$\begin{aligned} \theta = \theta _{\rm r} +\frac{\theta _{\rm s} - \theta _{\rm r}}{ \left[ 1 + \left( \frac{\psi }{a} \right) ^{n}\right] ^{m}} \end{aligned}$$
(2)

where \(\theta _{\rm r}\) and \(\theta _{\rm s}\) are the residual and saturated water content, respectively, and a, n and m (with \(m=1-\frac{1}{n}\)) are curve fitting parameters. The hydraulic conductivity, which also depends on the volumetric water content, was derived by van Genuchten based on Mualem’s model [36]:

$$\begin{aligned} k_{\rm r} = \frac{k}{k_{\rm s}} = S_{\rm e}^{1/2}\left[ 1 - \left( 1 - S_{\rm e}^{1/m} \right) ^m \right] ^2 \end{aligned}$$
(3)

where \(k_{\rm r}\) is the relative hydraulic conductivity, defined as the ratio between the unsaturated and the saturated hydraulic conductivity (k and \(k_{\rm s}\), respectively). \(S_{\rm e}\) is the effective degree of saturation and is equal to:

$$\begin{aligned} S_{\rm e}(\psi ) = \frac{\theta (\psi ) - \theta _{\rm r}}{\theta _{\rm s} - \theta _{\rm r}}\,. \end{aligned}$$
(4)
Fig. 2
figure 2

Hydraulic parameters: a SWCC; b unsaturated hydraulic conductivity curve

Figure 2 shows the SWCC and the unsaturated hydraulic conductivity curves obtained with the hydraulic parameters of Table 1. These equations are solved numerically with the finite element method implemented in SEEP/W. The resulting PWP and \(S_{\rm e}\) are the inputs for the slope stability analysis.
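As an illustration of Eqs. (2)–(4), the following minimal Python sketch evaluates the SWCC and the relative hydraulic conductivity over a suction range; the parameter values are placeholders, not those of Table 1.

```python
import numpy as np

# Assumed placeholder values of the van Genuchten parameters (not Table 1)
theta_r, theta_s = 0.05, 0.40   # residual / saturated volumetric water content
a, n = 10.0, 1.8                # curve fitting parameters (a in kPa)
m = 1.0 - 1.0 / n               # m = 1 - 1/n

def swcc(psi):
    """Volumetric water content theta(psi), Eq. (2)."""
    return theta_r + (theta_s - theta_r) / (1.0 + (psi / a) ** n) ** m

def relative_conductivity(psi):
    """Mualem-van Genuchten relative conductivity k_r(psi), Eqs. (3)-(4)."""
    Se = (swcc(psi) - theta_r) / (theta_s - theta_r)   # effective saturation
    return np.sqrt(Se) * (1.0 - (1.0 - Se ** (1.0 / m)) ** m) ** 2

psi = np.logspace(-1, 3, 100)   # matric suction range [kPa]
k_r = relative_conductivity(psi)
```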

2.2 Slope stability analysis

The factor of safety is defined as the ratio between the resisting and the driving forces acting on a slip surface. The most critical slip surface that can develop within the slope yields the minimum FOS. In this study, the FOS is calculated with the Morgenstern–Price method of slices. The shear strength of the unsaturated soil is predicted with the extended Mohr–Coulomb failure criterion proposed by Vanapalli et al. [47]:

$$\begin{aligned} \tau = c' + \left( \sigma - u_{\rm a} \right) \tan \varphi ' + \left( u_{\rm a} - u_{\rm w} \right) S_{\rm e} \tan \varphi ' \end{aligned}$$
(5)

where \(\tau\) is the shear strength, \(\sigma\) is the total normal stress, \((u_{\rm a} - u_{\rm w})\) is the matric suction \(\psi\) and \(c'\) and \(\varphi '\) are the effective cohesion and the effective friction angle, respectively. This set of analyses, i.e., transient seepage and slope stability, is repeated for each combination of the sampled random variables, i.e., as many times as the sample size.
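A minimal sketch of Eq. (5) is given below; it is useful to inspect the relative contribution of the cohesive, frictional and suction terms. The argument names are illustrative.

```python
import numpy as np

def unsaturated_shear_strength(sigma, u_a, u_w, c_eff, phi_eff_deg, Se):
    """Shear strength per Eq. (5): cohesive + frictional + suction terms."""
    tan_phi = np.tan(np.radians(phi_eff_deg))
    psi = u_a - u_w                          # matric suction
    return c_eff + (sigma - u_a) * tan_phi + psi * Se * tan_phi
```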

2.3 Estimation of the probability of failure

To reduce the number of simulations, the samples of the random variables are generated with an extension of Latin hypercube sampling (LHS) based on the multidimensional uniformity method [8], implemented in Python [35]. Compared to the direct Monte Carlo method, which generates pseudo-random numbers, LHS divides the range of each stochastic variable into several intervals of equal probability. This results in a better coverage of the sample space. The correlation between random variables is introduced using the methodology proposed by Iman and Conover [17]: correlation between independent stochastic variables is induced by reordering the samples within each marginal distribution while leaving the sampled values unchanged.
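A minimal sketch of the sampling step is shown below, using the Latin hypercube sampler available in SciPy and mapping the uniform samples to lognormal marginals. The means and COVs are placeholders, and the Iman–Conover reordering and the truncation of the marginals are not reproduced here.

```python
import numpy as np
from scipy.stats import norm, qmc

# Placeholder statistics (not those of Table 1)
means = {"c": 10.0, "phi": 30.0, "ks": 1e-6, "a": 10.0, "n": 1.8}
covs  = {"c": 0.30, "phi": 0.10, "ks": 0.50, "a": 0.30, "n": 0.10}
names = list(means)

sampler = qmc.LatinHypercube(d=len(names), seed=42)
u = sampler.random(n=2000)                  # uniform LHS samples in [0, 1)^d

samples = np.empty_like(u)
for j, name in enumerate(names):
    # Lognormal parameters from the mean and COV of the physical variable
    sigma_ln = np.sqrt(np.log(1.0 + covs[name] ** 2))
    mu_ln = np.log(means[name]) - 0.5 * sigma_ln ** 2
    samples[:, j] = np.exp(mu_ln + sigma_ln * norm.ppf(u[:, j]))
# The Iman-Conover reordering [17] and the truncation of n and phi'
# described in Sect. 2.5 would be applied to these columns afterwards.
```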

A batch script written in Python selects each combination of the generated samples and replaces the values of the random variables directly in the GeoStudio files. The batch script then launches each deterministic simulation described in Sects. 2.1 and 2.2 and reads the resulting factor of safety. This procedure follows the non-intrusive calculation proposed by Wang et al. [50]. Finally, the probability of failure is calculated as follows:

$$\begin{aligned} p_{\rm f} = \frac{1}{N}\sum _{i=1}^{N} I\left[ {\rm FOS}<1 \right] \end{aligned}$$
(6)

where N is the sample size and \(I[\cdot ]\) is an indicator function of slope failure, equal to unity when \({\rm FOS}<1\) and zero otherwise.
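Once the factors of safety of all realizations are available, Eq. (6) reduces to a one-line computation, as in this small sketch (the array name is illustrative):

```python
import numpy as np

def probability_of_failure(fos_values):
    """Monte Carlo estimate of p_f, Eq. (6): fraction of samples with FOS < 1."""
    return float(np.mean(np.asarray(fos_values) < 1.0))
```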

2.4 Surrogate models

An intelligent surrogate model is adopted to enhance the computational efficiency of the estimation of \(p_{\rm f}\). A limited number of simulations serve as the training set, from which the model learns the underlying relationship between the input parameters (e.g., shear strength and hydraulic parameters) and the output, which is the factor of safety. The problem is framed as a regression, which means that a continuous value of FOS is predicted. Two predictive models from machine learning (ML) are considered in this study: support vector regression (SVR) and extreme gradient boosting (XGBoost). SVR presents several advantages. First, it is simple to implement and has a good generalization capability. Second, the complexity of the model does not increase with the number of input variables [3]. XGBoost, in turn, offers high computational speed and can handle large datasets as well as missing values.

2.4.1 Support vector regression

SVR is an extension of support vector machines (SVMs) to regression problems [9]. The basic idea of SVM is that the original data are mapped into a high-dimensional feature space by a function, which can be a nonlinear transformation. A hyperplane is then sought that separates the data (classification) or defines the narrowest tube containing the majority of the data (regression). The regression function can be defined as:

$$\begin{aligned} f(\varvec{x}) = \varvec{w}^{T} \phi (\varvec{x}) + b \end{aligned}$$
(7)

where \(\phi\) is the nonlinear mapping function, and \(\varvec{w}\) and b are the weight vector and the offset, respectively. These parameters can be found by minimizing an objective function subject to constraints that limit the discrepancy between the predicted and the measured output to a threshold value \(\epsilon\). A soft margin approach can be adopted to allow some data points to lie outside the \(\epsilon\)-tube by introducing slack variables. The objective function can be written as:

$$\begin{aligned} {\mathcal {L}}_{obj} = \frac{1}{2} \varvec{w}^{T} \varvec{w} + C \sum _{i=1}^{n} \left( \xi _i + \xi _i^{\star } \right) \end{aligned}$$
(8)

where the first term represents the complexity of the model, C is a penalty parameter and \(\xi\) and \(\xi ^{\star }\) are the slack variables. To estimate these parameters, the objective function is minimized subject to the following constraints:

$$\begin{aligned} {\left\{ \begin{array}{ll} y_i - \varvec{w}^{T} \phi (\varvec{x_i}) - b \leqslant \epsilon + \xi _i \\ \varvec{w}^{T} \phi (\varvec{x_i}) + b - y_i \leqslant \epsilon + \xi _i^{\star } \\ \xi _i, \xi _i^{\star } \ge 0, i=1,\ldots ,n \end{array}\right. } \end{aligned}$$
(9)

As mentioned, the function f(x) can also be nonlinear; in this case, the data are mapped into a higher-dimensional space, called the kernel space, where a linear model can be fitted. The inner products \(\phi (x_i)^{T} \phi (x_j)\) are replaced with a kernel function \(K(x_i,x_j)\), which is, in this study, the radial basis function (RBF):

$$\begin{aligned} K(x_i,x_j) = \exp {\left( -\gamma \left\| x_i - x_j \right\| ^2 \right) } \end{aligned}$$
(10)

where \(\gamma\) is a kernel hyperparameter. Hyperparameter tuning is then applied to search for the optimal values of C and \(\gamma\).
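A minimal scikit-learn sketch of an \(\epsilon\)-SVR with RBF kernel is given below; the hyperparameter values are placeholders to be tuned as described in Sect. 2.4.3, and X_train/y_train are assumed to hold the sampled soil parameters and the corresponding FOS values.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Epsilon-SVR with RBF kernel (Eqs. 7-10); C, gamma and epsilon are placeholders
svr_model = make_pipeline(
    StandardScaler(),                                  # SVR is scale-sensitive
    SVR(kernel="rbf", C=10.0, gamma=0.1, epsilon=0.01),
)
# svr_model.fit(X_train, y_train)
# fos_pred = svr_model.predict(X_test)
```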

2.4.2 Extreme gradient boosting

The XGBoost algorithm is an ensemble ML algorithm based on gradient-boosted decision trees. It can be used for both regression and classification problems. This model, developed by Chen and Guestrin [6], has recently gained popularity through its success in several ML competitions [4], owing to its high execution speed and model performance. Ensemble decision trees are the base learners that predict the output (in this study, the FOS) by means of K additive functions:

$$\begin{aligned} \hat{y_i} = \sum _{k=1}^{K} f_k(\varvec{x_i}),\quad f_k \in {\mathcal {F}} \end{aligned}$$
(11)

where \(f_k\) is the function describing one independent tree, \({\mathcal {F}}\) is the space of all possible regression trees, \(\varvec{x_i}\) are the input features and \(\hat{y_i}\) is the predicted output. In supervised learning, the training phase serves to fit a model that best relates the output to the input features. The goodness of fit can be measured with an objective function consisting of two terms: the training loss (e.g., the mean squared error) and the regularization term, which controls model complexity and thus hinders overfitting, as described in Eq. (12) [6].

$$\begin{aligned} {\mathcal {L}}_{obj} = \sum _{i=1}^{n} l(y_i, \hat{y_i}) + \sum _{k=1}^{K} \Omega (f_k) \end{aligned}$$
(12)

The regularization term is defined as:

$$\begin{aligned} \Omega (f) = \gamma T + \frac{1}{2} \lambda \sum _{j=1}^{T} w_j^2 \end{aligned}$$
(13)

where w represents the score on a leaf and T is the number of leaves in the tree [6]. An additive strategy is applied to train the model: each new tree corrects the errors of the previous ensemble, and the prediction at boosting iteration t is:

$$\begin{aligned} \hat{y_i}^{(t)} = \sum _{k=1}^{t} f_k(\varvec{x_i}) = \hat{y_i}^{(t-1)} + f_t(\varvec{x_i}) \end{aligned}$$
(14)

The objective function at iteration t is defined as:

$$\begin{aligned} {\mathcal {L}}_{obj}^{(t)} = \sum _{i=1}^{n} l(y_i, \hat{y_i}^{t-1} + f_t(\varvec{x_i})) + \Omega (f_t) \end{aligned}$$
(15)

The loss function can be approximated by a second-order Taylor series expansion, and the objective function can be rewritten as:

$$\begin{aligned} {\mathcal {L}}_{obj}^{(t)} = \sum _{i=1}^{n}\left [l(y_i, \hat{y_i}^{(t-1)}) + g_i f_t(\varvec{x_i}) + \frac{1}{2} h_i f_t^2(\varvec{x_i})\right] + \Omega (f_t) \end{aligned}$$
(16)

where \(g_i = \partial _{\hat{y}_i^{(t-1)}}\, l(y_i, \hat{y}_i^{(t-1)})\) and \(h_i = \partial ^2_{\hat{y}_i^{(t-1)}}\, l(y_i, \hat{y}_i^{(t-1)})\) are the first- and second-order partial derivatives of the loss function with respect to the previous prediction. The final form of the objective function is obtained by removing all the constant terms:

$$\begin{aligned} {\mathcal {L}}_{obj}^{(t)} = \sum _{i=1}^{n} [g_i f_t(\varvec{x_i}) + \frac{1}{2} h_i f_t^2(\varvec{x_i})] + \Omega (f_t) \end{aligned}$$
(17)
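A minimal sketch of the XGBoost regressor used as a surrogate is shown below; the hyperparameter values are illustrative assumptions, not those adopted in this study.

```python
import xgboost as xgb

xgb_model = xgb.XGBRegressor(
    n_estimators=300,             # number of additive trees K, Eq. (11)
    learning_rate=0.1,            # shrinkage applied at each boosting iteration
    max_depth=4,                  # tree depth, controls model complexity
    reg_lambda=1.0,               # L2 penalty on leaf weights, Eq. (13)
    gamma=0.0,                    # penalty per leaf, Eq. (13)
    objective="reg:squarederror", # training loss l in Eq. (12)
)
# xgb_model.fit(X_train, y_train)
# fos_pred = xgb_model.predict(X_test)
```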

2.4.3 Preprocessing

The available dataset is divided into a training and a test set with a ratio of 70–30%, which is a compromise between the two-thirds/one-third split [27] and the Pareto principle (also known as the 80–20 rule). This ratio is a common choice in machine learning practice, although other train–test ratios are also used (e.g., 60–40%, 80–20% and 90–10%) [46]. The optimal train–test split ratio can generally be determined with a trial-and-error procedure [37]. In this study, the performance of the surrogate model with the selected train–test ratio has been compared with the following ones: 60–40%, 80–20% and 90–10%. For all tested split ratios, the performance in both training and testing shows negligible discrepancy, meaning that the split ratio has little influence on the predictive performance of the surrogate model. Cross-validation (CV) is applied to prevent overfitting: the training set is split into k groups or folds and the model is trained on \(k-1\) folds and validated on the remaining one. After cross-validation, the mean model performance score over all folds is reported. This process can be combined with the model selection phase, in which the hyperparameters that give the best performance in terms of a prediction score are selected. For the SVR, the optimization of the hyperparameters is performed with two techniques, namely the randomized search and the Bayes search, which are part of the scikit-learn [39] and the scikit-optimize [14] libraries, respectively. In the randomized search, a set of values is randomly sampled from a statistical distribution for each hyperparameter, i.e., C and \(\gamma\). Since these hyperparameters take continuous values, a log-uniform distribution is chosen to generate random samples of the parameter space. The algorithm then selects the best-performing values of the hyperparameters. Similarly, the Bayes search samples possible hyperparameter values from a given distribution. Based on Bayes’ theorem, this algorithm uses the information acquired from previous evaluations to find hyperparameter values that are likely to improve the model’s performance.
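The following sketch illustrates the train–test split, the fivefold cross-validation and the two hyperparameter searches described above; the search ranges and the number of iterations are assumptions, and X, y stand for the sampled soil parameters and the corresponding FOS values.

```python
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVR
from skopt import BayesSearchCV
from skopt.space import Real

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)      # 70-30% split

# Randomized search: C and gamma drawn from log-uniform distributions
rand_search = RandomizedSearchCV(
    SVR(kernel="rbf"),
    param_distributions={"C": loguniform(1e-1, 1e3),
                         "gamma": loguniform(1e-3, 1e1)},
    n_iter=50, cv=5, scoring="r2", random_state=0)

# Bayes search over the same log-uniform space (scikit-optimize)
bayes_search = BayesSearchCV(
    SVR(kernel="rbf"),
    search_spaces={"C": Real(1e-1, 1e3, prior="log-uniform"),
                   "gamma": Real(1e-3, 1e1, prior="log-uniform")},
    n_iter=50, cv=5, scoring="r2", random_state=0)

# rand_search.fit(X_train, y_train); bayes_search.fit(X_train, y_train)
```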

2.5 Stochastic parameters

In this study, the effective cohesion (\(c'\)), the effective friction angle (\(\varphi '\)), the saturated hydraulic conductivity (\(k_{\rm s}\)) and the curve fitting parameters of the SWCC (a and n) are modeled as random variables. The mean values (\(\mu\)), the coefficients of variation (COV) and the correlation coefficients (\(\rho\)) are shown in Table 1. The random variables follow a lognormal distribution to avoid unrealistic negative values. A lower bound of 1.05 is imposed on the parameter n to fulfill the condition \(0 \le S_{\rm e} \le 1\) [22], and an upper bound of \(45^{\circ }\) is imposed on \(\varphi '\) to discard values of the effective friction angle that are unrealistic in geotechnical practice. The samples from a truncated lognormal distribution are generated with the R package EnvStats [33]. A negative correlation is imposed between the effective cohesion and the effective friction angle as well as between the parameters a and n of the SWCC. Phoon et al. [41] found that, among several classical distributions (e.g., normal, gamma, exponential), the lognormal distribution is the most suitable to describe the variability of the curve fitting parameters a and n of the SWCC. Typical values of the COV of the shear strength and hydraulic parameters can be found in [22, 40]. Phoon et al. [41] also investigated the correlation between a and n for different types of soil and found that these two parameters are negatively correlated with each other, with \(\rho _{a,n}\) ranging from − 0.5 to − 0.25.
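The truncated lognormal samples are generated in this study with the R package EnvStats; a rough Python equivalent based on rejection of out-of-bound values is sketched below for reference (mean and COV values are placeholders).

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_lognormal(mean, cov, size, lower=-np.inf, upper=np.inf):
    """Sample a lognormal with given mean/COV, rejecting values outside bounds."""
    sigma_ln = np.sqrt(np.log(1.0 + cov ** 2))
    mu_ln = np.log(mean) - 0.5 * sigma_ln ** 2
    out = np.empty(0)
    while out.size < size:
        x = rng.lognormal(mu_ln, sigma_ln, size)
        out = np.concatenate([out, x[(x >= lower) & (x <= upper)]])
    return out[:size]

# e.g., SWCC parameter n with a lower bound of 1.05 (mean and COV assumed)
n_samples = truncated_lognormal(mean=1.8, cov=0.10, size=2000, lower=1.05)
```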

Table 1 Input soil parameters of the slope and the foundation

A sample of 2000 combinations of the shear strength and hydraulic parameters is used to train the surrogate model. Running the 2000 simulations with GeoStudio on a desktop computer with a 2.40 GHz processor and 8 GB of RAM took approximately 18 hours. Figure 16 in Appendix shows the distribution of the samples and the correlation between the parameters \(c'\) and \(\varphi '\) and between a and n. An overview of the implementation procedure is depicted in Fig. 3.

Fig. 3
figure 3

Implementation procedure for the estimation of \(p_{\rm f}\) with surrogate model

3 Results

The seepage analysis and the slope stability analysis are conducted for each sample combination (i.e., 2000 times) and for each time step of the transient seepage analysis. The drawdown rate is \(1.8\ \mathrm{m}\ \mathrm{d}^{-1}\): the reservoir level reaches 10 m elevation after 5 days. In Fig. 4, the distribution of the PWP is shown before and after drawdown. The PWP and the effective degree of saturation are the inputs of the slope stability analysis and the FOS is calculated with the Morgenstern–Price method. Table 2 lists the FOS calculated with the LEM at each time step. These results are obtained using the mean values of the shear strength and hydraulic parameters given in Table 1.

Fig. 4
figure 4

Distribution of the PWP in the slope (the dashed red and blue lines represent the critical slip surface and the water table, respectively)

Table 2 FOS for each time step calculated with the mean values of the parameters

A first estimation of \(p_{\rm f}\) can be achieved according to Eq. (6). Figure 17 in Appendix shows the probability of failure with increasing number of simulations at each time step. However, the number of samples provided may not be sufficient to accurately estimate \(p_{\rm f}\), especially for very low values of \(p_{\rm f}\). The surrogate model is tested on a larger sample to overcome this limitation. A direct Monte Carlo simulation generates \(10^{5}\) combinations of input parameters. Instead of calculating the FOS with GeoStudio for each combination, the surrogate model is employed to predict it. The input parameters (also called “features” in machine learning) are the shear strength and the hydraulic parameters. FOS is the output parameter, which is calculated at each time step: this means that the surrogate model is re-trained at each time step of the simulation. Finally, a more accurate estimation of \(p_{\rm f}\) is obtained from Eq. (6).

Two methods from ML are tested, namely SVR and XGBoost. For the former algorithm, two different optimization techniques for hyperparameter tuning are applied and compared: randomized search and Bayes search, as described in Sect. 2.4. A fivefold cross-validation is applied to both SVR and XGBoost. Table 3 shows the error metrics for the test set in terms of the coefficient of determination (\(\hbox {R}^{{2}}\)), the root mean squared error (RMSE) and the mean absolute error (MAE). SVR reaches high accuracy with both optimization techniques. However, the XGBoost method has superior performance, with a test \(\hbox {R}^{{2}}\) higher than 98.7%, and it is also computationally more efficient. Therefore, the XGBoost algorithm is selected for the estimation of \(p_{\rm f}\) and for a sensitivity analysis of the stochastic parameters. Figure 5 compares the FOS calculated with the Morgenstern–Price method and the values obtained with the XGBoost model for the test set (i.e., 30% of the 2000 samples) of time step no. 0. The predicted values are in good agreement with those calculated by the GeoStudio software, showing that the XGBoost model is capable of predicting the factor of safety with high accuracy.

Fig. 5
figure 5

Validation of the XGBoost model on the test set of time step no. 2. The dashed line represents a \(45^{\circ }\) line

Table 3 Error metrics of the surrogate models for the test set

The XGBoost model is used to predict FOS in 4 different scenarios of drawdown rate v: \({0.5}\ \mathrm{m}\ \mathrm{d}^{-1}\), \(1\ \mathrm{m}\ \mathrm{d}^{-1}\), \(1.8\ \mathrm{m}\ \mathrm{d}^{-1}\) (as set in Sect. 3) and \(3\ \mathrm{m}\ \mathrm{d}^{-1}\). The water level of the reservoir linearly decreases with time, until it reaches 10 m elevation after 18 days in scenario 1, after 9 days in scenario 2, after 5 days in scenario 3 and after 3 days in scenario 4 (Fig. 6). For each scenario, the XGBoost model is trained on a different dataset: the input parameters (\(c', \varphi ', k_{\rm s}, a, n\)) are sampled according to the statistics in Table 1. The output of the model (i.e., FOS) is calculated at each time step with GeoStudio. It takes approximately 30 s to train the XGBoost model for each scenario. Figure 6 shows FOS obtained with the mean values of the soil parameters in Table 1 and \(p_{\rm f}\), which is estimated with the predictions of the XGBoost model (through Eq. 6).

Fig. 6
figure 6

Variation of the reservoir water level, FOS and \(p_{\rm f}\) with time for different drawdown rates

As shown in Fig. 18 in Appendix, \(p_{\rm f}\) converges to a stable value after approximately 60,000 simulations, except at time steps no. 0, 1 and 2, where \(p_{\rm f}\) is less than or approximately equal to 1%. In these last two cases, \(p_{\rm f}\) does not converge after 100,000 simulations, indicating that a higher number of simulations would be necessary for a more accurate estimation. The uncertainty related to the estimated \(p_{\rm f}\) can be quantified as \({\rm COV}_{p_{\rm f}} = \sqrt{\frac{1-p_{\rm f}}{n\,p_{\rm f}}}\) [2], where \(n = 10^5\). For time step 1 (12 h after rapid drawdown), this value is equal to 31.62%, which is slightly above the target \({\rm COV}_{p_{\rm f}}\) of 30% [19, 40] indicated in the literature. Therefore, the estimate at time step 1 should be treated with caution. At time step 2 (after 24 h), \({\rm COV}_{p_{\rm f}}\) is equal to 2.46%, much lower than the suggested threshold, and the estimated value of \(p_{\rm f}\) can consequently be considered acceptable.
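As a worked example of this expression (with an assumed \(p_{\rm f}\) for illustration), an estimate of \(p_{\rm f} \approx 10^{-4}\), i.e., about 10 predicted failures out of \(n = 10^5\) samples, gives

$$\begin{aligned} {\rm COV}_{p_{\rm f}} = \sqrt{\frac{1-10^{-4}}{10^{5}\cdot 10^{-4}}} \approx \sqrt{0.1} \approx 31.6\%, \end{aligned}$$

which corresponds to the value reported above for time step 1 and illustrates why very small probabilities of failure require considerably larger Monte Carlo samples.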

4 Discussion

A minimum factor of safety of 1.5 should be satisfied in steady-state conditions [16], but a lower FOS of 1.1–1.2 can be considered acceptable under rapid drawdown [28, 45]. As expected, a higher drawdown velocity implies a lower critical FOS, and the variation of \(p_{\rm f}\) with time mirrors this behavior. Siacara et al. [45] verified that the critical times in terms of minimum FOS and maximum \(p_{\rm f}\) do not necessarily coincide. In this study, this effect may not be visible due to the frequency of the calculation steps in the seepage and slope stability analyses. However, according to the mentioned study, the critical FOS arises before the critical \(p_{\rm f}\), with a delay of 24 hours for a drawdown rate of \(2\ \mathrm{m}\ \mathrm{d}^{-1}\) and of 3 days for \(0.5\ \mathrm{m}\ \mathrm{d}^{-1}\). Following this reasoning, the peak in \(p_{\rm f}\) (Fig. 6) may occur slightly later in time, which could be verified by increasing the frequency of the calculation steps. As shown in Fig. 6, the peak of \(p_{\rm f}\) occurs when the reservoir level is stable at 10 m elevation or at the previous time step. In the first (\(v = 0.5 \ \mathrm{m}\ \mathrm{d}^{-1}\)) and second scenario (\(v = 1\ \mathrm{m}\ \mathrm{d}^{-1}\)), this corresponds to 15 days with \(p_{\rm f} = 22.81\%\) and to 8 days with \(p_{\rm f} = 38.04\%\), respectively. In the third scenario (whose results are also shown in Figs. 17 and 18), the peak of \(p_{\rm f}\) takes place exactly when the minimum water level is reached, that is, after 5 days with \(p_{\rm f} = 48.72\%\). The last scenario has the highest drawdown rate (\(v = {3}\ \mathrm{m}\ \mathrm{d}^{-1}\)): the maximum \(p_{\rm f}\) occurs after 2 days with \(p_{\rm f} = 72.42\%\), while the water level is constant at 10 m starting from 3 days after drawdown.

The feature importance is calculated for the test set using XGBoost. This quantity is defined as the variation in a model performance score (in this study, \(\hbox {R}^{{2}}\)) after random permutation of a single feature. If the model performance score drops significantly after the random shuffle of a particular feature, the model depends strongly on that feature. Figure 7 shows the feature importance of the effective friction angle \(\varphi '\), the effective cohesion \(c'\) and the curve fitting parameters a and n. The shear strength parameters exhibit the highest feature importance, with values that range from 76 to 92% for \(\varphi '\) and from 5 to 23% for \(c'\). An interesting aspect that emerges from Fig. 7 is the variation in the feature importance with time. For each time step, the surrogate model is trained with the same features (i.e., the samples of the random variables) but with different data for the label (i.e., FOS). Thus, the feature importance changes with each time step and consequently with the reservoir level. In general, the feature importance of \(\varphi '\) decreases during the drawdown and then increases while the reservoir level remains constant. A visible drop occurs earlier in time with increasing drawdown rate. The reverse trend is observed for \(c'\): as the water level decreases, the feature importance of \(c'\) increases until a peak is reached. Similar to the previous case, the peak occurs earlier in time as the drawdown rate increases. The feature importance of the curve fitting parameters of the SWCC is much lower, below 2% for a and below 0.1% for n. A general increase in the feature importance of a and n can be observed after rapid drawdown, when the reservoir level is stable at 10 m. The feature importance of a increases asymptotically with time, while that of n peaks approximately at the end of the drawdown, with a magnitude that increases with the drawdown rate. The feature importance of \(k_{\rm s}\) is omitted because it is lower than 0.5%. The observed results are interpreted as follows. With reference to Eq. (5), the shear strength can be divided into three components: cohesive, frictional and suction strength. As the reservoir level drops, the water load on the slope surface decreases together with the frictional strength along the slip surface. When the reservoir level becomes stable at 10 m, the pore water pressure in the slope dissipates and the frictional strength rises, reflecting the variation in the feature importance of \(\varphi '\). Regarding the curve fitting parameters a and n, the suction strength slowly increases with time, thus resembling the pattern of their feature importance. However, it is important to point out that the correlation between the random variables can influence the results of the feature importance analysis. When the instances of a feature are permuted, the relationship between features and label (or input and output parameters) can still be captured through the correlated feature [39]. This could explain why the variation in the feature importance of \(c'\) mirrors that of \(\varphi '\), although the cohesive strength is constant with time. Similar outcomes in terms of relative importance between features are presented by Wang et al. [51]. The authors evaluated the feature importance of the shear strength and hydraulic parameters of a surrogate model based on MARS, applied in the probabilistic analysis of an earth dam slope under transient seepage.
Although the absolute values of the feature importance are not directly comparable with those of this study, a general trend in terms of relative importance of the individual features can be identified. The results of Wang et al. [51] show that the shear strength parameters have the highest importance, which is approximately ten times higher than that of the hydraulic parameters (a, n and \(k_{\rm s}\)).
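The permutation-based importance described above can be computed, for instance, with the scikit-learn utility shown in this sketch; model, X_test, y_test and feature_names are assumed to hold the trained surrogate, the test data and the parameter labels.

```python
from sklearn.inspection import permutation_importance

# Drop in R^2 after randomly permuting each feature on the test set
result = permutation_importance(model, X_test, y_test,
                                scoring="r2", n_repeats=10, random_state=0)
for name, drop in zip(feature_names, result.importances_mean):
    print(f"{name}: mean drop in R^2 = {drop:.3f}")
```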

Fig. 7
figure 7

Feature importance of the input parameters: a \(\varphi '\); b \(c'\), c a; d n

4.1 Sensitivity analysis

A sensitivity analysis is conducted to study the effect on \(p_{\rm f}\) of the uncertainty of each soil parameter regarded in this study as stochastic (i.e., \(c'\), \(\varphi '\), \(k_{\rm s}\), a and n) and of their correlations (\(\rho _{c',\varphi '}\) and \(\rho _{a,n}\)). The considered ranges of COV (Table 4) are similar to those reported by Jiang et al. [22]. As the COV of one parameter or a coefficient of correlation is modified, the other parameters are left unchanged and take the values shown in Table 1. A new dataset is generated with a direct Monte Carlo simulation (\(10^{5}\) samples) according to the variation in the COV or in the coefficient of correlation under investigation. For each scenario and each time step of the simulation, the XGBoost model previously trained (Sect. 3) is loaded and evaluated on the new sample. This operation takes approximately 65 s, with a total time of 1 h required to estimate \(p_{\rm f}\) for all the cases included in the sensitivity analysis.

Table 4 Range of the coefficients of variation and correlation for the sensitivity analysis

4.2 Effect of COV of the shear strength parameters on the probability of failure

The effect of the uncertainty of \(c'\) and \(\varphi '\) is investigated for all scenarios. Figure 8 shows the variation of \(p_{\rm f}\) with time (left side) and with increasing \(\hbox {COV}_{\varphi '}\) (right side) for the different scenarios presented above. The general trend is an increase of \(p_{\rm f}\) with increasing \(\hbox {COV}_{\varphi '}\), which was also found by Jiang et al. [22] and Wang et al. [51]. The maximum increase in \(p_{\rm f}\) is found in scenarios 3 and 4 (Fig. 8c, d) at the time steps of 8 and 5 days, respectively, with an increase of 23.72% and 24.14% as \(\hbox {COV}_{\varphi '}\) varies from 0.05 to 0.20. In scenario 2, the highest increment of \(p_{\rm f}\) (Fig. 8b) is slightly above 22% at the time steps of 5 and 8 days.

Interestingly, when \(p_{\rm f}\) is at its peak (time steps of 15, 8, 5 and 2 days in scenarios 1, 2, 3 and 4), the increase in \(p_{\rm f}\) with \(\hbox {COV}_{\varphi '}\) becomes smaller as the drawdown rate increases. The extreme case is observed in scenario 4 (Fig. 8d), where \(p_{\rm f}\) decreases with increasing \(\hbox {COV}_{\varphi '}\) at the time steps of 2 and 3 days. The reason for this is explained in Fig. 9, where the safety and failure regions are shown for scenario 3 after 5 days (left side) and scenario 4 after 2 days (right side). These conditions represent the peaks of \(p_{\rm f}\) in these two scenarios. The red and the green points represent slope failure and safety, respectively. In scenario 3, \(p_{\rm f}\) is lower than 50% and the mean value of FOS, which corresponds to the center of gravity of the data, is in the safety region (left side of Fig. 9). When \(\hbox {COV}_{\varphi '}\) increases, the data points spread in the direction of \(\varphi '\) and more of them fall in the failure region. In scenario 4 (right side of Fig. 9), the mean FOS is lower than 1 and the center of gravity of the data lies in the failure region. In this case, the reverse behavior is observed: as \(\hbox {COV}_{\varphi '}\) increases, more data points fall in the safety region. This aspect was also observed by Griffiths and Fenton [12] and Javankhoshdel and Bathurst [18], who showed that for slopes with FOS lower than 1, \(p_{\rm f}\) decreases with increasing COV. Generally speaking, a low COV or a low standard deviation implies that the data are closely clustered around their mean values. When the COV increases, the likelihood that a realization falls in the region opposite to the one containing the mean value is higher.

Fig. 8
figure 8

Variation of \(p_{\rm f}\) with \(\hbox {COV}_{\varphi '}\)

Fig. 9
figure 9

Variation of \(p_{\rm f}\) with \(\hbox {COV}_{\varphi '}\): a scenario 3 (\(v = {1.8}\ \mathrm{m}\ \mathrm{d}^{-1}\)) after 5 days; b scenario 4 (\(v = {3}\ \mathrm{m}\ \mathrm{d}^{-1}\)) after 2 days

The effect of \(\hbox {COV}_{c'}\) on \(p_{\rm f}\) is less significant than that of the effective friction angle. This is probably due to the type of soil considered in this study (a clayey sand), in which the contribution of the cohesion to the shear strength is limited. The variation of \(p_{\rm f}\) with \(\hbox {COV}_{c'}\) is very small for all scenarios, always lower than 4%. In Fig. 10, an increasing trend can be identified predominantly at the time steps that correspond to a reduction in the water level: in scenario 1, this corresponds to the time steps from 0 to 11, in scenario 2 from 0 to 5 days, in scenario 3 from 0 to 3 days and in scenario 4 from 0 to 1 day. A positive correlation between \(p_{\rm f}\) and \(\hbox {COV}_{c'}\) was also found by Jiang et al. [22]. However, when the water level is constant at 10 m, \(p_{\rm f}\) slightly decreases. As already mentioned, the effect of \(\hbox {COV}_{c'}\) on \(p_{\rm f}\) is very limited compared with that of \(\hbox {COV}_{\varphi '}\). Figure 11 shows that the increase of \(\hbox {COV}_{c'}\) produces an expansion of both the safety and failure regions in the direction of the effective cohesion. For \(\hbox {COV}_{c'} = 0.50\), the boundary between the safety and failure regions becomes practically horizontal beyond 10 kPa. This means that for an effective cohesion higher than 10 kPa the influence of \(c'\) on \(p_{\rm f}\) is very small or negligible.

Fig. 10
figure 10

Variation of \(p_{\rm f}\) with \(\hbox {COV}_{c'}\)

Fig. 11
figure 11

Variation of \(p_{\rm f}\) with \(\hbox {COV}_{c'}\): a \(\hbox {COV}_{c'} = 0.20\) and b \(\hbox {COV}_{c'} = 0.50\)

4.3 Effect of the correlation of the shear parameters

The influence of \(\rho _{c',\varphi '}\) on \(p_{\rm f}\) is also investigated. In all scenarios, \(p_{\rm f}\) clearly increases as the correlation increases from negative to positive values (Fig. 12). The maximum increase of \(p_{\rm f}\) with increasing \(\rho _{c',\varphi '}\) is approximately 10% in scenarios 1, 2 and 3. This trend is not generally valid for scenario 4 (Fig. 12d): when \(p_{\rm f}\) is higher than 50%, it decreases with increasing \(\rho _{c',\varphi '}\). The reduction in \(p_{\rm f}\) at the time steps of 2 and 3 days is equal to 6.33% and 3.69%, respectively.

Fig. 12
figure 12

Variation of \(p_{\rm f}\) with \(\rho _{c',\varphi '}\)

Figure 13 shows this trend in terms of failure and safety regions, which are represented by \(10^{5}\) data points generated with a direct Monte Carlo simulation. The left side illustrates the results of scenario 2 at the time step of 8 days, while scenario 4 at the time step of 2 days (when \(p_{\rm f}\) is higher than 50%) is shown on the right. A correlation between the shear strength parameters deforms the shape of the \(c'-\varphi '\) region that contains all the realizations (green and red data points). Low and Tang [31] performed a probabilistic slope stability analysis with FORM and showed that two correlated normal variables in the original space form a tilted ellipsoid centered at the mean. A negative correlation implies more frequent combinations of low values of effective cohesion coupled with high values of effective friction angle, and vice versa. When a positive correlation is considered, the region is tilted in the opposite direction, corresponding to more frequent combinations of high values of both effective cohesion and effective friction angle (or low values of both parameters). In scenarios 1, 2 and 3, where \(p_{\rm f}\) is always lower than 50%, an increase in \(\rho _{c',\varphi '}\) (from negative to positive) produces an increase in \(p_{\rm f}\). Conversely, in scenario 4 (time steps of 2 and 3 days), an increase in \(\rho _{c',\varphi '}\) contributes to a reduction in the failure region. These results generally agree with those obtained by Griffiths et al. [13], who state that a positive correlation is conservative when \(p_{\rm f} < 50\%\) and that the opposite is true when \(p_{\rm f} > 50\%\).

Fig. 13
figure 13

Variation of \(p_{\rm f}\) with \(\rho _{c',\varphi '}\): a scenario 2 (\(v = 1\ \mathrm{m}\ \mathrm{d}^{-1}\)) after 8 days; b scenario 4 (\(v = 3\ \mathrm{m}\ \mathrm{d}^{-1}\)) after 2 days

4.4 Effect of COV of the hydraulic parameters on \(p_{\rm f}\)

The influence of the uncertainty of the hydraulic parameters \(k_{\rm s}\), a and n on \(p_{\rm f}\) is examined. The hydraulic conductivity has a negligible influence on \(p_{\rm f}\), with a variation of \(p_{\rm f}\) below 0.5% in all scenarios. Therefore, no clear positive or negative trend between \(p_{\rm f}\) and \(\hbox {COV}_{k_{\rm s}}\) can be identified.

The parameter a influences \(p_{\rm f}\) with an interesting tendency: the probability of failure increases with increasing \(\hbox {COV}_{\rm a}\) (Fig. 14), but this variation becomes more significant once the minimum water level of 10 m is reached. The right side of Fig. 14 shows the variation of \(p_{\rm f}\) with \(\hbox {COV}_{\rm a}\). The maximum variation of \(p_{\rm f}\) observed in the four scenarios (ordered by increasing drawdown rate) is equal to 2.11%, 2.04%, 2.85% and 2.03%, respectively.

Fig. 14
figure 14

Variation of \(p_{\rm f}\) with time and \(\hbox {COV}_{\rm a}\)

The influence of the curve fitting parameter n on \(p_{\rm f}\) is very limited (Fig. 15). A slight increase of \(p_{\rm f}\) with increasing \(\hbox {COV}_n\) is found before the minimum level of the reservoir is reached. This moment corresponds to the time steps 11, 5, 3 and 2 days in scenarios 1, 2, 3 and 4, respectively. Afterwards, this tendency changes into a very small decrease of \(p_{\rm f}\) with \(\hbox {COV}_n\), below 0.6%.

Fig. 15
figure 15

Variation of \(p_{\rm f}\) with \(\hbox {COV}_{{n}}\)

4.5 Effect of the correlation of the curve fitting parameters a and n

The correlation between a and n does not significantly affect \(p_{\rm f}\). For each scenario there is a slight tendency of \(p_{\rm f}\) to decrease as \(\rho _{a,n}\) increases from negative to positive values, but the variation is lower than 1%. Evidence of this trend is also provided by Wang et al. [51] and Jiang et al. [22].

5 Conclusions

This study proposes an intelligent surrogate model to estimate the probability of failure of reservoir slopes under rapid drawdown. This approach enables reliability analyses of complex geotechnical problems to be performed with little computational effort. The LHS method is used to produce a sample of the soil parameters that constitutes the input features of the surrogate model. A deterministic approach that includes a transient unsaturated seepage analysis and a slope stability analysis is applied to each sample combination to calculate the FOS, which is the output predicted by the surrogate model. This dataset represents the training set of the surrogate model, which is then tested on a much larger sample generated with a direct Monte Carlo simulation to accurately estimate the probability of failure. Two algorithms are tested and compared: SVR and XGBoost. XGBoost proves to be superior both in terms of computational efficiency and accuracy, with a testing performance in terms of \(\hbox {R}^{{2}}\) above 98%, compared with 94% for SVR. This model is employed to estimate the probability of failure of a slope in four different scenarios with increasing drawdown rates.

A sensitivity analysis has provided insights into the influence on the probability of failure of each soil parameter considered as stochastic (i.e., \(c', \varphi ', k_{\rm s}, a\) and n). The effective friction angle has, in terms of \(\hbox {COV}_{\varphi '}\), the highest impact on \(p_{\rm f}\), followed by \(\hbox {COV}_{c'}\), \(\hbox {COV}_{\rm a}\), \(\hbox {COV}_n\) and \(\hbox {COV}_{k_{\rm s}}\). The variation of \(p_{\rm f}\) due to the COV of the random variables depends on the drawdown rate as well as on the reservoir level. For a \(\hbox {COV}_{\varphi '}\) growing from 0.05 to 0.20, \(p_{\rm f}\) increases by 21% to 24%, depending on the drawdown rate. It is also found that an increase in the COV generally causes a rise of \(p_{\rm f}\) when the latter is lower than 50% (i.e., \({\rm FOS} > 1\)), but this effect is reversed when \(p_{\rm f}\) is higher than 50%. This is caused by the position of the mean value of the shear strength parameters relative to the failure and safety regions. Furthermore, a positive correlation between the shear strength parameters \(c'\) and \(\varphi '\) is conservative when \(p_{\rm f}\) is lower than 50%. A rise in \(p_{\rm f}\) between 8 and 10% is observed in all scenarios when \(\rho _{c',\varphi '}\) increases from − 0.5 to 0.5. A feature importance analysis has highlighted the input parameters on which the surrogate model depends most, namely \(\varphi '\) and \(c'\), with a maximum relative importance of 92% and 23%, respectively, depending on the drawdown rate and the time step of the analysis. These results are consistent with those of the sensitivity analysis, suggesting that reducing the variability of the effective friction angle through extensive site investigations would also help reduce \(p_{\rm f}\).

This work has some limitations. The limit equilibrium method is applied to solve the slope stability analysis, and thus the stress–strain behavior of the soil is not considered. Another issue is the spatial variability of soils, whose effect on \(p_{\rm f}\) has not been investigated. Nevertheless, this work represents a benchmark study of a time-dependent reliability analysis with intelligent surrogate models and a basis for future research on the effect of the spatial variability of soils. Finally, this study has shown that intelligent surrogate models can be successfully used to perform probabilistic slope stability analysis and are capable of identifying the soil parameters with the greatest impact on the probability of failure.