The study of the distribution of stresses around expanding (or contracting) cavities has many applications in geomechanics, from tunneling and underground exploration to in situ soil characterization and resource extraction. Even though the problem scales are quite different, a commonality between these applications is that they can be modelled considering a cylindrical cavity that is either pressurized (e.g., the pressuremeter test) or deformed (e.g., cone penetration and tunneling). Moreover, cavity expansion mechanisms are not only used for characterization and design by engineers, but also for burrowing purposes by natural organisms. For instance, roots [1], earthworms [3, 11], razor clams [45] and sandfish [22] all use expanding cavities for exploration/navigation toward paths of least resistance or maximum nutrient yield, anchoring during excavation and assessing the mechanical stability of tunnel networks. These similarities between engineered and biological systems have sparked a wave of scientific interest in bio-inspiration applied to geotechnics [23]. The recent developments in robotics, in addition to the rise of Machine Learning (ML) applications in underground exploration and soil characterization, have opened the possibility to explore the development of autonomous devices that can burrow and sense underground.

This study focuses on the back-calculation of soil parameters from the radial stress at the wall of circular cavity subject to displacement-controlled expansion, under geostatic stress. To the authors’ best knowledge, there is no closed-form solution available at shallow depth (at which, the state of stress may not be considered biaxial). We simulate cavity expansion for various soil properties and depths with a 2D plane-strain Finite Element model. The simulation input parameters, together with the calculated radial stress at the cavity wall during the expansion are used as inputs to ML algorithms to assess the sensitivity of the FEM model, back-calculate the far-field stress and estimate soil constitutive parameters. The main contributions made in this work are the following:

  • A program was written to automatically launch Finite Element Method (FEM) simulations of displacement-controlled cavity expansion under a wide range of stress conditions and soil properties, and extract the radial stress distribution at the cavity wall during expansion. The results of the FEM simulations constitute a database that can be used for input in a ML-based algorithm.

  • Methods are proposed to prepare, transform and use such stress distributions to back-calculate the far field stresses and the mechanical parameters of the material around the cavity (Mohr-Coulomb friction angle \(\phi \), Young’s modulus E).

  • The performance of eight ML techniques in estimating the soil’s friction angle is assessed, and then, the same ML models are tested to predict the soil’s Young’s modulus. It is shown that similar convolutional neural networks can accurately back calculate the friction angle and the Young’s modulus of the material.

  • The combination of FEM and ML is expected to advance the technology of autonomous devices that can burrow and sense underground, considering that such devices could self-anchor and/or sense through the displacement controlled expansion of an embedded cavity.

The paper is structured as follows. We first present literature reviews of cavity expansion (Sect. 1.1) and machine learning (Sect. 1.2) applications in geotechnics. Section 2 describes the general workflow and explains the numerical modeling approach. Section 3 provides an overview of the ML algorithms used in this study. Section 4 summarizes our results for the far-field stress, the Mohr-Coulomb friction angle and the Young’s modulus, and discusses the performance of the ML models tested. Section 5 concludes the manuscript summarizing the findings, limitations and proposing possible directions of future work.

Related work

Cavity expansion in geotechnics

Cavity expansion is an active field of research in geomechanics given its applications to in situ testing [18, 20], resource withdrawal and tunneling [2, 46]. For applications that deal with vertical cavities at relatively large depths, the state of stress can be reduced to plane strain (no change in strain along the body of the cavity) and in-plane isotropic stress conditions. Under such conditions, the expansion problem can be reduced to a one-dimensional (radial) problem since both the distribution of stresses and the deformation of the cavity are radially isotropic [48]. However, for horizontal (or inclined) cavities, or vertical cavities under particular tectonic or sedimentation conditions, the in-plane state of stress may not be isotropic (for instance, the state of stress may be biaxial or geostatic). Therefore, the displacement-controlled expansion of a circular cavity yields a non-isotropic stress distribution, and similarly, the pressure-controlled expansion of a circular cavity generates a non-isotropic displacement of the cavity wall— increasing significantly the complexity of the analytical solutions.

The analytical solutions that give the closest stress estimates in such non-isotropic plane stress conditions are those developed for tunneling applications and in situ testing [48], for instance to interpret the pressuremeter test (PMT) [47], the cone penetration test (CPT) [35, 36] and the dilatometer test (DMT) [53]. Analytical solutions were also established to predict rock or soil behavior around vertical extraction wells relevant to hydraulic fracturing and geothermal foundations, and around horizontal shafts for deep tunneling, micro-tunneling and horizontal directional drilling (HDD) [12].

Still, the solutions mentioned above are developed for infinite domains, e.g., the size of the cavity is relatively small (or deep) compared to the dimensions of the domain it is embedded in. Therefore, there is no effect of free surfaces, and the vertical and horizontal far field stresses can be considered constant along the cavity, which results in isotropic or biaxial stress conditions. When analyzing the problem of shallow cavities, these assumptions do not necessarily hold true, since the gradient of stresses around the cavity becomes significant and the geostatic stress field can no longer be modelled as a biaxial state. Furthermore, experimental studies on shallow cavities such as [16, 31] and limit analysis formulations such as [16], have shown that the failure mechanism of shallow horizontal cavities is dominated by the development of shear planes that follow a catenary/parabolic shape, resembling a passive trapdoor mechanism, see [7, 40, 42].

These considerations on the mechanical response of the cavity increase significantly the complexity of the mathematical problem, for which, to the authors’ best knowledge, there is no available closed-from analytical solution. For this reason, we use the FEM to capture the response of cavities during expansion. We then test ML algorithms in order to estimate the far-field stress and soil parameters from the response of the model. Section 1.2 presents an overview of other applications of ML in geotechnics.

Machine learning applications in geotechnics

The use of ML in geotechnics has seen exponential growth over the last decade [50]. ML has indeed proven to be a useful tool to provide estimations of soil properties and/or limit loads for geotechnical applications where no closed-form analytical solutions exists. ML algorithms do not replace analytical solutions; they are comparable to empirical equations and correlations. For a given application, there is usually no consensus on which ML algorithm or data preparation methodology works best, and therefore some authors have written reviews that compare the advantages/limitations of several approaches. For instance, Lary et al. [17] reviewed applications of ML in geo-sciences and remote sensing, Zhang et al. [51] summarized applications of ML in the constitutive modeling of soils, and Wang and Sun [50] reviewed applications targeted toward modeling of soil properties. Although the range of applications is wide, most studies can be grouped into three categories: i) Estimation of mechanical properties of a system for specific loading conditions and soil type, ii) Estimation of a set of design parameters (e.g., limit load, factor of safety) from the response of a soil to stimuli and iii) Generation and/or calibration of constitutive models. The third category, which goes beyond the scope of this paper, is concerned with the creation, validation and tuning of constitutive models, as shown in the reviews presented in [43, 44, 51].

Estimation of mechanical properties typically resorts to either numerical models or field/lab data as the training set. Material mechanical properties are set as the output so as to create models that can later predict soil parameters. Applications using lab/field data include the prediction of shear strength in cohesive soils using neural networks (NN) from experimental data [33], the estimation of Cam-clay parameters [26], and the estimation of the over consolidation ratio (OCR) from piezocone penetration tests [15]. Studies in which numerical models are used to generate training data include the combination of NN and gradient-descent to identify constitutive model parameters from self-boring pressuremeter tests [27], and the use of the output of FEM simulations of piezocone penetration as the training data of a NN in order to estimate soil parameters [28].

Estimation of design parameters usually aims to calculate a factor of safety or a limit load, using either field data or numerical results as training sets. Some of the studies that use field and lab data include the work of Sulewska [39], which discusses six different applications of NN to predict displacements (settlement, consolidation) and the limit load (bearing capacity) in different applications. Khatibi et al. [13] estimated the shear wave velocity in incomplete data sets achieving good agreement between predicted and measured downhole pressures, which helped assessing wellbore stability. Zhang et al. [49] used data from a tunneling project in Changsha city (China) to predict surface settlement. Samui and Sitharam [37] trained a model that can predict the liquefaction susceptibility of a site. Lu et al. [19] estimated the pullback force during horizontal directional drilling using a genetic algorithm combined with support vector machines (SVM). Kardani et al. [10] estimated the bearing capacity of piles, using field data as the training set. When numerical models are used to generate training data, the numerical model maps the inputs, i.e., the geometry and loading conditions of the problem, to the desired output, typically a factor of safety or a limit load. Numerical models may be computationally expensive and/or rely on advanced/licensed software, making them inconvenient for fast estimations. ML models trained on numerical simulations are fast, reliable alternative methods that allow quick estimations. For instance, Makasis et al. [21] generated a model to design thermal piles. He et al. [8] focused on a reliability analysis of spatially variable slopes, training a ML model on relatively few numerical simulations, and showing the ability of the model to produce results comparable to those of the complex numerical model.

Numerical model

The finite element models used for the present study are built using Abaqus software. The 2D, plane strain model approximates the cross section of a cylindrical cavity, which is assumed to be long in relation to its diameter so that plane strain conditions hold. Due to the symmetry of the problem with respect to the vertical axis, and in order to reduce computational cost, only half of the domain is modelled (see Fig. 1). The top boundary of the domain represents the free surface, symmetric boundary conditions are set on the left wall of the domain (cavity’s side), and fixed boundary conditions are set on the right and bottom boundaries, see Fig. 1a. The diameter of the cavity (\(D_c\)) is fixed for all the simulations at 1.0 m—the choice of this diameter corresponds to an intermediate value between soil probing instruments, usually in the range of tens of centimeters, and the range of infrastructure construction devices such as tunneling machines. The depth of the cavity (H) is measured from the surface to the center of the cavity and is varied between 5 and 50 times the cavity diameter (5 to 50 m). The height of the model (\(H_m\)) was set to \(H_m = H + 25 D_c\), while its width (\(W_m\)) was set to \(W_m = 1.5 H_m\)—these dimensions were chosen according to [54] to avoid boundary effects. Each model was created and meshed automatically using a Python routine that exploits the scripting capabilities of Abaqus. The model was split into two sections prior to meshing. A first region concentric to the cavity with a diameter of \(6D_c\) was assigned 4-node bi-linear elements arranged in a structured mesh with 36 elements around the cavity (see Fig. 1c. A second region covering the rest of the domain was assigned 3-node linear elements arranged in a non-structured mesh. The number of elements in the mesh ranged between 1, 415 and 4, 340 depending on the depth of the cavity (see Fig. 1b. The geometry of the models and an example of mesh are shown in Fig. 1.

Fig. 1
figure 1

a Diagram (not to scale) of the symmetric (Sym.) and fixed (Fix.) boundary conditions (BC), the free surface, and the direction of gravity (g) in the model. b Mesh of the simulation domain (to scale). c Close-up of region around the cavity

Model input parameters

We used the Mohr-Coulomb (MC) soil constitutive model because of its simplicity and flexibility. After creating the geometry of the model, soil properties were assigned to the elements of the mesh, including: soil density (d), Young’s modulus (E), Poisson’s ratio (\(\nu \)), Mohr-Coulomb (MC) friction angle (\(\phi \)) and MC dilation angle (\(\psi \)). In the following sections, we refer to the soil density (d) and unit weight (\(\gamma \)) of the material interchangeably, keeping in mind that these two parameters are linearly related by \(\gamma = g \cdot d\), where g is the gravity acceleration. All the elements of the mesh were assigned the same soil properties, the values of which were selected randomly from a uniform distribution spanning ranges suggested in [14, 40] for frictional soils (silty sands to gravels)—see Table 1. The distributions of the different variables were assumed independent, which allowed us to ignore correlations between soil parameters.

Table 1 Bounds for the model parameters. Each variable was sampled independently assuming a uniform distribution between the bounds

A fixed value of MC cohesion (c) of 5 kPa was used in every simulation in order to improve the convergence of the models. Although there is no risk of achieving a state of stress at a corner of the MC yield surface in plane strain, the simulations were run with the Drucker-Prager (DP) model, using a matching smooth yield surface (e.g., [4, 9, 25, 38, 52]). The Python scripts developed to sample constitutive parameters for the present study can thus be used as they are for future 3D simulations.

Simulation procedure

The computations were performed in two steps. First, the displacement of the cavity wall was fixed to zero and a geostatic stress step was used to create a linear stress gradient controlled by the unit weight of the material and increasing in the direction of gravity (see Fig. 1a, i.e., from the free surface (zero vertical stress) to the bottom of the simulation domain. The resulting horizontal stress was consistent with the conditions of the soil at rest. We verified that the lateral earth pressure coefficient was \(K_o = 1 - sin(\phi )\) from the simulation results after the geostatic step, which Abaqus performs without inducing deformations to the model. The density and MC friction angle of the material are two of the variables assigned randomly in the models, and therefore, our set of simulations covers a wide range of pre-expansion stress conditions at comparable cavity depths. The expansion of the cavity was simulated in a second step, displacing the nodes around the cavity radially away from the cavity center at a constant rate, keeping the cavity circular. The maximum radial expansion was set to a large value (\(10 D_c\)) in order to push the simulation to the maximum expansion of the cavity before it fails. A total of 1,500 simulations were performed using a machine with a Dual Intel Xeon Gold 6148 (2.4 GHz, 3.7 GHz turbo) processors, and each simulation was run in serial instances of Abaqus Standard (implicit). In average, each simulation took about 5 minutes to complete for a total of about 125 hours of computing time.

Since the objective of the study is to test whether the data captured at the wall of the cavity during expansion can be used to infer soil behavior, we used the radial stress at each orientation around the cavity and at every value of radial expansion/deformation (\(\sigma _{r}(r=D_c/2,\theta ,\epsilon _r)\)) as a data set that can be given as input to the ML algorithm to back-calculate soil properties. Examples of \(\sigma _{r}(r=D_c/2,\theta ,\epsilon _r)\) data sets obtained by FEM simulation are shown in Fig. 2.

Fig. 2
figure 2

Example radial stress distributions at the cavity wall for two example simulations at different depths: \(H=5m\) and \(H=50m\)

Machine learning algorithms

Table 2 Description of the 8 different ML algorithms, listed by increasing complexity and tested to infer \(\phi \) from the cavity response \(\sigma _r\) and other simulation parameters

In our study, we focus on predicting the Mohr-Coulomb (MC) friction angle (\(\phi \)) and the Young’s modulus (E). We assume that the far field stresses (\(\sigma _v, \sigma _h, K_o\)), cavity depth (H), and soil density (d) are known, as they can be inferred from the initial stress distribution (see Sect 4.1). Therefore, we used them together with the radial stress distribution during expansion (\(\sigma _r(r=D_c/2,\theta ,\epsilon _r)\)) as predictive variables of the tested models.

ML models with higher complexity tend to be more accurate, but less interpretable [34]. Besides accuracy, interpretability is also important in our study, as we are interested in inferring the relationship between cavity expansion and soil properties. Therefore, in order to find the simplest model that can capture the mechanism, we trained and evaluated 8 different machine learning approaches with increasing complexity (Table 2). In the first model, we used a mean score predictor as our naïve baseline model. This model simply uses the average of \(\phi \) in the training set (with n samples) to predict every sample j in the test set, as follows:

$$\begin{aligned} {\hat{\phi }}_{\text {test, j}} = \frac{1}{n} \sum _{i = 1}^{n} \phi _{\text {train, i}} \end{aligned}$$
(1)

This mean score naïve baseline model assesses how well the distribution of \(\phi \) in the training set alone can predict unseen test samples. In addition, we test two linear regression models with different input features. The first only uses H and d, while the second uses H, d as well as \(\sigma _r(r=D_c/2,\theta ,\epsilon _r)\). We also explored various types of neural networks. Compared to linear regression models, a two-layer fully connected neural network can learn nonlinear relationships from the input features. Lastly, we transformed the surfaces described by \(\sigma _r(r=D_c/2,\theta ,\epsilon _r)\) (see Fig. 2 for examples) as gray-scale images, represented as a matrix with pixel values in the range [0, 1], and tested three six-layer Convolutional Neural Networks (CNNs) with different featurizations. Our results show that CNNs trained with H, d and \(\sigma _r(r=D_c/2,\theta ,\epsilon _r)\) represented as images have the highest accuracy.

Data processing

We generated all our data through FEM simulations (See Sect. 2.2). For each simulation, H, d, \(\phi \) and E are scalar values; \(\sigma _r(r=D_c,\theta ,\epsilon _r)\) is represented as a matrix \(\in \) \({\mathbb {R}}^{n \times 37}\), where n is the number of radial expansion steps in the FEM simulation (discretized values of \(\epsilon _r\)), and the 37 columns refer to different orientations around the cavity with a spacing of \(5^{\circ }\), each one corresponding to a node around the cavity. We note \({\sigma _r}^{i,j}\) the stress (numerical scalar) at node j at step i. We randomly split 1,363 simulation results into a training set with 817 simulations, a validation set with 273 simulations, and a test set with 273 simulations (6:2:2 ratio). We used the validation set to choose the optimal hyperparameters for some models (e.g., the regularization power in the linear regression model); we also monitored the performance of CNNs on the validation set during training and interrupted the training process when the performance stopped to improve in 20 consecutive epochs—common workflow to avoid overfitting [5]. Lastly, we evaluated ML algorithm performance by comparing the predictions of the ML models with the actual values calculated by FEM in the test set. We used the same dataset split to train and evaluate all 8 ML models.

The distribution of H, d, and \(\phi \) are shown in Fig. 3. For all models, we standardized H and d into a range of [0, 1] by linearly transforming them based on their minimums and maximums in the training set. For the stress field matrices \(\sigma _r\), we explored different featurization methods. For CNNs, we treated \(\sigma _r\) as 2D gray-scale images where the stress field value at orientation j and expansion step i can be thought as a pixel value. CNNs are designed for image analysis—with convolution transformations, CNNs can learn the spatial relationships across pixels [29], which encode cavity locations and simulation steps in our framework. Our hypothesis is that the spatial relationship between cavity location and simulation step is useful for predicting soil properties. Since linear regression models and fully-connected neural networks cannot use 2D matrices as input features, we vectorized these 2D matrices of \(\sigma _r\) into 1D vectors by stacking matrix rows together (row-major flattening).

Fig. 3
figure 3

Distributions of cavity depth H, soil density d, and friction angle \(\phi \) in the training set with 817 samples

Linear models

We used a trivial mean-score predictor as a baseline model. The model computes the average \(\phi \) in the training set, and then directly uses it as prediction of all samples in the validation set and test set. Linear regression is a popular and interpretable approach to study the relationship between soil properties and cavity behavior [50]. We applied Lasso regularization [41] to all of our linear regressions, as it can shrink the coefficients of less predictive features to zero and improve model generalizability. The first regression model only uses two numerical values (cavity depth H and soil density d) as input features, while the second model adds vectorized stress fields \(\sigma _r\) as an extra feature (Table 2).

The regularization power \(\lambda \) is a hyper-parameter, where larger \(\lambda \) imposes stronger regularization. We tune \(\lambda \) on the validation set through grid-search using the Python package scikit-learn [32]; we choose the \(\lambda \) that gives the smallest Mean Absolute Error (MAE) in our final model. The MAE is defined according to:

$$\begin{aligned} \text {MAE} = \frac{1}{n} \sum _{i=1}^{n} |{\hat{\phi }}_i - \phi _i | \end{aligned}$$
(2)

where n is the size of the dataset, \({\hat{\phi }}\) is the predicted value of a parameter (e.g., friction angle) and \(\phi \) is the true parameter value, taken as input in the FEM. Models with smaller MAE are more predictive.

Fully-connected neural network

We developed two simple fully-connected neural networks (NN), with two hidden layers each, using the Python package PyTorch [30]. Multi-layer NNs can approximate any continuous functions [6]. Therefore, compared to linear models (Sect. 3.2), NNs are more powerful ML models that can learn nonlinear relationships between the input features and MC friction angle \(\phi \). Similar to our experimental design for linear models, one NN model only uses two input features: cavity depth H and soil density d, and the second NN uses H, d, and vectorized \(\sigma _r\) as input (Table 2). The hyper-parameters: hidden neuron numbers, learning rate, and batch size, are tuned on the validation set through grid-search. We chose the combination that yielded the smallest MAE score as our final model.

Convolutional neural network

We developed three CNN models that share a similar model architecture (Fig. 4) but have different featurizations with the Python package PyTorch [30]. In our CNN model, we have six convolutional layers where each layer has six \(3\times 3\) kernels. After every 2 consecutive convolutional layers, we insert one max-pooling layer to introduce regularization in the model. Finally, we have two fully-connected layers to nonlinearly transform the image representations into a numerical value. One CNN takes the stress field images with dimension \(256\times 256\) generated by resizing and standardizing \(\sigma _r\) matrices. The second CNN takes these \(256\times 256\) images as input, and then standardized H and d are concatenated in the first fully-connected layer (Fig. 4). The third CNN has the same architecture as the second CNN, but for each row (cavity orientation), we linearly interpolated the values at the 37 cavity nodes to have values at every degree. For each column (radial expansion), we linearly interpolated the values in the range 0 to \(1\%\) to 100 data points. In that way, we increase the size of our images while keeping a consistent mapping, i.e., each row and column numbers refer to the same orientation and radial expansion across simulations, unlike the first and second CNN where we resized the images without interpolation.

Fig. 4
figure 4

Model architecture of our convolutional neural network (CNN) with six layers. The model takes a \(256\times 256\) gray-scale image that encodes the stress field \(\sigma _r\) as input. Then, the pixel representations are transformed through 17 layers to predict the MC friction angle \(\phi \) or the Young’s modulus E. This model has six convolutional layers where each is followed by a ReLU activation layer—helping the model learn nonlinear relationships. To avoid overfitting, one max-pooling layer is inserted after every two convolutional layers. Two fully connected (FC) layers are inserted after the convolutional and max-pooling layers. For the second and third CNN models to predict \(\phi \), two scalar values (cavity depth H and soil density d) are concatenated into the output vector from the last max-pooling layer. For the model to predict E, we concatenate four scalar values (cavity depth H, soil density d, and stiffness coefficients \(M_w\), \(M_c\)) instead. The fused vector is used as the input of the last two fully-connected layers

Results and discussion

Our results are organized as follows. In Sect. 4.1, the pre-expansion radial stress distribution at the cavity wall is used to estimate the far-field stresses around the cavity and unit weight of the soil. In Sect. 4.2, the radial stress distribution during expansion (\(\sigma _r\)) is used to back-calculate the MC friction angle (\(\phi \)), while Sect. 4.3 uses a similar framework to back-calculate the Young’s modulus (E) of the material.

Estimation of far-field stresses

The pre-expansion radial stress distribution at the cavity wall (\({\sigma _{r}}(r=D_c/2,\theta ,\varepsilon _r = 0)\)) is used to calculate the far field stresses in each simulation. An initial inspection of the initial stress distributions showed that the radial stress could be parametrized as a sum of sinusoidal/cosine signals. In fact, this observation was previously found and exploited by [24] for cavity expansion problems in elasticity. Figure 5 shows an example of the pre-expansion radial stress distribution for two simulations with similar far-field vertical stress \(\sigma _v\) but different far-field horizontal stress \(\sigma _h\).

Fig. 5
figure 5

Pre-expansion radial stress distribution at the cavity wall for a depth \(H = 10m\) and a soil unit weight \(\gamma = 23.8 kN/m^3\). The two simulations were conducted with different lateral earth pressure coefficients \(K_o\). Orientation angle measured from the bottom of the cavity in the counter-clockwise direction, i.e., top of the cavity at \(\pi \). The symmetry index \(S^2\) is defined in Eq. 6

Due to the presence of geostatic stress conditions, the initial radial stress distribution has a period of \(2\pi \) (one revolution around the circular cavity). We used the following Fourier series to fit the initial stress distribution around the cavity:

$$\begin{aligned} {\sigma _{r}}^{\varepsilon _r = 0} = \sum _{j = 0}^{4} a_j \cdot cos(j\theta ) \end{aligned}$$
(3)

where \(\theta \) is the orientation angle measured from the bottom of the cavity. By choosing this convention we have \(\sigma _{r}(\theta ) = \sigma _{r}(-\theta )\). The order of the fit (\(j=4\)) is the smallest value that could accurately capture the absence of biaxial symmetry in the radial stress distribution. Every fit of the initial stress distribution has coefficients of determination \(R^2 > 0.94\). Next, from Eq. 3, and knowing the fit coefficients \(a_j\) for every model, estimations for the radial stress at the invert (\(\theta = 0\)), waist (\(\theta = \pi /2\)) and crown (\(\theta = \pi \)) of the cavity were found from the following relations:

$$ \begin{aligned} & \sigma _{I} = a_{0} + a_{1} + a_{2} + a_{3} + a_{4} \\ & \sigma _{W} = a_{0} - a_{2} + a_{4} \\ & \sigma _{C} = a_{0} - a_{1} + a_{2} - a_{3} + a_{4} \\ \end{aligned} $$
(4)

in which the subscripts B, W and C correspond to the invert (bottom), waist (side) and crown (top) of the cavity respectively.

The vertical (\(\sigma _v\)) far-field stress at the depth of the cavity center (H) can be estimated as the average between the radial stress at the crown and invert of the cavity. Similarly, the horizontal (\(\sigma _h\)) far-field stress at the depth H can be estimated as radial stress at the waist of the cavity. And from there, assuming that the depth of the cavity is known and that the stress state follows geostatic conditions, the unit weight of the material and the lateral earth pressure coefficient can be estimated too, according to the following equations:

$$\begin{aligned} \sigma _v = \frac{\sigma _I + \sigma _C}{2} = a_0 + a_2 + a_4 \\ \sigma _h =\sigma _W = a_0 - a_2 + a_4 \\ \gamma = \frac{\sigma _v}{H} = \frac{a_0 + a_2 + a_4}{H} \\ K_o = \frac{\sigma _h}{\sigma _v} = \frac{a_0 - a_2 + a_4}{a_0 + a_2 + a_4} \end{aligned} $$
(5)

The estimation of the far-field stresses (respectively, soil unit weight and lateral earth pressure coefficient) is assessed against the actual values set as input in the FEM simulations in Fig. 6 (respectively, Fig. 7).

Fig. 6
figure 6

Estimation of far field stresses at the depth of the cavity center from the coefficients of the sum of cosines fit

Fig. 7
figure 7

Estimation of unit weight (\(\gamma \)) and earth pressure coefficient (\(K_o\)) from the coefficients of the sum of cosines fit

Lastly, the gradient of stresses present under geostatic stress conditions causes a lack of symmetry of the radial response between the top and bottom of the cavity. In order to compare and quantify the symmetry (or lack thereof) we define the coefficient of symmetry \(S^2\) as follows:

$$\begin{aligned} S^2 = \frac{\sum _{\theta = 0}^{\pi } (\sigma _r(\theta )-\sigma _r(-\theta ))^2}{\sum _{\theta = -\pi }^{\pi } (\sigma _r(\theta )-\bar{\sigma _r})^2} \end{aligned}$$
(6)

where \(\sigma _r(\theta )\) is the radial stress at an orientation \(\theta \) around the cavity wall, and \(\bar{\sigma _r}\) is the mean radial stress across all the orientations between \([0, \pi ]\).

\(S^2\) is similar to the coefficient of determination \(R^2\) and measures the difference between the radial stress at the top (waist to crown) and the bottom (waist to invert) of the cavity, and compares it to the difference between the entire radial stress distribution at the cavity wall and its average value. Similar to \(R^2\), the coefficient \(S^2\) has a maximum value of 1 when there is perfect symmetry between both sides of the distribution. Figure 8 shows the distribution of the values of \(S^2\) as a function of the depth of the cavity H and the lateral stress coefficient \(K_o\) (variables which had the highest influence on the symmetry index).

Fig. 8
figure 8

Distribution of the coefficient of symmetry \(S^2\) (see Eq. 6) as a function of normalized cavity depth (\(H/D_c\)) and lateral earth pressure coefficient (\(K_o\))

Results from Fig. 8 show that the depth of the cavity has a significant effect on the symmetry of the response. This was expected, since the stress gradient across the height of the cavity is significant for shallow cavities, and reduces in importance as the depth of the cavity increases. In fact, models in which the depth was more than 15 times the cavity diameter (15m) had \(S^2 >= 0.95\), suggesting that at such depths, the assumption of biaxial stress conditions may be acceptable. However, as the depth of the cavities decreases (i.e., the cavities are more shallow) there is a large variability of the symmetry of the response, which is controlled by the value of the lateral earth pressure coefficient. Low values of \(K_o\) display a higher symmetry of the pre-expansion stress distribution due to a larger difference between the magnitude of the vertical and horizontal far-field stress.

Estimation of MC friction angle \(\phi \)

In order to back-calculate the MC friction angle \(\phi \) of the material, we used the eight ML models described in Table 2. As explained above, the models use known parameters: cavity depth H, soil density d, and the radial stress distribution at the cavity wall \(\sigma _r(r=D_c/2,\theta ,\epsilon _r)\)) in order to predict the MC friction angle \(\phi \). As the dependent variable \(\phi \) is a continuous value, we use the MAE to measure the performance of the different models (see Eq. 2).

The MAEs during training, validation (tuning of hyper-parameters) and testing are reported in Table 3. Results show that increasing the complexity of the models results in better performance. For instance, with the same input parameters, fully-connected neural networks have better performance than linear regression models, and CNNs outperforms fully-connected neural networks across different evaluation subsets. We see the largest performance improvement when including CNN models, which suggests that the nonlinear spatial relationships across cavity locations and simulation steps are critical to accurately predict the MC friction angle \(\phi \). The resulting test MAE of the CNN model is \(0.49^\circ \). This means that in average, the error in the friction angle estimation \({\hat{\phi }}\) is within 0.5 degrees from the true value \(\phi \).

Interestingly, using linear interpolation to modify the input stress field images does not improve the CNN performance. The interpolation modifies the grey scale images obtained from \(\sigma _r\), in such a way that every row number (orientation around the cavity) and every column number correspond to the same values across simulations. This fact suggests that the performance of the model is controlled by the ’shape’ of the radial stress distribution, not the magnitude of the expansion.

Table 3 Mean absolute error (MAE) obtained with eight ML algorithms with increasing complexity to test whether the soil’s MC friction angle \(\phi \) can be inferred from cavity characteristics and radial stress distribution \(\sigma _r\). The results highlight that CNN models can accurately predict \(\phi \) from the radial stress distribution \(\sigma _r\)

Estimation of young’s modulus E

In this section, we apply some of the models used in Sect. 4.2 to infer the value of the Young’s modulus (E) of the soil mass without further training, hence testing the generalizability of the framework.

We introduce two scalar parameters, \(M_w\) and \(M_c\), which correspond to the slope of the radial stress/strain curve between the initial radial stress and the stress at a radial deformation \(\varepsilon _r=0.1\%\), measured according to the following equation:

$$\begin{aligned} M_j = \frac{\sigma _{r}(\theta _j)^{\varepsilon _r=.1\%}-\sigma _{r}(\theta _j)^{\varepsilon _r=0}}{0.1\%} \end{aligned}$$
(7)

where the subscript j corresponds to the orientations at the waist (\(M_w\)) and crown (\(M_c\)). We applied the data processing pipeline (Sect. 3.1), model architectures of linear models (Sect. 3.2) and CNN (Sect. 3.4) to predict E, this time with the addition of \(M_w\) and \(M_c\), calculated for each one of the simulations, and fed to our CNN models together with H and d before the fully-connected layers (Fig. 4). \(M_w\) and \(M_c\) are intuitive measures of the stiffness of the material around the cavity and their inclusion significantly improved the performance of the prediction models. The Mean Absolute Percentage Error (MAPE) is calculated as:

$$\begin{aligned} \text {MAPE} = 100 \times \frac{1}{n} \sum _{i=1}^{n} |\frac{\hat{E_i} - E_i}{E_i} | \end{aligned}$$
(8)

The ML experiment results are summarized in Table 4, where we see a similar pattern of model performance as in the estimation of MC friction angle \(\phi \). It is worth noting that in this experiment, we use a slightly different metric to measure the performance of the predictor. The MAPE (Eq. 8) is a relative measurement of error between the predicted Young’s modulus \({\hat{E}}\) compared to its true value E. Therefore, the performance scores in Table 4 have a different scale than the scores in Table 3. The performance of the models resonates with our findings in Table 3: (1) The CNN model yields the best prediction performance; (2) The inclusion of the stress field distribution \(\sigma _r\) improves the accuracy of the predictions of E; (3) The MAPE of the estimation is under \(2\%\).

Table 4 Mean absolute percentage error (MAPE) obtained with four machine learning models replicated from our soil friction angle \(\phi \) experiment to predict soil elasticity E. The results converge with our findings in the previous experiment, suggesting that our inference framework is generalizable to other soil properties

Influence of noise in ML models

The ML models presented in Sects. 4.2 and 4.3 are tested and trained with ‘clean’ data obtained from FEM simulations. However, to test the applicability of this method to data acquired from field measurement we test the influence of noise in the accuracy of the CNN models (which yield the highest accuracy).

Gaussian noise, i.e., random values sampled from a normal distribution N(0, std) with zero mean and standard deviation std, are added to each pixel of the stress field images \(\sigma _r\). Ten different levels of noise are tested by increasing std from 0.1 to 1 (\(std = 0.1, 0.2, ..., 1.0\)). Resulting pixel values outside the normalized range [0, 1] are assigned values of 0 and 1 respectively.

Resulting datasets \((\sigma _r)_{std} = \sigma _r + N(0,std)\) are used to train, validate and test the CNN models that predict \(\phi \) and E. The accuracy of each model, in terms of the MAPE of the estimation of E and the MAE of the estimation of \(\phi \), for each value of std, are summarized in Fig. 9.

Fig. 9
figure 9

Evolution of mean errors as a function of noise magnitude standard deviation (std). The left axis shows the MAE of the estimation of \(\phi \), and the right axis shows the MAPE of the estimation of E, as a function of std

The estimation of \(\phi \) is more sensitive to noise, increasing monotonically from a MAE of \(0.48^{\circ }\) (with no noise added), to \(2.7^{\circ }\) with \(std = 1.0\). The MAPE of the estimation of E increases from \(1.84\%\) (with no noise) to about \(2.3\%\) for \(std \ge 0.2\), after which the estimation appears to plateau.

The perceived lower sensitivity to noise of the estimation of E is partially attributed to the fact that the MAPE is a percentage metric, as opposed to the MAE used for \(\phi \). In addition, the high performance of the model suggests that the extra features \(M_w\) and \(M_c\) improve the robustness of the model, offsetting the influence of std.

Conclusions

This manuscript presents a novel framework that couples FEM and ML to back-calculate the far-field stresses and soil properties from the radial stress field (\(\sigma _r\)) at the wall of a circular cavity during displacement-controlled expansion under non-negligible geostatic stress gradients. We exploited the resemblance of the initial radial stress distribution around the cavity and Fourier-like functions to propose a simple, yet accurate way to back-calculate far-field stresses around the cavity. Moreover, such simple representation of the initial stress distribution (controlled by 4 scalar parameters), can also be used to assess the existence of geostatic or biaxial stress conditions, based on the symmetry (or lack thereof) of the stress distributions between the top and bottom parts of the cavity. We then evaluated eight different ML models of increasing complexity in order to find the simplest, most interpretable method that can accurately back-calculate the MC friction angle of the material around the cavity (\(\phi \)). Results show that using image-like data that encodes the radial stress distribution as a function of radial deformation and orientation angle significantly improves the accuracy of the models. Interestingly, we observed that the predictive power of the encoded images to predict \(\phi \) is controlled by the ’shape’ of \(\sigma _r\) rather than by its magnitude. Lastly, we tested the flexibility of the prediction framework used to back-calculate \(\phi \), by re-using it to predict the Young’s modulus of the material (E). Having a consistent and flexible framework that can consistently back-calculate different soil properties significantly reduces the time that must be invested in data preparation and model selection. The best-performing model has a mean absolute percentage error (MAPE) under \(2\%\), using a convolutional neural network that uses \(\sigma _r\) and intuitive cavity stiffness parameters (\(M_w\), \(M_c\)). Although there is no practical way to interpret the mechanisms occurring inside the neural network, we hypothesise that the encoded images from \(\sigma _r\) and the parameters \(M_w\), \(M_c\) provide ’shape’ and magnitude information respectively, which is then combined to estimate the Young’s modulus of the material. The CNN models proposed in this study were found to have little sensitivity to Gaussian noise: the MAE of \(\phi \) (respectively, the MAPE of E) increases from 0.5\(^o\) (respectively, from 1.84%) without noise to 2.75\(^o\) (respectively, to 2.3%) with noise of unit standard deviation. This relative insensitivity to noise is indicative of the robustness of the proposed CNNs.

The merits of the current study are not limited to the particular problem of cavity expansion nor to the use of machine learning algorithms in geotechnical problems. The current study can indeed be used as a guide to generate, prepare and transform mechanical data (i.e., radial stress distribution \(\sigma _r\)) for use as training data in ML algorithms. To that end, we have used different fitting techniques and prediction models, all informed by a mechanical understanding of the problem at stake, and intuitive choices that may improve the performance of prediction algorithms. Future work spanning from this study includes testing the effect of less ideal conditions encountered in physical experiments and validating the methods presented in this study against actual experimental and/or field data. Another possible extension of this work is the exploration of other applications where no analytical solutions exist yet. Still, as with any other study that applies data-science-based methods to mechanical problems, careful guiding, interpretation and validation of the results is fundamental.